2025-12-04T09:32:17.1329506Z Current runner version: '2.330.0'
2025-12-04T09:32:17.1337455Z Runner name: 'i-00bb8650059fae3eb'
2025-12-04T09:32:17.1338361Z Runner group name: 'default'
2025-12-04T09:32:17.1339357Z Machine name: 'ip-10-0-51-5'
2025-12-04T09:32:17.1342657Z ##[group]GITHUB_TOKEN Permissions
2025-12-04T09:32:17.1345291Z Contents: read
2025-12-04T09:32:17.1346008Z Metadata: read
2025-12-04T09:32:17.1346691Z ##[endgroup]
2025-12-04T09:32:17.1349263Z Secret source: Actions
2025-12-04T09:32:17.1350195Z Prepare workflow directory
2025-12-04T09:32:17.1957273Z Prepare all required actions
2025-12-04T09:32:17.2007949Z Getting action download info
2025-12-04T09:32:17.5603949Z Download action repository 'pytorch/test-infra@main' (SHA:39aa74d619174326f4e2fb0e216151c2f29d9ffd)
2025-12-04T09:32:19.8617136Z Download action repository 'pytorch/pytorch@main' (SHA:7716da9fb23f27a65b41f9f016a2afadf281c18f)
2025-12-04T09:32:34.3515375Z Download action repository 'actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065' (SHA:a26af69be951a213d495a4c3e4e4022e16d87065)
2025-12-04T09:32:34.7309301Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-12-04T09:32:34.9213619Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076)
2025-12-04T09:32:35.1017632Z Download action repository 'seemethere/download-artifact-s3@1da556a7aa0a088e3153970611f6c432d58e80e6' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T09:32:35.4087520Z Download action repository 'seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-12-04T09:32:35.7010530Z Getting action download info
2025-12-04T09:32:35.8547548Z Download action repository 'actions/checkout@v4' (SHA:34e114876b0b11c390a56381ad16ebd13914f8d5)
2025-12-04T09:32:36.1506520Z Getting action download info
2025-12-04T09:32:36.3006713Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e)
2025-12-04T09:32:36.5134161Z Getting action download info
2025-12-04T09:32:36.6269662Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482)
2025-12-04T09:32:36.8922040Z Getting action download info
2025-12-04T09:32:37.0881561Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/heads/main (ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32)
2025-12-04T09:32:37.0886066Z ##[group] Inputs
2025-12-04T09:32:37.0886508Z build-environment: linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T09:32:37.0893942Z test-matrix: {"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": 
"unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}
2025-12-04T09:32:37.0902145Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:32:37.0903159Z sync-tag:
2025-12-04T09:32:37.0904123Z timeout-minutes: 240
2025-12-04T09:32:37.0904437Z use-gha:
2025-12-04T09:32:37.0904697Z dashboard-tag:
2025-12-04T09:32:37.0904969Z s3-bucket: gha-artifacts
2025-12-04T09:32:37.0905291Z aws-role-to-assume:
2025-12-04T09:32:37.0905972Z disable-monitor: false
2025-12-04T09:32:37.0906317Z monitor-log-interval: 5
2025-12-04T09:32:37.0906679Z monitor-data-collect-interval: 1
2025-12-04T09:32:37.0907071Z ##[endgroup]
2025-12-04T09:32:37.0907833Z Complete job name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:32:37.1519348Z A job started hook has been configured by the self-hosted runner administrator
2025-12-04T09:32:37.1637728Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh'
2025-12-04T09:32:37.1648545Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:32:37.1649310Z ##[endgroup]
2025-12-04T09:32:38.6911482Z Runner Type: linux.g4dn.4xlarge.nvidia.gpu
2025-12-04T09:32:38.6912144Z Instance Type: g4dn.4xlarge
2025-12-04T09:32:38.6912459Z AMI Name: unknown
2025-12-04T09:32:38.6952334Z AMI ID: ami-08982f1c5bf93d976
2025-12-04T09:32:45.3265680Z ##[group]Run 
pytorch/test-infra/.github/actions/setup-ssh@main
2025-12-04T09:32:45.3266213Z with:
2025-12-04T09:32:45.3266880Z github-secret: ***
2025-12-04T09:32:45.3267740Z instructions: All testing is done inside the container, to start an interactive session run: docker exec -it $(docker container ps --format '{{.ID}}') bash
2025-12-04T09:32:45.3268701Z activate-with-label: false
2025-12-04T09:32:45.3269032Z label: with-ssh
2025-12-04T09:32:45.3269312Z remove-existing-keys: true
2025-12-04T09:32:45.3269644Z fail-silently: true
2025-12-04T09:32:45.3269940Z env:
2025-12-04T09:32:45.3270179Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:32:45.3270497Z ##[endgroup]
2025-12-04T09:32:45.4853692Z Please see https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions for more info.
2025-12-04T09:32:45.4855535Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys
2025-12-04T09:32:45.5166195Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-12-04T09:32:45.5166727Z with:
2025-12-04T09:32:45.5166993Z no-sudo: true
2025-12-04T09:32:45.5167273Z submodules: recursive
2025-12-04T09:32:45.5167575Z fetch-depth: 0
2025-12-04T09:32:45.5167867Z env:
2025-12-04T09:32:45.5168115Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:32:45.5168422Z ##[endgroup]
2025-12-04T09:32:45.5284862Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:32:45.5286016Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:32:45.5298830Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:32:45.5299670Z env:
2025-12-04T09:32:45.5300146Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:32:45.5300753Z ##[endgroup]
2025-12-04T09:32:45.5399589Z ##[group]Run # Use all available CPUs for fetching
2025-12-04T09:32:45.5400123Z # Use all available CPUs for 
fetching
2025-12-04T09:32:45.5400542Z cd "${GITHUB_WORKSPACE}"
2025-12-04T09:32:45.5400937Z git config --global fetch.parallel 0
2025-12-04T09:32:45.5401581Z git config --global submodule.fetchJobs 0
2025-12-04T09:32:45.5402000Z 
2025-12-04T09:32:45.5402418Z # Clean workspace. The default checkout action should also do this, but
2025-12-04T09:32:45.5402987Z # do it here as well just in case
2025-12-04T09:32:45.5403375Z if [[ -d .git ]]; then
2025-12-04T09:32:45.5403713Z  if [ -z "${NO_SUDO}" ]; then
2025-12-04T09:32:45.5404087Z  sudo git clean -ffdx
2025-12-04T09:32:45.5404420Z  else
2025-12-04T09:32:45.5404700Z  git clean -ffdx
2025-12-04T09:32:45.5404998Z  fi
2025-12-04T09:32:45.5405249Z fi
2025-12-04T09:32:45.5411925Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:32:45.5412369Z env:
2025-12-04T09:32:45.5412718Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:32:45.5413059Z NO_SUDO: true
2025-12-04T09:32:45.5413310Z ##[endgroup]
2025-12-04T09:32:45.5546641Z ##[group]Run actions/checkout@v4
2025-12-04T09:32:45.5547032Z with:
2025-12-04T09:32:45.5547334Z ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:32:45.5547723Z fetch-depth: 0
2025-12-04T09:32:45.5548012Z submodules: recursive
2025-12-04T09:32:45.5548321Z show-progress: false
2025-12-04T09:32:45.5548639Z repository: pytorch/pytorch
2025-12-04T09:32:45.5549106Z token: ***
2025-12-04T09:32:45.5549370Z ssh-strict: true
2025-12-04T09:32:45.5549649Z ssh-user: git
2025-12-04T09:32:45.5549917Z persist-credentials: true
2025-12-04T09:32:45.5550234Z clean: true
2025-12-04T09:32:45.5550529Z sparse-checkout-cone-mode: true
2025-12-04T09:32:45.5550868Z fetch-tags: false
2025-12-04T09:32:45.5551141Z lfs: false
2025-12-04T09:32:45.5551410Z set-safe-directory: true
2025-12-04T09:32:45.5551715Z env:
2025-12-04T09:32:45.5551965Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:32:45.5552271Z ##[endgroup]
2025-12-04T09:32:45.6801168Z Syncing repository: pytorch/pytorch
2025-12-04T09:32:45.6802744Z 
##[group]Getting Git version info
2025-12-04T09:32:45.6803349Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T09:32:45.6804133Z [command]/usr/bin/git version
2025-12-04T09:32:45.6962656Z git version 2.50.1
2025-12-04T09:32:45.7007886Z ##[endgroup]
2025-12-04T09:32:45.7019200Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/f7d10314-b94e-44a0-bd16-0b12211406dc/.gitconfig'
2025-12-04T09:32:45.7039395Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/f7d10314-b94e-44a0-bd16-0b12211406dc' before making global git config changes
2025-12-04T09:32:45.7040584Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T09:32:45.7044875Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:32:45.7089610Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T09:32:45.7093092Z ##[group]Initializing the repository
2025-12-04T09:32:45.7097770Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:32:45.7161525Z hint: Using 'master' as the name for the initial branch. This default branch name
2025-12-04T09:32:45.7162258Z hint: is subject to change. To configure the initial branch name to use in all
2025-12-04T09:32:45.7162939Z hint: of your new repositories, which will suppress this warning, call:
2025-12-04T09:32:45.7163426Z hint:
2025-12-04T09:32:45.7163777Z hint: git config --global init.defaultBranch
2025-12-04T09:32:45.7164194Z hint:
2025-12-04T09:32:45.7164574Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2025-12-04T09:32:45.7165268Z hint: 'development'. 
The just-created branch can be renamed via this command:
2025-12-04T09:32:45.7165788Z hint:
2025-12-04T09:32:45.7166034Z hint: git branch -m
2025-12-04T09:32:45.7166345Z hint:
2025-12-04T09:32:45.7166793Z hint: Disable this message with "git config set advice.defaultBranchName false"
2025-12-04T09:32:45.7170682Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2025-12-04T09:32:45.7181536Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2025-12-04T09:32:45.7219613Z ##[endgroup]
2025-12-04T09:32:45.7220132Z ##[group]Disabling automatic garbage collection
2025-12-04T09:32:45.7223510Z [command]/usr/bin/git config --local gc.auto 0
2025-12-04T09:32:45.7251151Z ##[endgroup]
2025-12-04T09:32:45.7251611Z ##[group]Setting up auth
2025-12-04T09:32:45.7258661Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-12-04T09:32:45.7287966Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-12-04T09:32:45.7633156Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-12-04T09:32:45.7663049Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-12-04T09:32:45.7967570Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T09:32:45.7997259Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url
2025-12-04T09:32:45.8341456Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-12-04T09:32:45.8399936Z ##[endgroup]
2025-12-04T09:32:45.8400465Z ##[group]Fetching 
the repository 2025-12-04T09:32:45.8409416Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-12-04T09:33:41.1139450Z From https://github.com/pytorch/pytorch 2025-12-04T09:33:41.1140961Z * [new branch] 2.6.0.dev20241004+ -> origin/2.6.0.dev20241004+ 2025-12-04T09:33:41.1143039Z * [new branch] 2.9.1 -> origin/2.9.1 2025-12-04T09:33:41.1143872Z * [new branch] AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest 2025-12-04T09:33:41.1144644Z * [new branch] Flamefire-patch-1 -> origin/Flamefire-patch-1 2025-12-04T09:33:41.1145382Z * [new branch] HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes 2025-12-04T09:33:41.1146077Z * [new branch] HOPrintFunc -> origin/HOPrintFunc 2025-12-04T09:33:41.1146797Z * [new branch] IvanKobzarev/stack/1 -> origin/IvanKobzarev/stack/1 2025-12-04T09:33:41.1149336Z * [new branch] NicoshevSVE128 -> origin/NicoshevSVE128 2025-12-04T09:33:41.1150561Z * [new branch] PR-AOTInductorNoneBug -> origin/PR-AOTInductorNoneBug 2025-12-04T09:33:41.1152421Z * [new branch] PR-AOTInductorNoneBugFix -> origin/PR-AOTInductorNoneBugFix 2025-12-04T09:33:41.1153656Z * [new branch] PR-FixConfigsIssue -> origin/PR-FixConfigsIssue 2025-12-04T09:33:41.1155008Z * [new branch] PR-NoneBugFix-viable -> origin/PR-NoneBugFix-viable 2025-12-04T09:33:41.1156584Z * [new branch] PR-ResetToZero -> origin/PR-ResetToZero 2025-12-04T09:33:41.1158123Z * [new branch] Update-Flash-Packaging -> origin/Update-Flash-Packaging 2025-12-04T09:33:41.1159680Z * [new branch] VLA_exp -> origin/VLA_exp 2025-12-04T09:33:41.1161328Z * [new branch] activation_bench -> origin/activation_bench 2025-12-04T09:33:41.1163467Z * [new branch] addmm-heuristic -> origin/addmm-heuristic 2025-12-04T09:33:41.1165588Z * [new branch] adi/onednn_aarch64 -> origin/adi/onednn_aarch64 2025-12-04T09:33:41.1167135Z * [new branch] adi/test -> origin/adi/test 
2025-12-04T09:33:41.1168582Z * [new branch] adi/test_bgemm -> origin/adi/test_bgemm 2025-12-04T09:33:41.1170137Z * [new branch] adi/test_m8g -> origin/adi/test_m8g 2025-12-04T09:33:41.1171919Z * [new branch] adi/test_onednn -> origin/adi/test_onednn 2025-12-04T09:33:41.1173497Z * [new branch] adi/test_onednn_v3.9 -> origin/adi/test_onednn_v3.9 2025-12-04T09:33:41.1175033Z * [new branch] adi/test_presve_change -> origin/adi/test_presve_change 2025-12-04T09:33:41.1176405Z * [new branch] adi/test_timm -> origin/adi/test_timm 2025-12-04T09:33:41.1178426Z * [new branch] adi/testpresve_change -> origin/adi/testpresve_change 2025-12-04T09:33:41.1180909Z * [new branch] aditew01/test/vec_bf16 -> origin/aditew01/test/vec_bf16 2025-12-04T09:33:41.1182270Z * [new branch] ah-globalfeedback-hook -> origin/ah-globalfeedback-hook 2025-12-04T09:33:41.1184103Z * [new branch] albanD-patch-1 -> origin/albanD-patch-1 2025-12-04T09:33:41.1185380Z * [new branch] also-surround-shimh -> origin/also-surround-shimh 2025-12-04T09:33:41.1187644Z * [new branch] angelayi/aot_compile -> origin/angelayi/aot_compile 2025-12-04T09:33:41.1189050Z * [new branch] angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files 2025-12-04T09:33:41.1190465Z * [new branch] angelayi/benchmark -> origin/angelayi/benchmark 2025-12-04T09:33:41.1192105Z * [new branch] angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization 2025-12-04T09:33:41.1193249Z * [new branch] angelayi/cpp_loader -> origin/angelayi/cpp_loader 2025-12-04T09:33:41.1194872Z * [new branch] angelayi/inductor_const -> origin/angelayi/inductor_const 2025-12-04T09:33:41.1196064Z * [new branch] angelayi/lstm -> origin/angelayi/lstm 2025-12-04T09:33:41.1198115Z * [new branch] angelayi/no_so_weight -> origin/angelayi/no_so_weight 2025-12-04T09:33:41.1199985Z * [new branch] angelayi/scan_layers -> origin/angelayi/scan_layers 2025-12-04T09:33:41.1201487Z * [new branch] angelayi/side_eff -> origin/angelayi/side_eff 
2025-12-04T09:33:41.1203052Z * [new branch] angelayi/state_dict -> origin/angelayi/state_dict 2025-12-04T09:33:41.1204396Z * [new branch] angelayi/symint_input -> origin/angelayi/symint_input 2025-12-04T09:33:41.1206219Z * [new branch] angelayi/symm_mem -> origin/angelayi/symm_mem 2025-12-04T09:33:41.1207399Z * [new branch] angelayi/test_cpp -> origin/angelayi/test_cpp 2025-12-04T09:33:41.1209002Z * [new branch] angelayi/torch_size -> origin/angelayi/torch_size 2025-12-04T09:33:41.1210337Z * [new branch] annotate_assert -> origin/annotate_assert 2025-12-04T09:33:41.1212019Z * [new branch] annotate_fallback_kernel -> origin/annotate_fallback_kernel 2025-12-04T09:33:41.1213333Z * [new branch] annotation_deepcopy -> origin/annotation_deepcopy 2025-12-04T09:33:41.1214870Z * [new branch] annotation_dynamo -> origin/annotation_dynamo 2025-12-04T09:33:41.1216493Z * [new branch] aot_eager_stack_trace -> origin/aot_eager_stack_trace 2025-12-04T09:33:41.1218018Z * [new branch] aoti-cuda-alloc -> origin/aoti-cuda-alloc 2025-12-04T09:33:41.1219488Z * [new branch] aoti_const_device -> origin/aoti_const_device 2025-12-04T09:33:41.1220987Z * [new branch] aoti_fqn_name_interface -> origin/aoti_fqn_name_interface 2025-12-04T09:33:41.1222358Z * [new branch] aoti_package_weights_binary -> origin/aoti_package_weights_binary 2025-12-04T09:33:41.1223752Z * [new branch] aoti_target_windows -> origin/aoti_target_windows 2025-12-04T09:33:41.1226341Z * [new branch] arsh/feat/inductor_check_profiling -> origin/arsh/feat/inductor_check_profiling 2025-12-04T09:33:41.1227622Z * [new branch] async_tp -> origin/async_tp 2025-12-04T09:33:41.1229359Z * [new branch] atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124 2025-12-04T09:33:41.1230799Z * [new branch] atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1 2025-12-04T09:33:41.1232486Z * [new branch] atalman-patch-2 -> origin/atalman-patch-2 2025-12-04T09:33:41.1234064Z * [new branch] atalman-patch-3 -> 
origin/atalman-patch-3 2025-12-04T09:33:41.1235606Z * [new branch] atalman-patch-4 -> origin/atalman-patch-4 2025-12-04T09:33:41.1237177Z * [new branch] atalman-patch-5 -> origin/atalman-patch-5 2025-12-04T09:33:41.1238761Z * [new branch] atalman-patch-6 -> origin/atalman-patch-6 2025-12-04T09:33:41.1240314Z * [new branch] atalman-patch-7 -> origin/atalman-patch-7 2025-12-04T09:33:41.1241957Z * [new branch] atalman-patch-8 -> origin/atalman-patch-8 2025-12-04T09:33:41.1243458Z * [new branch] atalman_inductor_2.3.1 -> origin/atalman_inductor_2.3.1 2025-12-04T09:33:41.1244812Z * [new branch] atalman_inductor_2.4.0 -> origin/atalman_inductor_2.4.0 2025-12-04T09:33:41.1246472Z * [new branch] atalman_inductor_2.4.x -> origin/atalman_inductor_2.4.x 2025-12-04T09:33:41.1248250Z * [new branch] attention_benchmarking_clean -> origin/attention_benchmarking_clean 2025-12-04T09:33:41.1250276Z * [new branch] bahuang/dt_fix_scalar_add -> origin/bahuang/dt_fix_scalar_add 2025-12-04T09:33:41.1251573Z * [new branch] bahuang/fix_debug_mode -> origin/bahuang/fix_debug_mode 2025-12-04T09:33:41.1253083Z * [new branch] bahuang/fix_expand -> origin/bahuang/fix_expand 2025-12-04T09:33:41.1254586Z * [new branch] bahuang/test -> origin/bahuang/test 2025-12-04T09:33:41.1256862Z * [new branch] base/1.5 -> origin/base/1.5 2025-12-04T09:33:41.1258741Z * [new branch] batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention 2025-12-04T09:33:41.1260028Z * [new branch] bench_scaled_mm_ops -> origin/bench_scaled_mm_ops 2025-12-04T09:33:41.1261741Z * [new branch] benchmark-updates -> origin/benchmark-updates 2025-12-04T09:33:41.1263045Z * [new branch] benchmarking-script -> origin/benchmarking-script 2025-12-04T09:33:41.1265084Z * [new branch] bertmaher/pinbump26 -> origin/bertmaher/pinbump26 2025-12-04T09:33:41.1267037Z * [new branch] bertrand/cutlass -> origin/bertrand/cutlass 2025-12-04T09:33:41.1268972Z * [new branch] bf/bug-static-input -> origin/bf/bug-static-input 
2025-12-04T09:33:41.1270268Z * [new branch] bf/cg-backend -> origin/bf/cg-backend 2025-12-04T09:33:41.1271979Z * [new branch] bf/cg-nccl-test -> origin/bf/cg-nccl-test 2025-12-04T09:33:41.1273264Z * [new branch] bf/cg-remove-check -> origin/bf/cg-remove-check 2025-12-04T09:33:41.1274936Z * [new branch] bf/clean-torchbench-hf -> origin/bf/clean-torchbench-hf 2025-12-04T09:33:41.1276233Z * [new branch] bf/combo-debug-log -> origin/bf/combo-debug-log 2025-12-04T09:33:41.1277744Z * [new branch] bf/cudagraph -> origin/bf/cudagraph 2025-12-04T09:33:41.1279743Z * [new branch] bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation 2025-12-04T09:33:41.1281542Z * [new branch] bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark 2025-12-04T09:33:41.1283073Z * [new branch] bf/cudagraph-partition -> origin/bf/cudagraph-partition 2025-12-04T09:33:41.1284073Z * [new branch] bf/donated-buffer-bench -> origin/bf/donated-buffer-bench 2025-12-04T09:33:41.1285615Z * [new branch] bf/dynamo-partition -> origin/bf/dynamo-partition 2025-12-04T09:33:41.1287041Z * [new branch] bf/lite -> origin/bf/lite 2025-12-04T09:33:41.1288511Z * [new branch] bf/pa-non-divisible -> origin/bf/pa-non-divisible 2025-12-04T09:33:41.1290198Z * [new branch] bf/partition-cache-free-symbols -> origin/bf/partition-cache-free-symbols 2025-12-04T09:33:41.1291585Z * [new branch] bf/partition-memory-plan -> origin/bf/partition-memory-plan 2025-12-04T09:33:41.1293023Z * [new branch] bf/partition-move-cpu -> origin/bf/partition-move-cpu 2025-12-04T09:33:41.1294610Z * [new branch] bf/partition-view-fallback -> origin/bf/partition-view-fallback 2025-12-04T09:33:41.1295916Z * [new branch] bf/remove-check-55b0c39d -> origin/bf/remove-check-55b0c39d 2025-12-04T09:33:41.1297518Z * [new branch] bf/timm-nov-26-2025 -> origin/bf/timm-nov-26-2025 2025-12-04T09:33:41.1298877Z * [new branch] bf/transformer-pin-4-57-3 -> 
origin/bf/transformer-pin-4-57-3 2025-12-04T09:33:41.1300471Z * [new branch] bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492 2025-12-04T09:33:41.1301796Z * [new branch] bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb 2025-12-04T09:33:41.1303187Z * [new branch] bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129 2025-12-04T09:33:41.1304536Z * [new branch] bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d 2025-12-04T09:33:41.1305934Z * [new branch] bisect_perf_hf_T5_5268754e -> origin/bisect_perf_hf_T5_5268754e 2025-12-04T09:33:41.1307259Z * [new branch] bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c 2025-12-04T09:33:41.1308806Z * [new branch] bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c 2025-12-04T09:33:41.1310100Z * [new branch] bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f 2025-12-04T09:33:41.1311513Z * [new branch] bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0 2025-12-04T09:33:41.1313344Z * [new branch] bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149 2025-12-04T09:33:41.1314430Z * [new branch] bisect_perf_hf_T5_d65f194a -> origin/bisect_perf_hf_T5_d65f194a 2025-12-04T09:33:41.1315819Z * [new branch] bisect_perf_hf_T5_da94ab0b -> origin/bisect_perf_hf_T5_da94ab0b 2025-12-04T09:33:41.1317206Z * [new branch] bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new 2025-12-04T09:33:41.1318596Z * [new branch] bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8 2025-12-04T09:33:41.1319909Z * [new branch] bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2 2025-12-04T09:33:41.1321451Z * [new branch] bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563 2025-12-04T09:33:41.1323412Z * [new branch] brister/fx_device_type -> origin/brister/fx_device_type 2025-12-04T09:33:41.1324765Z * [new branch] 
brister/test_inductor_all_fx -> origin/brister/test_inductor_all_fx 2025-12-04T09:33:41.1326205Z * [new branch] brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check 2025-12-04T09:33:41.1327536Z * [new branch] bwd-backup -> origin/bwd-backup 2025-12-04T09:33:41.1329266Z * [new branch] c57382a49 -> origin/c57382a49 2025-12-04T09:33:41.1330498Z * [new branch] ca_0431d47eaa -> origin/ca_0431d47eaa 2025-12-04T09:33:41.1331980Z * [new branch] ca_fix_0431d47eaa -> origin/ca_fix_0431d47eaa 2025-12-04T09:33:41.1334073Z * [new branch] camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push 2025-12-04T09:33:41.1335729Z * [new branch] cccclai-patch-1 -> origin/cccclai-patch-1 2025-12-04T09:33:41.1337626Z * [new branch] cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1338985Z * [new branch] cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1340572Z * [new branch] cherry-pick-162208-by-pytorch_bot_bot_ -> origin/cherry-pick-162208-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1342033Z * [new branch] cherry-pick-163169-by-pytorch_bot_bot_ -> origin/cherry-pick-163169-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1343525Z * [new branch] cherry-pick-165086-by-pytorch_bot_bot_ -> origin/cherry-pick-165086-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1345130Z * [new branch] cherry-pick-165514-by-pytorch_bot_bot_ -> origin/cherry-pick-165514-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1346546Z * [new branch] cherry-pick-165601-by-pytorch_bot_bot_ -> origin/cherry-pick-165601-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1348021Z * [new branch] cherry-pick-165667-by-pytorch_bot_bot_ -> origin/cherry-pick-165667-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1349615Z * [new branch] cherry-pick-165815-by-pytorch_bot_bot_ -> origin/cherry-pick-165815-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1351220Z * [new branch] cherry-pick-165922-by-pytorch_bot_bot_ -> 
origin/cherry-pick-165922-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1352640Z * [new branch] cherry-pick-166148-by-pytorch_bot_bot_ -> origin/cherry-pick-166148-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1354076Z * [new branch] cherry-pick-166181-by-pytorch_bot_bot_ -> origin/cherry-pick-166181-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1355487Z * [new branch] cherry-pick-166404-by-pytorch_bot_bot_ -> origin/cherry-pick-166404-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1356982Z * [new branch] cherry-pick-166427-by-pytorch_bot_bot_ -> origin/cherry-pick-166427-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1358539Z * [new branch] cherry-pick-166480-by-pytorch_bot_bot_ -> origin/cherry-pick-166480-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1359858Z * [new branch] cherry-pick-166570-by-pytorch_bot_bot_ -> origin/cherry-pick-166570-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1361372Z * [new branch] cherry-pick-166993-by-pytorch_bot_bot_ -> origin/cherry-pick-166993-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1362827Z * [new branch] cherry-pick-167111-by-pytorch_bot_bot_ -> origin/cherry-pick-167111-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1364346Z * [new branch] cherry-pick-167478-by-pytorch_bot_bot_ -> origin/cherry-pick-167478-by-pytorch_bot_bot_ 2025-12-04T09:33:41.1365589Z * [new branch] cherry_pick_166036_166040 -> origin/cherry_pick_166036_166040 2025-12-04T09:33:41.1367198Z * [new branch] cherry_pick_166457 -> origin/cherry_pick_166457 2025-12-04T09:33:41.1368790Z * [new branch] cherrypick_166338 -> origin/cherrypick_166338 2025-12-04T09:33:41.1370303Z * [new branch] cherrypick_166458 -> origin/cherrypick_166458 2025-12-04T09:33:41.1371984Z * [new branch] cherrypick_166586 -> origin/cherrypick_166586 2025-12-04T09:33:41.1373528Z * [new branch] cherrypick_166956 -> origin/cherrypick_166956 2025-12-04T09:33:41.1374886Z * [new branch] ci_attn -> origin/ci_attn 2025-12-04T09:33:41.1376481Z * [new branch] codex-testing -> origin/codex-testing 2025-12-04T09:33:41.1378948Z * [new branch] 
codex/add-check_memory_overlap-helper-functions -> origin/codex/add-check_memory_overlap-helper-functions 2025-12-04T09:33:41.1380116Z * [new branch] codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch 2025-12-04T09:33:41.1382220Z * [new branch] codex/investigate-segfaults-in-get_tensor_storage_id -> origin/codex/investigate-segfaults-in-get_tensor_storage_id 2025-12-04T09:33:41.1383919Z * [new branch] codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run 2025-12-04T09:33:41.1385033Z * [new branch] compatiblpy39util -> origin/compatiblpy39util 2025-12-04T09:33:41.1386609Z * [new branch] cond_hop_device -> origin/cond_hop_device 2025-12-04T09:33:41.1388018Z * [new branch] context_test -> origin/context_test 2025-12-04T09:33:41.1390116Z * [new branch] copilot/code-style-cleanup-python-pip -> origin/copilot/code-style-cleanup-python-pip 2025-12-04T09:33:41.1391890Z * [new branch] cpio/fix_new_ami_tests -> origin/cpio/fix_new_ami_tests 2025-12-04T09:33:41.1393433Z * [new branch] cpp-docs-dependency-upgrade -> origin/cpp-docs-dependency-upgrade 2025-12-04T09:33:41.1395662Z * [new branch] crpa/typo-in-inductor_comm_lowering -> origin/crpa/typo-in-inductor_comm_lowering 2025-12-04T09:33:41.1397345Z * [new branch] csl/always_produce_xml -> origin/csl/always_produce_xml 2025-12-04T09:33:41.1398618Z * [new branch] csl/build_test_more_procs -> origin/csl/build_test_more_procs 2025-12-04T09:33:41.1400120Z * [new branch] csl/build_test_more_procs2 -> origin/csl/build_test_more_procs2 2025-12-04T09:33:41.1401430Z * [new branch] csl/clean_up -> origin/csl/clean_up 2025-12-04T09:33:41.1403451Z * [new branch] csl/fix_retry_segfault_exit -> origin/csl/fix_retry_segfault_exit 2025-12-04T09:33:41.1404622Z * [new branch] csl/katex -> origin/csl/katex 2025-12-04T09:33:41.1406498Z * [new branch] csl/larger_runner -> origin/csl/larger_runner 2025-12-04T09:33:41.1408277Z * [new branch] csl/lint_testing -> 
origin/csl/lint_testing 2025-12-04T09:33:41.1410042Z * [new branch] csl/lint_thing -> origin/csl/lint_thing 2025-12-04T09:33:41.1411731Z * [new branch] csl/lintrunner_stuff -> origin/csl/lintrunner_stuff 2025-12-04T09:33:41.1413228Z * [new branch] csl/manually_gen_json -> origin/csl/manually_gen_json 2025-12-04T09:33:41.1414730Z * [new branch] csl/mps_sharding -> origin/csl/mps_sharding 2025-12-04T09:33:41.1416225Z * [new branch] csl/multistage_docker -> origin/csl/multistage_docker 2025-12-04T09:33:41.1417866Z * [new branch] csl/print_timing -> origin/csl/print_timing 2025-12-04T09:33:41.1419331Z * [new branch] csl/remove_experiment -> origin/csl/remove_experiment 2025-12-04T09:33:41.1420874Z * [new branch] csl/remove_maybe_unused_var -> origin/csl/remove_maybe_unused_var 2025-12-04T09:33:41.1422498Z * [new branch] csl/remove_repo_specific_autolabel -> origin/csl/remove_repo_specific_autolabel 2025-12-04T09:33:41.1423858Z * [new branch] csl/remove_run_parallel -> origin/csl/remove_run_parallel 2025-12-04T09:33:41.1425166Z * [new branch] csl/remove_unused_vars -> origin/csl/remove_unused_vars 2025-12-04T09:33:41.1426667Z * [new branch] csl/revert_open -> origin/csl/revert_open 2025-12-04T09:33:41.1428171Z * [new branch] csl/skip_build -> origin/csl/skip_build 2025-12-04T09:33:41.1429540Z * [new branch] csl/smaller_avx_amx_runenrs -> origin/csl/smaller_avx_amx_runenrs 2025-12-04T09:33:41.1430846Z * [new branch] csl/td_job_level -> origin/csl/td_job_level 2025-12-04T09:33:41.1432524Z * [new branch] csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner 2025-12-04T09:33:41.1434030Z * [new branch] csl/test_owners_autograd_dispatch_nn -> origin/csl/test_owners_autograd_dispatch_nn 2025-12-04T09:33:41.1435402Z * [new branch] csl/test_owners_higher_confidence -> origin/csl/test_owners_higher_confidence 2025-12-04T09:33:41.1436731Z * [new branch] csl/upload_json_running -> origin/csl/upload_json_running 2025-12-04T09:33:41.1438301Z * [new branch] 
csl/win_sccache -> origin/csl/win_sccache 2025-12-04T09:33:41.1439674Z * [new branch] csl/xml_stuff -> origin/csl/xml_stuff 2025-12-04T09:33:41.1441173Z * [new branch] cublasrelax2 -> origin/cublasrelax2 2025-12-04T09:33:41.1442686Z * [new branch] cuda_mempool -> origin/cuda_mempool 2025-12-04T09:33:41.1444151Z * [new branch] custom_lowering_dict -> origin/custom_lowering_dict 2025-12-04T09:33:41.1446127Z * [new branch] d4l3k/debug_plane_frtrace -> origin/d4l3k/debug_plane_frtrace 2025-12-04T09:33:41.1448076Z * [new branch] daxia6/2.8o3 -> origin/daxia6/2.8o3 2025-12-04T09:33:41.1449505Z * [new branch] debug-guard -> origin/debug-guard 2025-12-04T09:33:41.1451126Z * [new branch] delete-quant-docs -> origin/delete-quant-docs 2025-12-04T09:33:41.1455752Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 2025-12-04T09:33:41.1457476Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 2025-12-04T09:33:41.1459053Z * [new branch] desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper 2025-12-04T09:33:41.1460691Z * [new branch] desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64 2025-12-04T09:33:41.1462889Z * [new branch] dev/dhruva/flex_attn_opt -> origin/dev/dhruva/flex_attn_opt 2025-12-04T09:33:41.1465265Z * [new branch] dev/joona/MPSNDArrayAdd -> origin/dev/joona/MPSNDArrayAdd 2025-12-04T09:33:41.1467041Z * [new branch] dev/joona/Unranked -> origin/dev/joona/Unranked 2025-12-04T09:33:41.1468754Z * [new branch] dev/joona/cat -> origin/dev/joona/cat 2025-12-04T09:33:41.1470291Z * [new branch] dev/joona/embeddingbag -> origin/dev/joona/embeddingbag 2025-12-04T09:33:41.1472065Z * [new branch] dev/joona/fix_sdpa_memtest -> origin/dev/joona/fix_sdpa_memtest 2025-12-04T09:33:41.1473882Z * [new branch] 
dev/joona/getTensorsString -> origin/dev/joona/getTensorsString 2025-12-04T09:33:41.1475605Z * [new branch] dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14 2025-12-04T09:33:41.1477681Z * [new branch] dev/joona/scalar_clamp -> origin/dev/joona/scalar_clamp 2025-12-04T09:33:41.1479688Z * [new branch] dev/joona/sdpa -> origin/dev/joona/sdpa 2025-12-04T09:33:41.1481834Z * [new branch] dev/joona/sdpa_api -> origin/dev/joona/sdpa_api 2025-12-04T09:33:41.1483581Z * [new branch] dev/joona/type_inf -> origin/dev/joona/type_inf 2025-12-04T09:33:41.1485411Z * [new branch] dev/joona/ulpAssertClose -> origin/dev/joona/ulpAssertClose 2025-12-04T09:33:41.1487088Z * [new branch] dev/joona/upsize3d -> origin/dev/joona/upsize3d 2025-12-04T09:33:41.1488584Z * [new branch] disp_counter -> origin/disp_counter 2025-12-04T09:33:41.1490164Z * [new branch] divyanshk-patch-1 -> origin/divyanshk-patch-1 2025-12-04T09:33:41.1491427Z * [new branch] docs -> origin/docs 2025-12-04T09:33:41.1493091Z * [new branch] documentation -> origin/documentation 2025-12-04T09:33:41.1494606Z * [new branch] eager_model_benchmarks -> origin/eager_model_benchmarks 2025-12-04T09:33:41.1496715Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control 2025-12-04T09:33:41.1498086Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B 2025-12-04T09:33:41.1499361Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B 2025-12-04T09:33:41.1500915Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1 2025-12-04T09:33:41.1502457Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2 2025-12-04T09:33:41.1504045Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3 2025-12-04T09:33:41.1505598Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4 2025-12-04T09:33:41.1507116Z * [new branch] eqy-patch-5 -> origin/eqy-patch-5 2025-12-04T09:33:41.1508467Z * [new branch] eqy-patch-6 -> origin/eqy-patch-6 2025-12-04T09:33:41.1510523Z * [new 
branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma 2025-12-04T09:33:41.1512154Z * [new branch] exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run 2025-12-04T09:33:41.1513482Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor 2025-12-04T09:33:41.1514913Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion 2025-12-04T09:33:41.1516410Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning 2025-12-04T09:33:41.1518251Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg 2025-12-04T09:33:41.1520180Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run 2025-12-04T09:33:41.1521446Z * [new branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data 2025-12-04T09:33:41.1523273Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run 2025-12-04T09:33:41.1524430Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model 2025-12-04T09:33:41.1525796Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model 2025-12-04T09:33:41.1527550Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection 2025-12-04T09:33:41.1528669Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd 2025-12-04T09:33:41.1530053Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model 2025-12-04T09:33:41.1531818Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor 2025-12-04T09:33:41.1533149Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo 2025-12-04T09:33:41.1534671Z * [new branch] exclamaforte/profiler-visualization -> origin/exclamaforte/profiler-visualization 
2025-12-04T09:33:41.1536107Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode 2025-12-04T09:33:41.1537779Z * [new branch] exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs 2025-12-04T09:33:41.1539263Z * [new branch] exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2 2025-12-04T09:33:41.1540543Z * [new branch] exec -> origin/exec 2025-12-04T09:33:41.1542332Z * [new branch] experimental-mosaic -> origin/experimental-mosaic 2025-12-04T09:33:41.1543820Z * [new branch] export-D61047529 -> origin/export-D61047529 2025-12-04T09:33:41.1545373Z * [new branch] export-D71412006 -> origin/export-D71412006 2025-12-04T09:33:41.1547056Z * [new branch] export-D73042989 -> origin/export-D73042989 2025-12-04T09:33:41.1548495Z * [new branch] export-D78957093 -> origin/export-D78957093 2025-12-04T09:33:41.1549942Z * [new branch] export-D78996107 -> origin/export-D78996107 2025-12-04T09:33:41.1551377Z * [new branch] export-D80823877 -> origin/export-D80823877 2025-12-04T09:33:41.1552965Z * [new branch] export-D80958642 -> origin/export-D80958642 2025-12-04T09:33:41.1554416Z * [new branch] export-D81054193 -> origin/export-D81054193 2025-12-04T09:33:41.1555861Z * [new branch] export-D81204584 -> origin/export-D81204584 2025-12-04T09:33:41.1557154Z * [new branch] export-D81429090 -> origin/export-D81429090 2025-12-04T09:33:41.1558842Z * [new branch] export-D82250826 -> origin/export-D82250826 2025-12-04T09:33:41.1560354Z * [new branch] export-D82253817 -> origin/export-D82253817 2025-12-04T09:33:41.1561641Z * [new branch] export-D83541846 -> origin/export-D83541846 2025-12-04T09:33:41.1563208Z * [new branch] export-D83627170 -> origin/export-D83627170 2025-12-04T09:33:41.1564672Z * [new branch] export-D83766701 -> origin/export-D83766701 2025-12-04T09:33:41.1566138Z * [new branch] export-D83768878 -> origin/export-D83768878 2025-12-04T09:33:41.1567575Z * [new branch] 
export-D83769447 -> origin/export-D83769447 2025-12-04T09:33:41.1569167Z * [new branch] export-D84089824 -> origin/export-D84089824 2025-12-04T09:33:41.1570316Z * [new branch] export-D84213020 -> origin/export-D84213020 2025-12-04T09:33:41.1572752Z * [new branch] export-D84373821 -> origin/export-D84373821 2025-12-04T09:33:41.1574422Z * [new branch] export-D84612194 -> origin/export-D84612194 2025-12-04T09:33:41.1575802Z * [new branch] export-D84890985 -> origin/export-D84890985 2025-12-04T09:33:41.1577361Z * [new branch] export-D85122326 -> origin/export-D85122326 2025-12-04T09:33:41.1578933Z * [new branch] export-D86256198 -> origin/export-D86256198 2025-12-04T09:33:41.1580404Z * [new branch] export-D86460608 -> origin/export-D86460608 2025-12-04T09:33:41.1582009Z * [new branch] export-D86474796 -> origin/export-D86474796 2025-12-04T09:33:41.1583697Z * [new branch] export-D86712396 -> origin/export-D86712396 2025-12-04T09:33:41.1585255Z * [new branch] export-D87022129 -> origin/export-D87022129 2025-12-04T09:33:41.1586779Z * [new branch] export-D87838959 -> origin/export-D87838959 2025-12-04T09:33:41.1588315Z * [new branch] export-D88319437 -> origin/export-D88319437 2025-12-04T09:33:41.1590010Z * [new branch] exported-model-train-idempotent -> origin/exported-model-train-idempotent 2025-12-04T09:33:41.1591354Z * [new branch] ezyang-titan-october -> origin/ezyang-titan-october 2025-12-04T09:33:41.1592811Z * [new branch] ezyang-titan-october2 -> origin/ezyang-titan-october2 2025-12-04T09:33:41.1594098Z * [new branch] ezyang-war -> origin/ezyang-war 2025-12-04T09:33:41.1596279Z * [new branch] ezyang/wip-aot-descriptors -> origin/ezyang/wip-aot-descriptors 2025-12-04T09:33:41.1597482Z * [new branch] fa_u8_brgemm -> origin/fa_u8_brgemm 2025-12-04T09:33:41.1599584Z * [new branch] fadeputr/sequence_fbgemm -> origin/fadeputr/sequence_fbgemm 2025-12-04T09:33:41.1601043Z * [new branch] fastmath_baseline -> origin/fastmath_baseline 2025-12-04T09:33:41.1603080Z * [new 
branch] fbcode/warm -> origin/fbcode/warm 2025-12-04T09:33:41.1604719Z * [new branch] fca -> origin/fca 2025-12-04T09:33:41.1606138Z * [new branch] fca2_ca5984c -> origin/fca2_ca5984c 2025-12-04T09:33:41.1607629Z * [new branch] fca5 -> origin/fca5 2025-12-04T09:33:41.1609648Z * [new branch] feature/justknobs-cpp -> origin/feature/justknobs-cpp 2025-12-04T09:33:41.1611029Z * [new branch] feature/numa-forkserver -> origin/feature/numa-forkserver 2025-12-04T09:33:41.1613055Z * [new branch] ffast_math_baseline -> origin/ffast_math_baseline 2025-12-04T09:33:41.1614498Z * [new branch] ffast_math_target -> origin/ffast_math_target 2025-12-04T09:33:41.1616548Z * [new branch] findhao/base_commit -> origin/findhao/base_commit 2025-12-04T09:33:41.1618063Z * [new branch] findhao/base_commit1 -> origin/findhao/base_commit1 2025-12-04T09:33:41.1619535Z * [new branch] findhao/multistream2 -> origin/findhao/multistream2 2025-12-04T09:33:41.1620778Z * [new branch] findhao/multistream5 -> origin/findhao/multistream5 2025-12-04T09:33:41.1622080Z * [new branch] findhao/multistream6 -> origin/findhao/multistream6 2025-12-04T09:33:41.1623422Z * [new branch] findhao/operatorbench3 -> origin/findhao/operatorbench3 2025-12-04T09:33:41.1624726Z * [new branch] findhao/operatorbench5 -> origin/findhao/operatorbench5 2025-12-04T09:33:41.1625996Z * [new branch] findhao/tritonparse -> origin/findhao/tritonparse 2025-12-04T09:33:41.1627655Z * [new branch] fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format 2025-12-04T09:33:41.1629120Z * [new branch] fix-config-ignore -> origin/fix-config-ignore 2025-12-04T09:33:41.1630543Z * [new branch] fix-dict-guard -> origin/fix-dict-guard 2025-12-04T09:33:41.1632188Z * [new branch] fix_addmm_issue -> origin/fix_addmm_issue 2025-12-04T09:33:41.1633717Z * [new branch] fix_amd_missing_cluster_dims -> origin/fix_amd_missing_cluster_dims 2025-12-04T09:33:41.1634992Z * [new branch] fix_bench_bwd_pass -> origin/fix_bench_bwd_pass 
2025-12-04T09:33:41.1636564Z * [new branch] fix_mem_profiler_config -> origin/fix_mem_profiler_config 2025-12-04T09:33:41.1637871Z * [new branch] fix_nvrtc_discovery -> origin/fix_nvrtc_discovery 2025-12-04T09:33:41.1639373Z * [new branch] fix_op_runner -> origin/fix_op_runner 2025-12-04T09:33:41.1641066Z * [new branch] fix_ubn_159469 -> origin/fix_ubn_159469 2025-12-04T09:33:41.1642427Z * [new branch] fixes-triage -> origin/fixes-triage 2025-12-04T09:33:41.1643893Z * [new branch] fixflashinfer -> origin/fixflashinfer 2025-12-04T09:33:41.1645328Z * [new branch] flash_decoding_cpu -> origin/flash_decoding_cpu 2025-12-04T09:33:41.1647146Z * [new branch] flex-flash -> origin/flex-flash 2025-12-04T09:33:41.1648689Z * [new branch] flex_attention_functorch_grad -> origin/flex_attention_functorch_grad 2025-12-04T09:33:41.1649986Z * [new branch] flex_flash -> origin/flex_flash 2025-12-04T09:33:41.1652234Z * [new branch] fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule 2025-12-04T09:33:41.1683495Z * [new branch] fmassa/tests_comm_compute_scheduler -> origin/fmassa/tests_comm_compute_scheduler 2025-12-04T09:33:41.1684424Z * [new branch] forkserver_fix -> origin/forkserver_fix 2025-12-04T09:33:41.1685075Z * [new branch] fsdp2_trace_rules -> origin/fsdp2_trace_rules 2025-12-04T09:33:41.1685652Z * [new branch] fx_cpp -> origin/fx_cpp 2025-12-04T09:33:41.1686219Z * [new branch] fy/fix-win -> origin/fy/fix-win 2025-12-04T09:33:41.1686818Z * [new branch] galv-patch-1 -> origin/galv-patch-1 2025-12-04T09:33:41.1687593Z * [new branch] galv/cudagraphs-conditional-nodes-4 -> origin/galv/cudagraphs-conditional-nodes-4 2025-12-04T09:33:41.1688443Z * [new branch] georgehong/cmakelists-patch -> origin/georgehong/cmakelists-patch 2025-12-04T09:33:41.1689155Z * [new branch] gh/AlnisM/1/base -> origin/gh/AlnisM/1/base 2025-12-04T09:33:41.1689776Z * [new branch] gh/AlnisM/1/head -> origin/gh/AlnisM/1/head 2025-12-04T09:33:41.1690407Z * [new branch] 
gh/EikanWang/67/base -> origin/gh/EikanWang/67/base 2025-12-04T09:33:41.1691047Z * [new branch] gh/EikanWang/67/head -> origin/gh/EikanWang/67/head 2025-12-04T09:33:41.1691683Z * [new branch] gh/Gasoonjia/1/base -> origin/gh/Gasoonjia/1/base 2025-12-04T09:33:41.1692308Z * [new branch] gh/Gasoonjia/1/head -> origin/gh/Gasoonjia/1/head 2025-12-04T09:33:41.1692933Z * [new branch] gh/H-Huang/131/base -> origin/gh/H-Huang/131/base 2025-12-04T09:33:41.1693530Z * [new branch] gh/H-Huang/131/head -> origin/gh/H-Huang/131/head 2025-12-04T09:33:41.1694134Z * [new branch] gh/H-Huang/131/orig -> origin/gh/H-Huang/131/orig 2025-12-04T09:33:41.1694735Z * [new branch] gh/H-Huang/132/base -> origin/gh/H-Huang/132/base 2025-12-04T09:33:41.1695339Z * [new branch] gh/H-Huang/132/head -> origin/gh/H-Huang/132/head 2025-12-04T09:33:41.1695940Z * [new branch] gh/H-Huang/132/orig -> origin/gh/H-Huang/132/orig 2025-12-04T09:33:41.1696839Z * [new branch] gh/H-Huang/180/base -> origin/gh/H-Huang/180/base 2025-12-04T09:33:41.1697447Z * [new branch] gh/H-Huang/180/head -> origin/gh/H-Huang/180/head 2025-12-04T09:33:41.1698052Z * [new branch] gh/H-Huang/180/orig -> origin/gh/H-Huang/180/orig 2025-12-04T09:33:41.1698658Z * [new branch] gh/H-Huang/182/base -> origin/gh/H-Huang/182/base 2025-12-04T09:33:41.1699267Z * [new branch] gh/H-Huang/182/head -> origin/gh/H-Huang/182/head 2025-12-04T09:33:41.1699875Z * [new branch] gh/H-Huang/182/orig -> origin/gh/H-Huang/182/orig 2025-12-04T09:33:41.1701714Z * [new branch] gh/H-Huang/226/base -> origin/gh/H-Huang/226/base 2025-12-04T09:33:41.1703105Z * [new branch] gh/H-Huang/226/head -> origin/gh/H-Huang/226/head 2025-12-04T09:33:41.1704593Z * [new branch] gh/H-Huang/226/orig -> origin/gh/H-Huang/226/orig 2025-12-04T09:33:41.1706581Z * [new branch] gh/H-Huang/228/base -> origin/gh/H-Huang/228/base 2025-12-04T09:33:41.1708038Z * [new branch] gh/H-Huang/228/head -> origin/gh/H-Huang/228/head 2025-12-04T09:33:41.1709499Z * [new branch] gh/H-Huang/228/orig 
-> origin/gh/H-Huang/228/orig 2025-12-04T09:33:41.1712087Z * [new branch] gh/IvanKobzarev/150/base -> origin/gh/IvanKobzarev/150/base 2025-12-04T09:33:41.1713408Z * [new branch] gh/IvanKobzarev/150/head -> origin/gh/IvanKobzarev/150/head 2025-12-04T09:33:41.1714969Z * [new branch] gh/IvanKobzarev/150/orig -> origin/gh/IvanKobzarev/150/orig 2025-12-04T09:33:41.1717125Z * [new branch] gh/IvanKobzarev/157/base -> origin/gh/IvanKobzarev/157/base 2025-12-04T09:33:41.1718682Z * [new branch] gh/IvanKobzarev/157/head -> origin/gh/IvanKobzarev/157/head 2025-12-04T09:33:41.1720186Z * [new branch] gh/IvanKobzarev/157/orig -> origin/gh/IvanKobzarev/157/orig 2025-12-04T09:33:41.1722235Z * [new branch] gh/IvanKobzarev/159/base -> origin/gh/IvanKobzarev/159/base 2025-12-04T09:33:41.1723724Z * [new branch] gh/IvanKobzarev/159/head -> origin/gh/IvanKobzarev/159/head 2025-12-04T09:33:41.1725229Z * [new branch] gh/IvanKobzarev/159/orig -> origin/gh/IvanKobzarev/159/orig 2025-12-04T09:33:41.1727237Z * [new branch] gh/IvanKobzarev/162/base -> origin/gh/IvanKobzarev/162/base 2025-12-04T09:33:41.1728840Z * [new branch] gh/IvanKobzarev/162/head -> origin/gh/IvanKobzarev/162/head 2025-12-04T09:33:41.1730217Z * [new branch] gh/IvanKobzarev/162/orig -> origin/gh/IvanKobzarev/162/orig 2025-12-04T09:33:41.1732297Z * [new branch] gh/IvanKobzarev/163/base -> origin/gh/IvanKobzarev/163/base 2025-12-04T09:33:41.1733575Z * [new branch] gh/IvanKobzarev/163/head -> origin/gh/IvanKobzarev/163/head 2025-12-04T09:33:41.1735061Z * [new branch] gh/IvanKobzarev/163/orig -> origin/gh/IvanKobzarev/163/orig 2025-12-04T09:33:41.1737285Z * [new branch] gh/IvanKobzarev/166/base -> origin/gh/IvanKobzarev/166/base 2025-12-04T09:33:41.1738842Z * [new branch] gh/IvanKobzarev/166/head -> origin/gh/IvanKobzarev/166/head 2025-12-04T09:33:41.1740168Z * [new branch] gh/IvanKobzarev/166/orig -> origin/gh/IvanKobzarev/166/orig 2025-12-04T09:33:41.1742372Z * [new branch] gh/IvanKobzarev/167/base -> 
origin/gh/IvanKobzarev/167/base 2025-12-04T09:33:41.1743692Z * [new branch] gh/IvanKobzarev/167/head -> origin/gh/IvanKobzarev/167/head 2025-12-04T09:33:41.1745156Z * [new branch] gh/IvanKobzarev/167/orig -> origin/gh/IvanKobzarev/167/orig 2025-12-04T09:33:41.1747127Z * [new branch] gh/IvanKobzarev/168/base -> origin/gh/IvanKobzarev/168/base 2025-12-04T09:33:41.1749165Z * [new branch] gh/IvanKobzarev/168/head -> origin/gh/IvanKobzarev/168/head 2025-12-04T09:33:41.1750058Z * [new branch] gh/IvanKobzarev/168/orig -> origin/gh/IvanKobzarev/168/orig 2025-12-04T09:33:41.1752142Z * [new branch] gh/IvanKobzarev/169/base -> origin/gh/IvanKobzarev/169/base 2025-12-04T09:33:41.1753638Z * [new branch] gh/IvanKobzarev/169/head -> origin/gh/IvanKobzarev/169/head 2025-12-04T09:33:41.1755125Z * [new branch] gh/IvanKobzarev/169/orig -> origin/gh/IvanKobzarev/169/orig 2025-12-04T09:33:41.1756956Z * [new branch] gh/IvanKobzarev/170/base -> origin/gh/IvanKobzarev/170/base 2025-12-04T09:33:41.1758463Z * [new branch] gh/IvanKobzarev/170/head -> origin/gh/IvanKobzarev/170/head 2025-12-04T09:33:41.1759810Z * [new branch] gh/IvanKobzarev/170/orig -> origin/gh/IvanKobzarev/170/orig 2025-12-04T09:33:41.1762141Z * [new branch] gh/IvanKobzarev/171/base -> origin/gh/IvanKobzarev/171/base 2025-12-04T09:33:41.1763643Z * [new branch] gh/IvanKobzarev/171/head -> origin/gh/IvanKobzarev/171/head 2025-12-04T09:33:41.1764974Z * [new branch] gh/IvanKobzarev/171/orig -> origin/gh/IvanKobzarev/171/orig 2025-12-04T09:33:41.1767016Z * [new branch] gh/IvanKobzarev/172/base -> origin/gh/IvanKobzarev/172/base 2025-12-04T09:33:41.1768589Z * [new branch] gh/IvanKobzarev/172/head -> origin/gh/IvanKobzarev/172/head 2025-12-04T09:33:41.1770080Z * [new branch] gh/IvanKobzarev/172/orig -> origin/gh/IvanKobzarev/172/orig 2025-12-04T09:33:41.1772259Z * [new branch] gh/IvanKobzarev/173/base -> origin/gh/IvanKobzarev/173/base 2025-12-04T09:33:41.1773774Z * [new branch] gh/IvanKobzarev/173/head -> 
origin/gh/IvanKobzarev/173/head 2025-12-04T09:33:41.1775237Z * [new branch] gh/IvanKobzarev/173/orig -> origin/gh/IvanKobzarev/173/orig 2025-12-04T09:33:41.1777335Z * [new branch] gh/IvanKobzarev/174/base -> origin/gh/IvanKobzarev/174/base 2025-12-04T09:33:41.1778898Z * [new branch] gh/IvanKobzarev/174/head -> origin/gh/IvanKobzarev/174/head 2025-12-04T09:33:41.1780286Z * [new branch] gh/IvanKobzarev/174/orig -> origin/gh/IvanKobzarev/174/orig 2025-12-04T09:33:41.1782487Z * [new branch] gh/IvanKobzarev/175/base -> origin/gh/IvanKobzarev/175/base 2025-12-04T09:33:41.1784038Z * [new branch] gh/IvanKobzarev/175/head -> origin/gh/IvanKobzarev/175/head 2025-12-04T09:33:41.1785553Z * [new branch] gh/IvanKobzarev/175/orig -> origin/gh/IvanKobzarev/175/orig 2025-12-04T09:33:41.1787749Z * [new branch] gh/IvanKobzarev/176/base -> origin/gh/IvanKobzarev/176/base 2025-12-04T09:33:41.1789131Z * [new branch] gh/IvanKobzarev/176/head -> origin/gh/IvanKobzarev/176/head 2025-12-04T09:33:41.1790734Z * [new branch] gh/IvanKobzarev/176/orig -> origin/gh/IvanKobzarev/176/orig 2025-12-04T09:33:41.1793068Z * [new branch] gh/IvanKobzarev/177/base -> origin/gh/IvanKobzarev/177/base 2025-12-04T09:33:41.1794638Z * [new branch] gh/IvanKobzarev/177/head -> origin/gh/IvanKobzarev/177/head 2025-12-04T09:33:41.1795973Z * [new branch] gh/IvanKobzarev/177/orig -> origin/gh/IvanKobzarev/177/orig 2025-12-04T09:33:41.1798241Z * [new branch] gh/IvanKobzarev/178/base -> origin/gh/IvanKobzarev/178/base 2025-12-04T09:33:41.1799797Z * [new branch] gh/IvanKobzarev/178/head -> origin/gh/IvanKobzarev/178/head 2025-12-04T09:33:41.1801312Z * [new branch] gh/IvanKobzarev/178/orig -> origin/gh/IvanKobzarev/178/orig 2025-12-04T09:33:41.1803454Z * [new branch] gh/IvanKobzarev/179/base -> origin/gh/IvanKobzarev/179/base 2025-12-04T09:33:41.1804798Z * [new branch] gh/IvanKobzarev/179/head -> origin/gh/IvanKobzarev/179/head 2025-12-04T09:33:41.1806477Z * [new branch] gh/IvanKobzarev/179/orig -> 
origin/gh/IvanKobzarev/179/orig 2025-12-04T09:33:41.1808386Z * [new branch] gh/IvanKobzarev/180/base -> origin/gh/IvanKobzarev/180/base 2025-12-04T09:33:41.1809915Z * [new branch] gh/IvanKobzarev/180/head -> origin/gh/IvanKobzarev/180/head 2025-12-04T09:33:41.1811291Z * [new branch] gh/IvanKobzarev/180/orig -> origin/gh/IvanKobzarev/180/orig 2025-12-04T09:33:41.1813640Z * [new branch] gh/IvanKobzarev/181/base -> origin/gh/IvanKobzarev/181/base 2025-12-04T09:33:41.1815278Z * [new branch] gh/IvanKobzarev/181/head -> origin/gh/IvanKobzarev/181/head 2025-12-04T09:33:41.1816672Z * [new branch] gh/IvanKobzarev/181/orig -> origin/gh/IvanKobzarev/181/orig 2025-12-04T09:33:41.1819095Z * [new branch] gh/IvanKobzarev/182/base -> origin/gh/IvanKobzarev/182/base 2025-12-04T09:33:41.1820474Z * [new branch] gh/IvanKobzarev/182/head -> origin/gh/IvanKobzarev/182/head 2025-12-04T09:33:41.1821962Z * [new branch] gh/IvanKobzarev/182/orig -> origin/gh/IvanKobzarev/182/orig 2025-12-04T09:33:41.1824165Z * [new branch] gh/IvanKobzarev/183/base -> origin/gh/IvanKobzarev/183/base 2025-12-04T09:33:41.1825586Z * [new branch] gh/IvanKobzarev/183/head -> origin/gh/IvanKobzarev/183/head 2025-12-04T09:33:41.1827243Z * [new branch] gh/IvanKobzarev/183/orig -> origin/gh/IvanKobzarev/183/orig 2025-12-04T09:33:41.1829254Z * [new branch] gh/IvanKobzarev/184/base -> origin/gh/IvanKobzarev/184/base 2025-12-04T09:33:41.1830779Z * [new branch] gh/IvanKobzarev/184/head -> origin/gh/IvanKobzarev/184/head 2025-12-04T09:33:41.1832292Z * [new branch] gh/IvanKobzarev/184/orig -> origin/gh/IvanKobzarev/184/orig 2025-12-04T09:33:41.1834700Z * [new branch] gh/NikhilAPatel/1/base -> origin/gh/NikhilAPatel/1/base 2025-12-04T09:33:41.1836339Z * [new branch] gh/NikhilAPatel/1/head -> origin/gh/NikhilAPatel/1/head 2025-12-04T09:33:41.1838145Z * [new branch] gh/NikhilAPatel/2/base -> origin/gh/NikhilAPatel/2/base 2025-12-04T09:33:41.1839714Z * [new branch] gh/NikhilAPatel/2/head -> origin/gh/NikhilAPatel/2/head 
2025-12-04T09:33:41.1841806Z * [new branch] gh/NikhilAPatel/4/base -> origin/gh/NikhilAPatel/4/base 2025-12-04T09:33:41.1843420Z * [new branch] gh/NikhilAPatel/4/head -> origin/gh/NikhilAPatel/4/head 2025-12-04T09:33:41.1845349Z * [new branch] gh/NikhilAPatel/5/base -> origin/gh/NikhilAPatel/5/base 2025-12-04T09:33:41.1846918Z * [new branch] gh/NikhilAPatel/5/head -> origin/gh/NikhilAPatel/5/head 2025-12-04T09:33:41.1848447Z * [new branch] gh/NikhilAPatel/5/orig -> origin/gh/NikhilAPatel/5/orig 2025-12-04T09:33:41.1850787Z * [new branch] gh/PaliC/17/base -> origin/gh/PaliC/17/base 2025-12-04T09:33:41.1852244Z * [new branch] gh/PaliC/17/head -> origin/gh/PaliC/17/head 2025-12-04T09:33:41.1853734Z * [new branch] gh/PaliC/17/orig -> origin/gh/PaliC/17/orig 2025-12-04T09:33:41.1855743Z * [new branch] gh/PaliC/18/base -> origin/gh/PaliC/18/base 2025-12-04T09:33:41.1857369Z * [new branch] gh/PaliC/18/head -> origin/gh/PaliC/18/head 2025-12-04T09:33:41.1858901Z * [new branch] gh/PaliC/18/orig -> origin/gh/PaliC/18/orig 2025-12-04T09:33:41.1861015Z * [new branch] gh/PaliC/20/base -> origin/gh/PaliC/20/base 2025-12-04T09:33:41.1862414Z * [new branch] gh/PaliC/20/head -> origin/gh/PaliC/20/head 2025-12-04T09:33:41.1863920Z * [new branch] gh/PaliC/20/orig -> origin/gh/PaliC/20/orig 2025-12-04T09:33:41.1865810Z * [new branch] gh/PaliC/21/base -> origin/gh/PaliC/21/base 2025-12-04T09:33:41.1867432Z * [new branch] gh/PaliC/21/head -> origin/gh/PaliC/21/head 2025-12-04T09:33:41.1868699Z * [new branch] gh/PaliC/21/orig -> origin/gh/PaliC/21/orig 2025-12-04T09:33:41.1870650Z * [new branch] gh/PaliC/23/base -> origin/gh/PaliC/23/base 2025-12-04T09:33:41.1872910Z * [new branch] gh/PaliC/23/head -> origin/gh/PaliC/23/head 2025-12-04T09:33:41.1874380Z * [new branch] gh/PaliC/23/orig -> origin/gh/PaliC/23/orig 2025-12-04T09:33:41.1876330Z * [new branch] gh/PaliC/24/base -> origin/gh/PaliC/24/base 2025-12-04T09:33:41.1877872Z * [new branch] gh/PaliC/24/head -> origin/gh/PaliC/24/head 
2025-12-04T09:33:41.1879364Z * [new branch] gh/PaliC/24/orig -> origin/gh/PaliC/24/orig 2025-12-04T09:33:41.1881209Z * [new branch] gh/PaliC/25/head -> origin/gh/PaliC/25/head 2025-12-04T09:33:41.1882664Z * [new branch] gh/PaliC/25/next -> origin/gh/PaliC/25/next 2025-12-04T09:33:41.1884231Z * [new branch] gh/PaliC/25/orig -> origin/gh/PaliC/25/orig 2025-12-04T09:33:41.1886164Z * [new branch] gh/PaliC/26/head -> origin/gh/PaliC/26/head 2025-12-04T09:33:41.1887387Z * [new branch] gh/PaliC/26/next -> origin/gh/PaliC/26/next 2025-12-04T09:33:41.1889001Z * [new branch] gh/PaliC/26/orig -> origin/gh/PaliC/26/orig 2025-12-04T09:33:41.1890931Z * [new branch] gh/PaliC/27/next -> origin/gh/PaliC/27/next 2025-12-04T09:33:41.1892936Z * [new branch] gh/PaliC/28/head -> origin/gh/PaliC/28/head 2025-12-04T09:33:41.1894193Z * [new branch] gh/PaliC/28/next -> origin/gh/PaliC/28/next 2025-12-04T09:33:41.1895827Z * [new branch] gh/PaliC/28/orig -> origin/gh/PaliC/28/orig 2025-12-04T09:33:41.1897874Z * [new branch] gh/PaliC/29/head -> origin/gh/PaliC/29/head 2025-12-04T09:33:41.1899307Z * [new branch] gh/PaliC/29/next -> origin/gh/PaliC/29/next 2025-12-04T09:33:41.1900796Z * [new branch] gh/PaliC/29/orig -> origin/gh/PaliC/29/orig 2025-12-04T09:33:41.1902744Z * [new branch] gh/PaliC/30/head -> origin/gh/PaliC/30/head 2025-12-04T09:33:41.1903982Z * [new branch] gh/PaliC/30/next -> origin/gh/PaliC/30/next 2025-12-04T09:33:41.1905508Z * [new branch] gh/PaliC/30/orig -> origin/gh/PaliC/30/orig 2025-12-04T09:33:41.1907504Z * [new branch] gh/PaliC/31/head -> origin/gh/PaliC/31/head 2025-12-04T09:33:41.1908704Z * [new branch] gh/PaliC/31/next -> origin/gh/PaliC/31/next 2025-12-04T09:33:41.1910586Z * [new branch] gh/PaliC/31/orig -> origin/gh/PaliC/31/orig 2025-12-04T09:33:41.1912920Z * [new branch] gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-12-04T09:33:41.1914543Z * [new branch] gh/PaulZhang12/25/head -> origin/gh/PaulZhang12/25/head 2025-12-04T09:33:41.1916088Z * [new 
branch] gh/PaulZhang12/25/orig -> origin/gh/PaulZhang12/25/orig 2025-12-04T09:33:41.1918087Z * [new branch] gh/PaulZhang12/28/base -> origin/gh/PaulZhang12/28/base 2025-12-04T09:33:41.1919673Z * [new branch] gh/PaulZhang12/28/head -> origin/gh/PaulZhang12/28/head 2025-12-04T09:33:41.1921250Z * [new branch] gh/PaulZhang12/28/orig -> origin/gh/PaulZhang12/28/orig 2025-12-04T09:33:41.1923481Z * [new branch] gh/PaulZhang12/31/base -> origin/gh/PaulZhang12/31/base 2025-12-04T09:33:41.1925047Z * [new branch] gh/PaulZhang12/31/head -> origin/gh/PaulZhang12/31/head 2025-12-04T09:33:41.1928555Z * [new branch] gh/PaulZhang12/31/orig -> origin/gh/PaulZhang12/31/orig 2025-12-04T09:33:41.1929258Z * [new branch] gh/PaulZhang12/37/base -> origin/gh/PaulZhang12/37/base 2025-12-04T09:33:41.1929948Z * [new branch] gh/PaulZhang12/37/head -> origin/gh/PaulZhang12/37/head 2025-12-04T09:33:41.1931121Z * [new branch] gh/PaulZhang12/37/orig -> origin/gh/PaulZhang12/37/orig 2025-12-04T09:33:41.1933228Z * [new branch] gh/PaulZhang12/40/base -> origin/gh/PaulZhang12/40/base 2025-12-04T09:33:41.1934562Z * [new branch] gh/PaulZhang12/40/head -> origin/gh/PaulZhang12/40/head 2025-12-04T09:33:41.1936091Z * [new branch] gh/PaulZhang12/40/orig -> origin/gh/PaulZhang12/40/orig 2025-12-04T09:33:41.1938185Z * [new branch] gh/PaulZhang12/42/base -> origin/gh/PaulZhang12/42/base 2025-12-04T09:33:41.1939632Z * [new branch] gh/PaulZhang12/42/head -> origin/gh/PaulZhang12/42/head 2025-12-04T09:33:41.1941600Z * [new branch] gh/PaulZhang12/43/base -> origin/gh/PaulZhang12/43/base 2025-12-04T09:33:41.1943103Z * [new branch] gh/PaulZhang12/43/head -> origin/gh/PaulZhang12/43/head 2025-12-04T09:33:41.1944612Z * [new branch] gh/PaulZhang12/43/orig -> origin/gh/PaulZhang12/43/orig 2025-12-04T09:33:41.1946408Z * [new branch] gh/PaulZhang12/44/base -> origin/gh/PaulZhang12/44/base 2025-12-04T09:33:41.1947913Z * [new branch] gh/PaulZhang12/44/head -> origin/gh/PaulZhang12/44/head 2025-12-04T09:33:41.1950000Z * [new 
branch] gh/PaulZhang12/45/base -> origin/gh/PaulZhang12/45/base 2025-12-04T09:33:41.1951487Z * [new branch] gh/PaulZhang12/45/head -> origin/gh/PaulZhang12/45/head 2025-12-04T09:33:41.1952940Z * [new branch] gh/PaulZhang12/45/orig -> origin/gh/PaulZhang12/45/orig 2025-12-04T09:33:41.1954905Z * [new branch] gh/PaulZhang12/46/base -> origin/gh/PaulZhang12/46/base 2025-12-04T09:33:41.1956405Z * [new branch] gh/PaulZhang12/46/head -> origin/gh/PaulZhang12/46/head 2025-12-04T09:33:41.1957960Z * [new branch] gh/PaulZhang12/46/orig -> origin/gh/PaulZhang12/46/orig 2025-12-04T09:33:41.1960047Z * [new branch] gh/PaulZhang12/47/base -> origin/gh/PaulZhang12/47/base 2025-12-04T09:33:41.1961515Z * [new branch] gh/PaulZhang12/47/head -> origin/gh/PaulZhang12/47/head 2025-12-04T09:33:41.1963050Z * [new branch] gh/PaulZhang12/47/orig -> origin/gh/PaulZhang12/47/orig 2025-12-04T09:33:41.1964883Z * [new branch] gh/PaulZhang12/48/base -> origin/gh/PaulZhang12/48/base 2025-12-04T09:33:41.1966227Z * [new branch] gh/PaulZhang12/48/head -> origin/gh/PaulZhang12/48/head 2025-12-04T09:33:41.1967738Z * [new branch] gh/PaulZhang12/48/orig -> origin/gh/PaulZhang12/48/orig 2025-12-04T09:33:41.1970062Z * [new branch] gh/SamGinzburg/11/base -> origin/gh/SamGinzburg/11/base 2025-12-04T09:33:41.1971743Z * [new branch] gh/SamGinzburg/11/head -> origin/gh/SamGinzburg/11/head 2025-12-04T09:33:41.1974316Z * [new branch] gh/SherlockNoMad/1/base -> origin/gh/SherlockNoMad/1/base 2025-12-04T09:33:41.1975843Z * [new branch] gh/SherlockNoMad/1/head -> origin/gh/SherlockNoMad/1/head 2025-12-04T09:33:41.1978021Z * [new branch] gh/SherlockNoMad/10/base -> origin/gh/SherlockNoMad/10/base 2025-12-04T09:33:41.1979526Z * [new branch] gh/SherlockNoMad/10/head -> origin/gh/SherlockNoMad/10/head 2025-12-04T09:33:41.1981113Z * [new branch] gh/SherlockNoMad/10/orig -> origin/gh/SherlockNoMad/10/orig 2025-12-04T09:33:41.1982891Z * [new branch] gh/SherlockNoMad/11/base -> origin/gh/SherlockNoMad/11/base 
2025-12-04T09:33:41.1984427Z * [new branch] gh/SherlockNoMad/11/head -> origin/gh/SherlockNoMad/11/head 2025-12-04T09:33:41.1985942Z * [new branch] gh/SherlockNoMad/11/orig -> origin/gh/SherlockNoMad/11/orig 2025-12-04T09:33:41.1987649Z * [new branch] gh/SherlockNoMad/12/base -> origin/gh/SherlockNoMad/12/base 2025-12-04T09:33:41.1989043Z * [new branch] gh/SherlockNoMad/12/head -> origin/gh/SherlockNoMad/12/head 2025-12-04T09:33:41.1990534Z * [new branch] gh/SherlockNoMad/12/orig -> origin/gh/SherlockNoMad/12/orig 2025-12-04T09:33:41.1992573Z * [new branch] gh/SherlockNoMad/15/base -> origin/gh/SherlockNoMad/15/base 2025-12-04T09:33:41.1994063Z * [new branch] gh/SherlockNoMad/15/head -> origin/gh/SherlockNoMad/15/head 2025-12-04T09:33:41.1995456Z * [new branch] gh/SherlockNoMad/15/orig -> origin/gh/SherlockNoMad/15/orig 2025-12-04T09:33:41.1997418Z * [new branch] gh/SherlockNoMad/17/base -> origin/gh/SherlockNoMad/17/base 2025-12-04T09:33:41.1998759Z * [new branch] gh/SherlockNoMad/17/head -> origin/gh/SherlockNoMad/17/head 2025-12-04T09:33:41.2000255Z * [new branch] gh/SherlockNoMad/17/orig -> origin/gh/SherlockNoMad/17/orig 2025-12-04T09:33:41.2002416Z * [new branch] gh/SherlockNoMad/18/base -> origin/gh/SherlockNoMad/18/base 2025-12-04T09:33:41.2004079Z * [new branch] gh/SherlockNoMad/18/head -> origin/gh/SherlockNoMad/18/head 2025-12-04T09:33:41.2005456Z * [new branch] gh/SherlockNoMad/18/orig -> origin/gh/SherlockNoMad/18/orig 2025-12-04T09:33:41.2007321Z * [new branch] gh/SherlockNoMad/19/base -> origin/gh/SherlockNoMad/19/base 2025-12-04T09:33:41.2008974Z * [new branch] gh/SherlockNoMad/19/head -> origin/gh/SherlockNoMad/19/head 2025-12-04T09:33:41.2010495Z * [new branch] gh/SherlockNoMad/19/orig -> origin/gh/SherlockNoMad/19/orig 2025-12-04T09:33:41.2012301Z * [new branch] gh/SherlockNoMad/2/base -> origin/gh/SherlockNoMad/2/base 2025-12-04T09:33:41.2013600Z * [new branch] gh/SherlockNoMad/2/head -> origin/gh/SherlockNoMad/2/head 
2025-12-04T09:33:41.2015473Z * [new branch] gh/SherlockNoMad/20/base -> origin/gh/SherlockNoMad/20/base 2025-12-04T09:33:41.2017186Z * [new branch] gh/SherlockNoMad/20/head -> origin/gh/SherlockNoMad/20/head 2025-12-04T09:33:41.2018445Z * [new branch] gh/SherlockNoMad/20/orig -> origin/gh/SherlockNoMad/20/orig 2025-12-04T09:33:41.2020745Z * [new branch] gh/SherlockNoMad/21/base -> origin/gh/SherlockNoMad/21/base 2025-12-04T09:33:41.2022318Z * [new branch] gh/SherlockNoMad/21/head -> origin/gh/SherlockNoMad/21/head 2025-12-04T09:33:41.2023623Z * [new branch] gh/SherlockNoMad/21/orig -> origin/gh/SherlockNoMad/21/orig 2025-12-04T09:33:41.2025473Z * [new branch] gh/SherlockNoMad/3/base -> origin/gh/SherlockNoMad/3/base 2025-12-04T09:33:41.2026960Z * [new branch] gh/SherlockNoMad/3/head -> origin/gh/SherlockNoMad/3/head 2025-12-04T09:33:41.2028707Z * [new branch] gh/SherlockNoMad/4/base -> origin/gh/SherlockNoMad/4/base 2025-12-04T09:33:41.2030043Z * [new branch] gh/SherlockNoMad/4/head -> origin/gh/SherlockNoMad/4/head 2025-12-04T09:33:41.2031890Z * [new branch] gh/SherlockNoMad/5/base -> origin/gh/SherlockNoMad/5/base 2025-12-04T09:33:41.2033195Z * [new branch] gh/SherlockNoMad/5/head -> origin/gh/SherlockNoMad/5/head 2025-12-04T09:33:41.2036222Z * [new branch] gh/Sidharth123-cpu/24/base -> origin/gh/Sidharth123-cpu/24/base 2025-12-04T09:33:41.2038001Z * [new branch] gh/Sidharth123-cpu/25/base -> origin/gh/Sidharth123-cpu/25/base 2025-12-04T09:33:41.2039830Z * [new branch] gh/Sidharth123-cpu/26/base -> origin/gh/Sidharth123-cpu/26/base 2025-12-04T09:33:41.2041962Z * [new branch] gh/Sidharth123-cpu/27/base -> origin/gh/Sidharth123-cpu/27/base 2025-12-04T09:33:41.2044510Z * [new branch] gh/StrongerXi/1/base -> origin/gh/StrongerXi/1/base 2025-12-04T09:33:41.2045773Z * [new branch] gh/StrongerXi/1/head -> origin/gh/StrongerXi/1/head 2025-12-04T09:33:41.2047783Z * [new branch] gh/StrongerXi/71/base -> origin/gh/StrongerXi/71/base 2025-12-04T09:33:41.2049257Z * [new 
branch] gh/StrongerXi/71/head -> origin/gh/StrongerXi/71/head 2025-12-04T09:33:41.2051066Z * [new branch] gh/StrongerXi/72/base -> origin/gh/StrongerXi/72/base 2025-12-04T09:33:41.2052503Z * [new branch] gh/StrongerXi/72/head -> origin/gh/StrongerXi/72/head 2025-12-04T09:33:41.2054387Z * [new branch] gh/StrongerXi/73/base -> origin/gh/StrongerXi/73/base 2025-12-04T09:33:41.2055888Z * [new branch] gh/StrongerXi/73/head -> origin/gh/StrongerXi/73/head 2025-12-04T09:33:41.2057654Z * [new branch] gh/StrongerXi/73/orig -> origin/gh/StrongerXi/73/orig 2025-12-04T09:33:41.2060253Z * [new branch] gh/XilunWu/160/base -> origin/gh/XilunWu/160/base 2025-12-04T09:33:41.2061711Z * [new branch] gh/XilunWu/160/head -> origin/gh/XilunWu/160/head 2025-12-04T09:33:41.2063216Z * [new branch] gh/XilunWu/160/orig -> origin/gh/XilunWu/160/orig 2025-12-04T09:33:41.2065224Z * [new branch] gh/XilunWu/163/base -> origin/gh/XilunWu/163/base 2025-12-04T09:33:41.2066799Z * [new branch] gh/XilunWu/163/head -> origin/gh/XilunWu/163/head 2025-12-04T09:33:41.2068246Z * [new branch] gh/XilunWu/163/orig -> origin/gh/XilunWu/163/orig 2025-12-04T09:33:41.2070411Z * [new branch] gh/XilunWu/168/base -> origin/gh/XilunWu/168/base 2025-12-04T09:33:41.2072016Z * [new branch] gh/XilunWu/168/head -> origin/gh/XilunWu/168/head 2025-12-04T09:33:41.2073540Z * [new branch] gh/XilunWu/168/orig -> origin/gh/XilunWu/168/orig 2025-12-04T09:33:41.2075438Z * [new branch] gh/XilunWu/169/base -> origin/gh/XilunWu/169/base 2025-12-04T09:33:41.2076933Z * [new branch] gh/XilunWu/169/head -> origin/gh/XilunWu/169/head 2025-12-04T09:33:41.2078413Z * [new branch] gh/XilunWu/169/orig -> origin/gh/XilunWu/169/orig 2025-12-04T09:33:41.2080218Z * [new branch] gh/XilunWu/170/base -> origin/gh/XilunWu/170/base 2025-12-04T09:33:41.2081703Z * [new branch] gh/XilunWu/170/head -> origin/gh/XilunWu/170/head 2025-12-04T09:33:41.2083180Z * [new branch] gh/XilunWu/170/orig -> origin/gh/XilunWu/170/orig 2025-12-04T09:33:41.2085328Z * [new 
branch] gh/XilunWu/171/base -> origin/gh/XilunWu/171/base 2025-12-04T09:33:41.2086802Z * [new branch] gh/XilunWu/171/head -> origin/gh/XilunWu/171/head 2025-12-04T09:33:41.2088288Z * [new branch] gh/XilunWu/171/orig -> origin/gh/XilunWu/171/orig 2025-12-04T09:33:41.2090112Z * [new branch] gh/XilunWu/173/base -> origin/gh/XilunWu/173/base 2025-12-04T09:33:41.2091660Z * [new branch] gh/XilunWu/173/head -> origin/gh/XilunWu/173/head 2025-12-04T09:33:41.2093209Z * [new branch] gh/XilunWu/173/orig -> origin/gh/XilunWu/173/orig 2025-12-04T09:33:41.2095076Z * [new branch] gh/XilunWu/175/base -> origin/gh/XilunWu/175/base 2025-12-04T09:33:41.2096687Z * [new branch] gh/XilunWu/175/head -> origin/gh/XilunWu/175/head 2025-12-04T09:33:41.2098236Z * [new branch] gh/XilunWu/175/orig -> origin/gh/XilunWu/175/orig 2025-12-04T09:33:41.2100276Z * [new branch] gh/XilunWu/176/base -> origin/gh/XilunWu/176/base 2025-12-04T09:33:41.2101756Z * [new branch] gh/XilunWu/176/head -> origin/gh/XilunWu/176/head 2025-12-04T09:33:41.2103446Z * [new branch] gh/XilunWu/176/orig -> origin/gh/XilunWu/176/orig 2025-12-04T09:33:41.2105772Z * [new branch] gh/XuehaiPan/14/base -> origin/gh/XuehaiPan/14/base 2025-12-04T09:33:41.2107235Z * [new branch] gh/XuehaiPan/14/head -> origin/gh/XuehaiPan/14/head 2025-12-04T09:33:41.2108696Z * [new branch] gh/XuehaiPan/14/orig -> origin/gh/XuehaiPan/14/orig 2025-12-04T09:33:41.2110861Z * [new branch] gh/XuehaiPan/179/base -> origin/gh/XuehaiPan/179/base 2025-12-04T09:33:41.2112337Z * [new branch] gh/XuehaiPan/179/head -> origin/gh/XuehaiPan/179/head 2025-12-04T09:33:41.2114011Z * [new branch] gh/XuehaiPan/179/orig -> origin/gh/XuehaiPan/179/orig 2025-12-04T09:33:41.2116026Z * [new branch] gh/XuehaiPan/249/base -> origin/gh/XuehaiPan/249/base 2025-12-04T09:33:41.2117628Z * [new branch] gh/XuehaiPan/249/head -> origin/gh/XuehaiPan/249/head 2025-12-04T09:33:41.2119220Z * [new branch] gh/XuehaiPan/249/orig -> origin/gh/XuehaiPan/249/orig 2025-12-04T09:33:41.2121250Z * 
[new branch] gh/XuehaiPan/253/base -> origin/gh/XuehaiPan/253/base 2025-12-04T09:33:41.2122687Z * [new branch] gh/XuehaiPan/253/head -> origin/gh/XuehaiPan/253/head 2025-12-04T09:33:41.2124196Z * [new branch] gh/XuehaiPan/253/orig -> origin/gh/XuehaiPan/253/orig 2025-12-04T09:33:41.2126188Z * [new branch] gh/XuehaiPan/254/base -> origin/gh/XuehaiPan/254/base 2025-12-04T09:33:41.2127672Z * [new branch] gh/XuehaiPan/254/head -> origin/gh/XuehaiPan/254/head 2025-12-04T09:33:41.2129179Z * [new branch] gh/XuehaiPan/254/orig -> origin/gh/XuehaiPan/254/orig 2025-12-04T09:33:41.2131048Z * [new branch] gh/XuehaiPan/255/base -> origin/gh/XuehaiPan/255/base 2025-12-04T09:33:41.2132513Z * [new branch] gh/XuehaiPan/255/head -> origin/gh/XuehaiPan/255/head 2025-12-04T09:33:41.2134018Z * [new branch] gh/XuehaiPan/255/orig -> origin/gh/XuehaiPan/255/orig 2025-12-04T09:33:41.2135963Z * [new branch] gh/XuehaiPan/271/base -> origin/gh/XuehaiPan/271/base 2025-12-04T09:33:41.2137616Z * [new branch] gh/XuehaiPan/271/head -> origin/gh/XuehaiPan/271/head 2025-12-04T09:33:41.2138925Z * [new branch] gh/XuehaiPan/271/orig -> origin/gh/XuehaiPan/271/orig 2025-12-04T09:33:41.2140935Z * [new branch] gh/XuehaiPan/343/base -> origin/gh/XuehaiPan/343/base 2025-12-04T09:33:41.2142390Z * [new branch] gh/XuehaiPan/343/head -> origin/gh/XuehaiPan/343/head 2025-12-04T09:33:41.2143799Z * [new branch] gh/XuehaiPan/343/orig -> origin/gh/XuehaiPan/343/orig 2025-12-04T09:33:41.2145871Z * [new branch] gh/XuehaiPan/347/base -> origin/gh/XuehaiPan/347/base 2025-12-04T09:33:41.2147365Z * [new branch] gh/XuehaiPan/347/head -> origin/gh/XuehaiPan/347/head 2025-12-04T09:33:41.2148880Z * [new branch] gh/XuehaiPan/347/orig -> origin/gh/XuehaiPan/347/orig 2025-12-04T09:33:41.2150813Z * [new branch] gh/XuehaiPan/348/base -> origin/gh/XuehaiPan/348/base 2025-12-04T09:33:41.2152288Z * [new branch] gh/XuehaiPan/348/head -> origin/gh/XuehaiPan/348/head 2025-12-04T09:33:41.2153752Z * [new branch] gh/XuehaiPan/348/orig -> 
origin/gh/XuehaiPan/348/orig 2025-12-04T09:33:41.2155707Z * [new branch] gh/XuehaiPan/350/base -> origin/gh/XuehaiPan/350/base 2025-12-04T09:33:41.2157166Z * [new branch] gh/XuehaiPan/350/head -> origin/gh/XuehaiPan/350/head 2025-12-04T09:33:41.2158630Z * [new branch] gh/XuehaiPan/350/orig -> origin/gh/XuehaiPan/350/orig 2025-12-04T09:33:41.2160817Z * [new branch] gh/XuehaiPan/365/base -> origin/gh/XuehaiPan/365/base 2025-12-04T09:33:41.2162122Z * [new branch] gh/XuehaiPan/365/head -> origin/gh/XuehaiPan/365/head 2025-12-04T09:33:41.2163633Z * [new branch] gh/XuehaiPan/365/orig -> origin/gh/XuehaiPan/365/orig 2025-12-04T09:33:41.2165672Z * [new branch] gh/XuehaiPan/366/base -> origin/gh/XuehaiPan/366/base 2025-12-04T09:33:41.2167115Z * [new branch] gh/XuehaiPan/366/head -> origin/gh/XuehaiPan/366/head 2025-12-04T09:33:41.2169462Z * [new branch] gh/XuehaiPan/370/base -> origin/gh/XuehaiPan/370/base 2025-12-04T09:33:41.2171082Z * [new branch] gh/XuehaiPan/370/head -> origin/gh/XuehaiPan/370/head 2025-12-04T09:33:41.2174077Z * [new branch] gh/XuehaiPan/370/orig -> origin/gh/XuehaiPan/370/orig 2025-12-04T09:33:41.2176085Z * [new branch] gh/XuehaiPan/390/base -> origin/gh/XuehaiPan/390/base 2025-12-04T09:33:41.2177720Z * [new branch] gh/XuehaiPan/390/head -> origin/gh/XuehaiPan/390/head 2025-12-04T09:33:41.2179197Z * [new branch] gh/XuehaiPan/390/orig -> origin/gh/XuehaiPan/390/orig 2025-12-04T09:33:41.2181178Z * [new branch] gh/XuehaiPan/391/base -> origin/gh/XuehaiPan/391/base 2025-12-04T09:33:41.2182556Z * [new branch] gh/XuehaiPan/391/head -> origin/gh/XuehaiPan/391/head 2025-12-04T09:33:41.2184064Z * [new branch] gh/XuehaiPan/391/orig -> origin/gh/XuehaiPan/391/orig 2025-12-04T09:33:41.2186067Z * [new branch] gh/XuehaiPan/392/base -> origin/gh/XuehaiPan/392/base 2025-12-04T09:33:41.2187523Z * [new branch] gh/XuehaiPan/392/head -> origin/gh/XuehaiPan/392/head 2025-12-04T09:33:41.2188998Z * [new branch] gh/XuehaiPan/392/orig -> origin/gh/XuehaiPan/392/orig 
2025-12-04T09:33:41.2191593Z * [new branch] gh/XuehaiPan/394/base -> origin/gh/XuehaiPan/394/base 2025-12-04T09:33:41.2193283Z * [new branch] gh/XuehaiPan/394/head -> origin/gh/XuehaiPan/394/head 2025-12-04T09:33:41.2194569Z * [new branch] gh/XuehaiPan/394/orig -> origin/gh/XuehaiPan/394/orig 2025-12-04T09:33:41.2196640Z * [new branch] gh/XuehaiPan/397/base -> origin/gh/XuehaiPan/397/base 2025-12-04T09:33:41.2198177Z * [new branch] gh/XuehaiPan/397/head -> origin/gh/XuehaiPan/397/head 2025-12-04T09:33:41.2199509Z * [new branch] gh/XuehaiPan/397/orig -> origin/gh/XuehaiPan/397/orig 2025-12-04T09:33:41.2201582Z * [new branch] gh/XuehaiPan/398/base -> origin/gh/XuehaiPan/398/base 2025-12-04T09:33:41.2203063Z * [new branch] gh/XuehaiPan/398/head -> origin/gh/XuehaiPan/398/head 2025-12-04T09:33:41.2204549Z * [new branch] gh/XuehaiPan/398/orig -> origin/gh/XuehaiPan/398/orig 2025-12-04T09:33:41.2206443Z * [new branch] gh/XuehaiPan/399/base -> origin/gh/XuehaiPan/399/base 2025-12-04T09:33:41.2207988Z * [new branch] gh/XuehaiPan/399/head -> origin/gh/XuehaiPan/399/head 2025-12-04T09:33:41.2209486Z * [new branch] gh/XuehaiPan/399/orig -> origin/gh/XuehaiPan/399/orig 2025-12-04T09:33:41.2211562Z * [new branch] gh/XuehaiPan/400/base -> origin/gh/XuehaiPan/400/base 2025-12-04T09:33:41.2213091Z * [new branch] gh/XuehaiPan/400/head -> origin/gh/XuehaiPan/400/head 2025-12-04T09:33:41.2214569Z * [new branch] gh/XuehaiPan/400/orig -> origin/gh/XuehaiPan/400/orig 2025-12-04T09:33:41.2217084Z * [new branch] gh/ZhiweiYan-96/39/base -> origin/gh/ZhiweiYan-96/39/base 2025-12-04T09:33:41.2218628Z * [new branch] gh/ZhiweiYan-96/39/head -> origin/gh/ZhiweiYan-96/39/head 2025-12-04T09:33:41.2220127Z * [new branch] gh/ZhiweiYan-96/39/orig -> origin/gh/ZhiweiYan-96/39/orig 2025-12-04T09:33:41.2222284Z * [new branch] gh/ZhiweiYan-96/44/base -> origin/gh/ZhiweiYan-96/44/base 2025-12-04T09:33:41.2223581Z * [new branch] gh/ZhiweiYan-96/44/head -> origin/gh/ZhiweiYan-96/44/head 
2025-12-04T09:33:41.2225543Z * [new branch] gh/ZhiweiYan-96/45/base -> origin/gh/ZhiweiYan-96/45/base 2025-12-04T09:33:41.2226842Z * [new branch] gh/ZhiweiYan-96/45/head -> origin/gh/ZhiweiYan-96/45/head 2025-12-04T09:33:41.2228931Z * [new branch] gh/ZhiweiYan-96/49/base -> origin/gh/ZhiweiYan-96/49/base 2025-12-04T09:33:41.2230451Z * [new branch] gh/ZhiweiYan-96/49/head -> origin/gh/ZhiweiYan-96/49/head 2025-12-04T09:33:41.2232411Z * [new branch] gh/ZhiweiYan-96/62/base -> origin/gh/ZhiweiYan-96/62/base 2025-12-04T09:33:41.2233886Z * [new branch] gh/ZhiweiYan-96/62/head -> origin/gh/ZhiweiYan-96/62/head 2025-12-04T09:33:41.2235927Z * [new branch] gh/ZhiweiYan-96/66/base -> origin/gh/ZhiweiYan-96/66/base 2025-12-04T09:33:41.2237413Z * [new branch] gh/ZhiweiYan-96/66/head -> origin/gh/ZhiweiYan-96/66/head 2025-12-04T09:33:41.2239329Z * [new branch] gh/ZhiweiYan-96/67/base -> origin/gh/ZhiweiYan-96/67/base 2025-12-04T09:33:41.2240668Z * [new branch] gh/ZhiweiYan-96/67/head -> origin/gh/ZhiweiYan-96/67/head 2025-12-04T09:33:41.2242590Z * [new branch] gh/ZhiweiYan-96/68/base -> origin/gh/ZhiweiYan-96/68/base 2025-12-04T09:33:41.2243861Z * [new branch] gh/ZhiweiYan-96/68/head -> origin/gh/ZhiweiYan-96/68/head 2025-12-04T09:33:41.2245407Z * [new branch] gh/ZhiweiYan-96/68/orig -> origin/gh/ZhiweiYan-96/68/orig 2025-12-04T09:33:41.2247821Z * [new branch] gh/aakhundov/1/base -> origin/gh/aakhundov/1/base 2025-12-04T09:33:41.2249394Z * [new branch] gh/aakhundov/1/head -> origin/gh/aakhundov/1/head 2025-12-04T09:33:41.2251187Z * [new branch] gh/aakhundov/2/base -> origin/gh/aakhundov/2/base 2025-12-04T09:33:41.2252756Z * [new branch] gh/aakhundov/2/head -> origin/gh/aakhundov/2/head 2025-12-04T09:33:41.2254850Z * [new branch] gh/aditew01/openblas -> origin/gh/aditew01/openblas 2025-12-04T09:33:41.2256141Z * [new branch] gh/aditew01/sbgemm -> origin/gh/aditew01/sbgemm 2025-12-04T09:33:41.2257854Z * [new branch] gh/aditew01/vecbf16 -> origin/gh/aditew01/vecbf16 
2025-12-04T09:33:41.2260105Z * [new branch] gh/albanD/4/base -> origin/gh/albanD/4/base 2025-12-04T09:33:41.2261563Z * [new branch] gh/albanD/4/head -> origin/gh/albanD/4/head 2025-12-04T09:33:41.2263104Z * [new branch] gh/albanD/4/orig -> origin/gh/albanD/4/orig 2025-12-04T09:33:41.2265438Z * [new branch] gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init 2025-12-04T09:33:41.2267496Z * [new branch] gh/alexsamardzic/12/base -> origin/gh/alexsamardzic/12/base 2025-12-04T09:33:41.2268875Z * [new branch] gh/alexsamardzic/12/head -> origin/gh/alexsamardzic/12/head 2025-12-04T09:33:41.2270522Z * [new branch] gh/alexsamardzic/12/orig -> origin/gh/alexsamardzic/12/orig 2025-12-04T09:33:41.2272748Z * [new branch] gh/alexsamardzic/14/base -> origin/gh/alexsamardzic/14/base 2025-12-04T09:33:41.2274103Z * [new branch] gh/alexsamardzic/14/head -> origin/gh/alexsamardzic/14/head 2025-12-04T09:33:41.2275747Z * [new branch] gh/alexsamardzic/14/orig -> origin/gh/alexsamardzic/14/orig 2025-12-04T09:33:41.2277701Z * [new branch] gh/alexsamardzic/15/base -> origin/gh/alexsamardzic/15/base 2025-12-04T09:33:41.2279072Z * [new branch] gh/alexsamardzic/15/head -> origin/gh/alexsamardzic/15/head 2025-12-04T09:33:41.2280765Z * [new branch] gh/alexsamardzic/15/orig -> origin/gh/alexsamardzic/15/orig 2025-12-04T09:33:41.2282921Z * [new branch] gh/amjames/18/base -> origin/gh/amjames/18/base 2025-12-04T09:33:41.2284391Z * [new branch] gh/amjames/18/head -> origin/gh/amjames/18/head 2025-12-04T09:33:41.2285846Z * [new branch] gh/amjames/18/orig -> origin/gh/amjames/18/orig 2025-12-04T09:33:41.2288490Z * [new branch] gh/andrewor14/35/base -> origin/gh/andrewor14/35/base 2025-12-04T09:33:41.2290085Z * [new branch] gh/andrewor14/35/head -> origin/gh/andrewor14/35/head 2025-12-04T09:33:41.2291684Z * [new branch] gh/andrewor14/35/orig -> origin/gh/andrewor14/35/orig 2025-12-04T09:33:41.2293855Z * [new branch] gh/andrewor14/50/base -> 
origin/gh/andrewor14/50/base 2025-12-04T09:33:41.2295324Z * [new branch] gh/andrewor14/50/head -> origin/gh/andrewor14/50/head 2025-12-04T09:33:41.2296924Z * [new branch] gh/andrewor14/50/orig -> origin/gh/andrewor14/50/orig 2025-12-04T09:33:41.2299365Z * [new branch] gh/andyanwang/30/base -> origin/gh/andyanwang/30/base 2025-12-04T09:33:41.2301132Z * [new branch] gh/andyanwang/30/orig -> origin/gh/andyanwang/30/orig 2025-12-04T09:33:41.2303103Z * [new branch] gh/andyanwang/31/base -> origin/gh/andyanwang/31/base 2025-12-04T09:33:41.2304815Z * [new branch] gh/andyanwang/31/orig -> origin/gh/andyanwang/31/orig 2025-12-04T09:33:41.2306861Z * [new branch] gh/andyanwang/39/base -> origin/gh/andyanwang/39/base 2025-12-04T09:33:41.2308457Z * [new branch] gh/andyanwang/39/head -> origin/gh/andyanwang/39/head 2025-12-04T09:33:41.2309938Z * [new branch] gh/andyanwang/39/orig -> origin/gh/andyanwang/39/orig 2025-12-04T09:33:41.2312181Z * [new branch] gh/andyanwang/42/base -> origin/gh/andyanwang/42/base 2025-12-04T09:33:41.2313517Z * [new branch] gh/andyanwang/42/head -> origin/gh/andyanwang/42/head 2025-12-04T09:33:41.2315145Z * [new branch] gh/andyanwang/42/orig -> origin/gh/andyanwang/42/orig 2025-12-04T09:33:41.2317259Z * [new branch] gh/andyanwang/45/base -> origin/gh/andyanwang/45/base 2025-12-04T09:33:41.2318825Z * [new branch] gh/andyanwang/45/head -> origin/gh/andyanwang/45/head 2025-12-04T09:33:41.2320306Z * [new branch] gh/andyanwang/45/orig -> origin/gh/andyanwang/45/orig 2025-12-04T09:33:41.2322746Z * [new branch] gh/angelayi/107/base -> origin/gh/angelayi/107/base 2025-12-04T09:33:41.2324071Z * [new branch] gh/angelayi/107/head -> origin/gh/angelayi/107/head 2025-12-04T09:33:41.2326274Z * [new branch] gh/angelayi/114/base -> origin/gh/angelayi/114/base 2025-12-04T09:33:41.2327934Z * [new branch] gh/angelayi/114/head -> origin/gh/angelayi/114/head 2025-12-04T09:33:41.2329436Z * [new branch] gh/angelayi/114/orig -> origin/gh/angelayi/114/orig 
2025-12-04T09:33:41.2331375Z * [new branch] gh/angelayi/116/base -> origin/gh/angelayi/116/base 2025-12-04T09:33:41.2332887Z * [new branch] gh/angelayi/116/head -> origin/gh/angelayi/116/head 2025-12-04T09:33:41.2334403Z * [new branch] gh/angelayi/116/orig -> origin/gh/angelayi/116/orig 2025-12-04T09:33:41.2336604Z * [new branch] gh/angelayi/122/base -> origin/gh/angelayi/122/base 2025-12-04T09:33:41.2338095Z * [new branch] gh/angelayi/122/head -> origin/gh/angelayi/122/head 2025-12-04T09:33:41.2339572Z * [new branch] gh/angelayi/122/orig -> origin/gh/angelayi/122/orig 2025-12-04T09:33:41.2341733Z * [new branch] gh/angelayi/124/base -> origin/gh/angelayi/124/base 2025-12-04T09:33:41.2343103Z * [new branch] gh/angelayi/124/head -> origin/gh/angelayi/124/head 2025-12-04T09:33:41.2344594Z * [new branch] gh/angelayi/124/orig -> origin/gh/angelayi/124/orig 2025-12-04T09:33:41.2346721Z * [new branch] gh/angelayi/128/base -> origin/gh/angelayi/128/base 2025-12-04T09:33:41.2348290Z * [new branch] gh/angelayi/128/head -> origin/gh/angelayi/128/head 2025-12-04T09:33:41.2349786Z * [new branch] gh/angelayi/128/orig -> origin/gh/angelayi/128/orig 2025-12-04T09:33:41.2351773Z * [new branch] gh/angelayi/131/base -> origin/gh/angelayi/131/base 2025-12-04T09:33:41.2353274Z * [new branch] gh/angelayi/131/head -> origin/gh/angelayi/131/head 2025-12-04T09:33:41.2354788Z * [new branch] gh/angelayi/131/orig -> origin/gh/angelayi/131/orig 2025-12-04T09:33:41.2357125Z * [new branch] gh/angelayi/132/base -> origin/gh/angelayi/132/base 2025-12-04T09:33:41.2358809Z * [new branch] gh/angelayi/132/head -> origin/gh/angelayi/132/head 2025-12-04T09:33:41.2360501Z * [new branch] gh/angelayi/132/orig -> origin/gh/angelayi/132/orig 2025-12-04T09:33:41.2362356Z * [new branch] gh/angelayi/133/base -> origin/gh/angelayi/133/base 2025-12-04T09:33:41.2363859Z * [new branch] gh/angelayi/133/head -> origin/gh/angelayi/133/head 2025-12-04T09:33:41.2365355Z * [new branch] gh/angelayi/133/orig -> 
origin/gh/angelayi/133/orig 2025-12-04T09:33:41.2367761Z * [new branch] gh/angelayi/134/base -> origin/gh/angelayi/134/base 2025-12-04T09:33:41.2369766Z * [new branch] gh/angelayi/134/head -> origin/gh/angelayi/134/head 2025-12-04T09:33:41.2370790Z * [new branch] gh/angelayi/134/orig -> origin/gh/angelayi/134/orig 2025-12-04T09:33:41.2375370Z * [new branch] gh/angelayi/135/base -> origin/gh/angelayi/135/base 2025-12-04T09:33:41.2377003Z * [new branch] gh/angelayi/135/head -> origin/gh/angelayi/135/head 2025-12-04T09:33:41.2378550Z * [new branch] gh/angelayi/135/orig -> origin/gh/angelayi/135/orig 2025-12-04T09:33:41.2380521Z * [new branch] gh/angelayi/136/base -> origin/gh/angelayi/136/base 2025-12-04T09:33:41.2382170Z * [new branch] gh/angelayi/136/head -> origin/gh/angelayi/136/head 2025-12-04T09:33:41.2383608Z * [new branch] gh/angelayi/136/orig -> origin/gh/angelayi/136/orig 2025-12-04T09:33:41.2385627Z * [new branch] gh/angelayi/137/base -> origin/gh/angelayi/137/base 2025-12-04T09:33:41.2387033Z * [new branch] gh/angelayi/137/head -> origin/gh/angelayi/137/head 2025-12-04T09:33:41.2388762Z * [new branch] gh/angelayi/137/orig -> origin/gh/angelayi/137/orig 2025-12-04T09:33:41.2390709Z * [new branch] gh/angelayi/138/base -> origin/gh/angelayi/138/base 2025-12-04T09:33:41.2392158Z * [new branch] gh/angelayi/138/head -> origin/gh/angelayi/138/head 2025-12-04T09:33:41.2393573Z * [new branch] gh/angelayi/138/orig -> origin/gh/angelayi/138/orig 2025-12-04T09:33:41.2395548Z * [new branch] gh/angelayi/139/base -> origin/gh/angelayi/139/base 2025-12-04T09:33:41.2397061Z * [new branch] gh/angelayi/139/head -> origin/gh/angelayi/139/head 2025-12-04T09:33:41.2398481Z * [new branch] gh/angelayi/139/orig -> origin/gh/angelayi/139/orig 2025-12-04T09:33:41.2400575Z * [new branch] gh/angelayi/140/base -> origin/gh/angelayi/140/base 2025-12-04T09:33:41.2402192Z * [new branch] gh/angelayi/140/head -> origin/gh/angelayi/140/head 2025-12-04T09:33:41.2403682Z * [new branch] 
gh/angelayi/140/orig -> origin/gh/angelayi/140/orig 2025-12-04T09:33:41.2406403Z * [new branch] gh/angelayi/141/base -> origin/gh/angelayi/141/base 2025-12-04T09:33:41.2407705Z * [new branch] gh/angelayi/141/head -> origin/gh/angelayi/141/head 2025-12-04T09:33:41.2409257Z * [new branch] gh/angelayi/141/orig -> origin/gh/angelayi/141/orig 2025-12-04T09:33:41.2411272Z * [new branch] gh/angelayi/142/base -> origin/gh/angelayi/142/base 2025-12-04T09:33:41.2412603Z * [new branch] gh/angelayi/142/head -> origin/gh/angelayi/142/head 2025-12-04T09:33:41.2414192Z * [new branch] gh/angelayi/142/orig -> origin/gh/angelayi/142/orig 2025-12-04T09:33:41.2416154Z * [new branch] gh/angelayi/143/base -> origin/gh/angelayi/143/base 2025-12-04T09:33:41.2417734Z * [new branch] gh/angelayi/143/head -> origin/gh/angelayi/143/head 2025-12-04T09:33:41.2419050Z * [new branch] gh/angelayi/143/orig -> origin/gh/angelayi/143/orig 2025-12-04T09:33:41.2421165Z * [new branch] gh/angelayi/144/base -> origin/gh/angelayi/144/base 2025-12-04T09:33:41.2422865Z * [new branch] gh/angelayi/144/head -> origin/gh/angelayi/144/head 2025-12-04T09:33:41.2424166Z * [new branch] gh/angelayi/144/orig -> origin/gh/angelayi/144/orig 2025-12-04T09:33:41.2426869Z * [new branch] gh/anijain2305/753/base -> origin/gh/anijain2305/753/base 2025-12-04T09:33:41.2428198Z * [new branch] gh/anijain2305/753/head -> origin/gh/anijain2305/753/head 2025-12-04T09:33:41.2429765Z * [new branch] gh/anijain2305/753/orig -> origin/gh/anijain2305/753/orig 2025-12-04T09:33:41.2431942Z * [new branch] gh/anijain2305/810/base -> origin/gh/anijain2305/810/base 2025-12-04T09:33:41.2433438Z * [new branch] gh/anijain2305/810/head -> origin/gh/anijain2305/810/head 2025-12-04T09:33:41.2434974Z * [new branch] gh/anijain2305/810/orig -> origin/gh/anijain2305/810/orig 2025-12-04T09:33:41.2436973Z * [new branch] gh/anijain2305/854/base -> origin/gh/anijain2305/854/base 2025-12-04T09:33:41.2439034Z * [new branch] gh/anijain2305/854/head -> 
origin/gh/anijain2305/854/head 2025-12-04T09:33:41.2440662Z * [new branch] gh/anijain2305/854/orig -> origin/gh/anijain2305/854/orig 2025-12-04T09:33:41.2442851Z * [new branch] gh/anijain2305/864/base -> origin/gh/anijain2305/864/base 2025-12-04T09:33:41.2444363Z * [new branch] gh/anijain2305/864/head -> origin/gh/anijain2305/864/head 2025-12-04T09:33:41.2445897Z * [new branch] gh/anijain2305/864/orig -> origin/gh/anijain2305/864/orig 2025-12-04T09:33:41.2447934Z * [new branch] gh/anijain2305/870/base -> origin/gh/anijain2305/870/base 2025-12-04T09:33:41.2449193Z * [new branch] gh/anijain2305/870/head -> origin/gh/anijain2305/870/head 2025-12-04T09:33:41.2450822Z * [new branch] gh/anijain2305/870/orig -> origin/gh/anijain2305/870/orig 2025-12-04T09:33:41.2452914Z * [new branch] gh/anijain2305/873/base -> origin/gh/anijain2305/873/base 2025-12-04T09:33:41.2454172Z * [new branch] gh/anijain2305/873/head -> origin/gh/anijain2305/873/head 2025-12-04T09:33:41.2455676Z * [new branch] gh/anijain2305/873/orig -> origin/gh/anijain2305/873/orig 2025-12-04T09:33:41.2457830Z * [new branch] gh/anijain2305/894/base -> origin/gh/anijain2305/894/base 2025-12-04T09:33:41.2459162Z * [new branch] gh/anijain2305/894/head -> origin/gh/anijain2305/894/head 2025-12-04T09:33:41.2460814Z * [new branch] gh/anijain2305/894/orig -> origin/gh/anijain2305/894/orig 2025-12-04T09:33:41.2462798Z * [new branch] gh/anijain2305/895/base -> origin/gh/anijain2305/895/base 2025-12-04T09:33:41.2464362Z * [new branch] gh/anijain2305/895/head -> origin/gh/anijain2305/895/head 2025-12-04T09:33:41.2465906Z * [new branch] gh/anijain2305/895/orig -> origin/gh/anijain2305/895/orig 2025-12-04T09:33:41.2467911Z * [new branch] gh/anijain2305/910/base -> origin/gh/anijain2305/910/base 2025-12-04T09:33:41.2469453Z * [new branch] gh/anijain2305/910/head -> origin/gh/anijain2305/910/head 2025-12-04T09:33:41.2471149Z * [new branch] gh/anijain2305/910/orig -> origin/gh/anijain2305/910/orig 2025-12-04T09:33:41.2473292Z * 
[new branch] gh/anijain2305/919/base -> origin/gh/anijain2305/919/base 2025-12-04T09:33:41.2474852Z * [new branch] gh/anijain2305/919/head -> origin/gh/anijain2305/919/head 2025-12-04T09:33:41.2476312Z * [new branch] gh/anijain2305/919/orig -> origin/gh/anijain2305/919/orig 2025-12-04T09:33:41.2478322Z * [new branch] gh/anijain2305/922/base -> origin/gh/anijain2305/922/base 2025-12-04T09:33:41.2479912Z * [new branch] gh/anijain2305/922/head -> origin/gh/anijain2305/922/head 2025-12-04T09:33:41.2481491Z * [new branch] gh/anijain2305/922/orig -> origin/gh/anijain2305/922/orig 2025-12-04T09:33:41.2483559Z * [new branch] gh/anijain2305/932/base -> origin/gh/anijain2305/932/base 2025-12-04T09:33:41.2485237Z * [new branch] gh/anijain2305/932/head -> origin/gh/anijain2305/932/head 2025-12-04T09:33:41.2486774Z * [new branch] gh/anijain2305/932/orig -> origin/gh/anijain2305/932/orig 2025-12-04T09:33:41.2488760Z * [new branch] gh/anijain2305/940/base -> origin/gh/anijain2305/940/base 2025-12-04T09:33:41.2490109Z * [new branch] gh/anijain2305/940/head -> origin/gh/anijain2305/940/head 2025-12-04T09:33:41.2491697Z * [new branch] gh/anijain2305/940/orig -> origin/gh/anijain2305/940/orig 2025-12-04T09:33:41.2493715Z * [new branch] gh/anijain2305/941/base -> origin/gh/anijain2305/941/base 2025-12-04T09:33:41.2495218Z * [new branch] gh/anijain2305/941/head -> origin/gh/anijain2305/941/head 2025-12-04T09:33:41.2496622Z * [new branch] gh/anijain2305/941/orig -> origin/gh/anijain2305/941/orig 2025-12-04T09:33:41.2498720Z * [new branch] gh/anijain2305/942/base -> origin/gh/anijain2305/942/base 2025-12-04T09:33:41.2500245Z * [new branch] gh/anijain2305/942/head -> origin/gh/anijain2305/942/head 2025-12-04T09:33:41.2501941Z * [new branch] gh/anijain2305/942/orig -> origin/gh/anijain2305/942/orig 2025-12-04T09:33:41.2503977Z * [new branch] gh/anijain2305/943/base -> origin/gh/anijain2305/943/base 2025-12-04T09:33:41.2505312Z * [new branch] gh/anijain2305/943/head -> 
origin/gh/anijain2305/943/head 2025-12-04T09:33:41.2506856Z * [new branch] gh/anijain2305/943/orig -> origin/gh/anijain2305/943/orig 2025-12-04T09:33:41.2509408Z * [new branch] gh/anijain2305/944/base -> origin/gh/anijain2305/944/base 2025-12-04T09:33:41.2510749Z * [new branch] gh/anijain2305/944/head -> origin/gh/anijain2305/944/head 2025-12-04T09:33:41.2513104Z * [new branch] gh/anijain2305/944/orig -> origin/gh/anijain2305/944/orig 2025-12-04T09:33:41.2515084Z * [new branch] gh/anijain2305/945/base -> origin/gh/anijain2305/945/base 2025-12-04T09:33:41.2516658Z * [new branch] gh/anijain2305/945/head -> origin/gh/anijain2305/945/head 2025-12-04T09:33:41.2517989Z * [new branch] gh/anijain2305/945/orig -> origin/gh/anijain2305/945/orig 2025-12-04T09:33:41.2520163Z * [new branch] gh/anijain2305/946/base -> origin/gh/anijain2305/946/base 2025-12-04T09:33:41.2521482Z * [new branch] gh/anijain2305/946/head -> origin/gh/anijain2305/946/head 2025-12-04T09:33:41.2523174Z * [new branch] gh/anijain2305/946/orig -> origin/gh/anijain2305/946/orig 2025-12-04T09:33:41.2525268Z * [new branch] gh/anijain2305/947/base -> origin/gh/anijain2305/947/base 2025-12-04T09:33:41.2526845Z * [new branch] gh/anijain2305/947/head -> origin/gh/anijain2305/947/head 2025-12-04T09:33:41.2527956Z * [new branch] gh/anijain2305/947/orig -> origin/gh/anijain2305/947/orig 2025-12-04T09:33:41.2530024Z * [new branch] gh/anijain2305/948/base -> origin/gh/anijain2305/948/base 2025-12-04T09:33:41.2531497Z * [new branch] gh/anijain2305/948/head -> origin/gh/anijain2305/948/head 2025-12-04T09:33:41.2532823Z * [new branch] gh/anijain2305/948/orig -> origin/gh/anijain2305/948/orig 2025-12-04T09:33:41.2534901Z * [new branch] gh/anijain2305/949/base -> origin/gh/anijain2305/949/base 2025-12-04T09:33:41.2536229Z * [new branch] gh/anijain2305/949/head -> origin/gh/anijain2305/949/head 2025-12-04T09:33:41.2537915Z * [new branch] gh/anijain2305/949/orig -> origin/gh/anijain2305/949/orig 2025-12-04T09:33:41.2539963Z * 
[new branch] gh/anijain2305/950/base -> origin/gh/anijain2305/950/base 2025-12-04T09:33:41.2541459Z * [new branch] gh/anijain2305/950/head -> origin/gh/anijain2305/950/head 2025-12-04T09:33:41.2543148Z * [new branch] gh/anijain2305/950/orig -> origin/gh/anijain2305/950/orig 2025-12-04T09:33:41.2545129Z * [new branch] gh/anijain2305/951/base -> origin/gh/anijain2305/951/base 2025-12-04T09:33:41.2546459Z * [new branch] gh/anijain2305/951/head -> origin/gh/anijain2305/951/head 2025-12-04T09:33:41.2548070Z * [new branch] gh/anijain2305/951/orig -> origin/gh/anijain2305/951/orig 2025-12-04T09:33:41.2550017Z * [new branch] gh/anijain2305/952/base -> origin/gh/anijain2305/952/base 2025-12-04T09:33:41.2551456Z * [new branch] gh/anijain2305/952/head -> origin/gh/anijain2305/952/head 2025-12-04T09:33:41.2552980Z * [new branch] gh/anijain2305/952/orig -> origin/gh/anijain2305/952/orig 2025-12-04T09:33:41.2555455Z * [new branch] gh/anijain2305/953/base -> origin/gh/anijain2305/953/base 2025-12-04T09:33:41.2556803Z * [new branch] gh/anijain2305/953/head -> origin/gh/anijain2305/953/head 2025-12-04T09:33:41.2558367Z * [new branch] gh/anijain2305/953/orig -> origin/gh/anijain2305/953/orig 2025-12-04T09:33:41.2560377Z * [new branch] gh/anijain2305/954/base -> origin/gh/anijain2305/954/base 2025-12-04T09:33:41.2561963Z * [new branch] gh/anijain2305/954/head -> origin/gh/anijain2305/954/head 2025-12-04T09:33:41.2563598Z * [new branch] gh/anijain2305/954/orig -> origin/gh/anijain2305/954/orig 2025-12-04T09:33:41.2565704Z * [new branch] gh/anijain2305/955/base -> origin/gh/anijain2305/955/base 2025-12-04T09:33:41.2567170Z * [new branch] gh/anijain2305/955/head -> origin/gh/anijain2305/955/head 2025-12-04T09:33:41.2568727Z * [new branch] gh/anijain2305/955/orig -> origin/gh/anijain2305/955/orig 2025-12-04T09:33:41.2570836Z * [new branch] gh/anijain2305/956/base -> origin/gh/anijain2305/956/base 2025-12-04T09:33:41.2572637Z * [new branch] gh/anijain2305/956/head -> 
origin/gh/anijain2305/956/head 2025-12-04T09:33:41.2574327Z * [new branch] gh/anijain2305/956/orig -> origin/gh/anijain2305/956/orig 2025-12-04T09:33:41.2576185Z * [new branch] gh/anijain2305/957/base -> origin/gh/anijain2305/957/base 2025-12-04T09:33:41.2577841Z * [new branch] gh/anijain2305/957/head -> origin/gh/anijain2305/957/head 2025-12-04T09:33:41.2579180Z * [new branch] gh/anijain2305/957/orig -> origin/gh/anijain2305/957/orig 2025-12-04T09:33:41.2581263Z * [new branch] gh/anijain2305/958/base -> origin/gh/anijain2305/958/base 2025-12-04T09:33:41.2583026Z * [new branch] gh/anijain2305/958/head -> origin/gh/anijain2305/958/head 2025-12-04T09:33:41.2584377Z * [new branch] gh/anijain2305/958/orig -> origin/gh/anijain2305/958/orig 2025-12-04T09:33:41.2586446Z * [new branch] gh/anijain2305/959/base -> origin/gh/anijain2305/959/base 2025-12-04T09:33:41.2587949Z * [new branch] gh/anijain2305/959/head -> origin/gh/anijain2305/959/head 2025-12-04T09:33:41.2589222Z * [new branch] gh/anijain2305/959/orig -> origin/gh/anijain2305/959/orig 2025-12-04T09:33:41.2591411Z * [new branch] gh/anijain2305/960/base -> origin/gh/anijain2305/960/base 2025-12-04T09:33:41.2592972Z * [new branch] gh/anijain2305/960/head -> origin/gh/anijain2305/960/head 2025-12-04T09:33:41.2594444Z * [new branch] gh/anijain2305/960/orig -> origin/gh/anijain2305/960/orig 2025-12-04T09:33:41.2596620Z * [new branch] gh/anijain2305/961/base -> origin/gh/anijain2305/961/base 2025-12-04T09:33:41.2597949Z * [new branch] gh/anijain2305/961/head -> origin/gh/anijain2305/961/head 2025-12-04T09:33:41.2599544Z * [new branch] gh/anijain2305/961/orig -> origin/gh/anijain2305/961/orig 2025-12-04T09:33:41.2601564Z * [new branch] gh/anijain2305/962/base -> origin/gh/anijain2305/962/base 2025-12-04T09:33:41.2602996Z * [new branch] gh/anijain2305/962/head -> origin/gh/anijain2305/962/head 2025-12-04T09:33:41.2604553Z * [new branch] gh/anijain2305/962/orig -> origin/gh/anijain2305/962/orig 2025-12-04T09:33:41.2606959Z * 
[new branch] gh/anijain2305/963/base -> origin/gh/anijain2305/963/base 2025-12-04T09:33:41.2608657Z * [new branch] gh/anijain2305/963/head -> origin/gh/anijain2305/963/head 2025-12-04T09:33:41.2610272Z * [new branch] gh/anijain2305/963/orig -> origin/gh/anijain2305/963/orig 2025-12-04T09:33:41.2612330Z * [new branch] gh/anijain2305/964/base -> origin/gh/anijain2305/964/base 2025-12-04T09:33:41.2613645Z * [new branch] gh/anijain2305/964/head -> origin/gh/anijain2305/964/head 2025-12-04T09:33:41.2615182Z * [new branch] gh/anijain2305/964/orig -> origin/gh/anijain2305/964/orig 2025-12-04T09:33:41.2617481Z * [new branch] gh/anijain2305/965/base -> origin/gh/anijain2305/965/base 2025-12-04T09:33:41.2618933Z * [new branch] gh/anijain2305/965/head -> origin/gh/anijain2305/965/head 2025-12-04T09:33:41.2620435Z * [new branch] gh/anijain2305/965/orig -> origin/gh/anijain2305/965/orig 2025-12-04T09:33:41.2622292Z * [new branch] gh/anijain2305/966/base -> origin/gh/anijain2305/966/base 2025-12-04T09:33:41.2623747Z * [new branch] gh/anijain2305/966/head -> origin/gh/anijain2305/966/head 2025-12-04T09:33:41.2625306Z * [new branch] gh/anijain2305/966/orig -> origin/gh/anijain2305/966/orig 2025-12-04T09:33:41.2627277Z * [new branch] gh/anijain2305/967/base -> origin/gh/anijain2305/967/base 2025-12-04T09:33:41.2628674Z * [new branch] gh/anijain2305/967/head -> origin/gh/anijain2305/967/head 2025-12-04T09:33:41.2630311Z * [new branch] gh/anijain2305/967/orig -> origin/gh/anijain2305/967/orig 2025-12-04T09:33:41.2632347Z * [new branch] gh/anijain2305/968/base -> origin/gh/anijain2305/968/base 2025-12-04T09:33:41.2633735Z * [new branch] gh/anijain2305/968/head -> origin/gh/anijain2305/968/head 2025-12-04T09:33:41.2635179Z * [new branch] gh/anijain2305/968/orig -> origin/gh/anijain2305/968/orig 2025-12-04T09:33:41.2637160Z * [new branch] gh/anijain2305/969/base -> origin/gh/anijain2305/969/base 2025-12-04T09:33:41.2638638Z * [new branch] gh/anijain2305/969/head -> 
origin/gh/anijain2305/969/head 2025-12-04T09:33:41.2640348Z * [new branch] gh/anijain2305/969/orig -> origin/gh/anijain2305/969/orig 2025-12-04T09:33:41.2642179Z * [new branch] gh/anijain2305/970/base -> origin/gh/anijain2305/970/base 2025-12-04T09:33:41.2643700Z * [new branch] gh/anijain2305/970/head -> origin/gh/anijain2305/970/head 2025-12-04T09:33:41.2645273Z * [new branch] gh/anijain2305/970/orig -> origin/gh/anijain2305/970/orig 2025-12-04T09:33:41.2647899Z * [new branch] gh/anjali411/216/base -> origin/gh/anjali411/216/base 2025-12-04T09:33:41.2649283Z * [new branch] gh/anjali411/216/head -> origin/gh/anjali411/216/head 2025-12-04T09:33:41.2650806Z * [new branch] gh/anjali411/216/orig -> origin/gh/anjali411/216/orig 2025-12-04T09:33:41.2653482Z * [new branch] gh/anshul-si/1/base -> origin/gh/anshul-si/1/base 2025-12-04T09:33:41.2654809Z * [new branch] gh/anshul-si/1/head -> origin/gh/anshul-si/1/head 2025-12-04T09:33:41.2656706Z * [new branch] gh/anshul-si/2/base -> origin/gh/anshul-si/2/base 2025-12-04T09:33:41.2658088Z * [new branch] gh/anshul-si/2/head -> origin/gh/anshul-si/2/head 2025-12-04T09:33:41.2659817Z * [new branch] gh/anshul-si/3/base -> origin/gh/anshul-si/3/base 2025-12-04T09:33:41.2661776Z * [new branch] gh/anshul-si/3/head -> origin/gh/anshul-si/3/head 2025-12-04T09:33:41.2663659Z * [new branch] gh/anshul-si/4/base -> origin/gh/anshul-si/4/base 2025-12-04T09:33:41.2664981Z * [new branch] gh/anshul-si/4/head -> origin/gh/anshul-si/4/head 2025-12-04T09:33:41.2666714Z * [new branch] gh/anshul-si/5/base -> origin/gh/anshul-si/5/base 2025-12-04T09:33:41.2668174Z * [new branch] gh/anshul-si/5/head -> origin/gh/anshul-si/5/head 2025-12-04T09:33:41.2670459Z * [new branch] gh/anshul-si/53/base -> origin/gh/anshul-si/53/base 2025-12-04T09:33:41.2672494Z * [new branch] gh/anshul-si/53/head -> origin/gh/anshul-si/53/head 2025-12-04T09:33:41.2674035Z * [new branch] gh/anshul-si/58/base -> origin/gh/anshul-si/58/base 2025-12-04T09:33:41.2675362Z * [new 
branch] gh/anshul-si/58/head -> origin/gh/anshul-si/58/head 2025-12-04T09:33:41.2677256Z * [new branch] gh/anshul-si/66/base -> origin/gh/anshul-si/66/base 2025-12-04T09:33:41.2678782Z * [new branch] gh/anshul-si/66/head -> origin/gh/anshul-si/66/head 2025-12-04T09:33:41.2680244Z * [new branch] gh/anshul-si/66/orig -> origin/gh/anshul-si/66/orig 2025-12-04T09:33:41.2682126Z * [new branch] gh/anshul-si/67/base -> origin/gh/anshul-si/67/base 2025-12-04T09:33:41.2683606Z * [new branch] gh/anshul-si/67/head -> origin/gh/anshul-si/67/head 2025-12-04T09:33:41.2685174Z * [new branch] gh/anshul-si/67/orig -> origin/gh/anshul-si/67/orig 2025-12-04T09:33:41.2687296Z * [new branch] gh/anshul-si/68/base -> origin/gh/anshul-si/68/base 2025-12-04T09:33:41.2688608Z * [new branch] gh/anshul-si/68/head -> origin/gh/anshul-si/68/head 2025-12-04T09:33:41.2690134Z * [new branch] gh/anshul-si/68/orig -> origin/gh/anshul-si/68/orig 2025-12-04T09:33:41.2692346Z * [new branch] gh/anshul-si/69/base -> origin/gh/anshul-si/69/base 2025-12-04T09:33:41.2693784Z * [new branch] gh/anshul-si/69/head -> origin/gh/anshul-si/69/head 2025-12-04T09:33:41.2695273Z * [new branch] gh/anshul-si/69/orig -> origin/gh/anshul-si/69/orig 2025-12-04T09:33:41.2697335Z * [new branch] gh/anshul-si/70/base -> origin/gh/anshul-si/70/base 2025-12-04T09:33:41.2698936Z * [new branch] gh/anshul-si/70/head -> origin/gh/anshul-si/70/head 2025-12-04T09:33:41.2701056Z * [new branch] gh/anshul-si/70/orig -> origin/gh/anshul-si/70/orig 2025-12-04T09:33:41.2703074Z * [new branch] gh/anshul-si/71/base -> origin/gh/anshul-si/71/base 2025-12-04T09:33:41.2704680Z * [new branch] gh/anshul-si/71/head -> origin/gh/anshul-si/71/head 2025-12-04T09:33:41.2706199Z * [new branch] gh/anshul-si/71/orig -> origin/gh/anshul-si/71/orig 2025-12-04T09:33:41.2708143Z * [new branch] gh/anshul-si/72/base -> origin/gh/anshul-si/72/base 2025-12-04T09:33:41.2709688Z * [new branch] gh/anshul-si/72/head -> origin/gh/anshul-si/72/head 
2025-12-04T09:33:41.2711191Z * [new branch] gh/anshul-si/72/orig -> origin/gh/anshul-si/72/orig 2025-12-04T09:33:41.2713224Z * [new branch] gh/anshul-si/73/base -> origin/gh/anshul-si/73/base 2025-12-04T09:33:41.2714786Z * [new branch] gh/anshul-si/73/head -> origin/gh/anshul-si/73/head 2025-12-04T09:33:41.2716271Z * [new branch] gh/anshul-si/73/orig -> origin/gh/anshul-si/73/orig 2025-12-04T09:33:41.2718786Z * [new branch] gh/aorenste/132/base -> origin/gh/aorenste/132/base 2025-12-04T09:33:41.2720203Z * [new branch] gh/aorenste/132/head -> origin/gh/aorenste/132/head 2025-12-04T09:33:41.2722553Z * [new branch] gh/aorenste/134/base -> origin/gh/aorenste/134/base 2025-12-04T09:33:41.2724234Z * [new branch] gh/aorenste/134/head -> origin/gh/aorenste/134/head 2025-12-04T09:33:41.2725741Z * [new branch] gh/aorenste/134/orig -> origin/gh/aorenste/134/orig 2025-12-04T09:33:41.2727788Z * [new branch] gh/aorenste/139/base -> origin/gh/aorenste/139/base 2025-12-04T09:33:41.2729308Z * [new branch] gh/aorenste/139/head -> origin/gh/aorenste/139/head 2025-12-04T09:33:41.2730805Z * [new branch] gh/aorenste/139/orig -> origin/gh/aorenste/139/orig 2025-12-04T09:33:41.2732779Z * [new branch] gh/aorenste/141/base -> origin/gh/aorenste/141/base 2025-12-04T09:33:41.2734058Z * [new branch] gh/aorenste/141/head -> origin/gh/aorenste/141/head 2025-12-04T09:33:41.2736534Z * [new branch] gh/aorenste/145/base -> origin/gh/aorenste/145/base 2025-12-04T09:33:41.2738134Z * [new branch] gh/aorenste/145/head -> origin/gh/aorenste/145/head 2025-12-04T09:33:41.2739759Z * [new branch] gh/aorenste/145/orig -> origin/gh/aorenste/145/orig 2025-12-04T09:33:41.2741931Z * [new branch] gh/aorenste/146/base -> origin/gh/aorenste/146/base 2025-12-04T09:33:41.2743556Z * [new branch] gh/aorenste/146/head -> origin/gh/aorenste/146/head 2025-12-04T09:33:41.2745078Z * [new branch] gh/aorenste/146/orig -> origin/gh/aorenste/146/orig 2025-12-04T09:33:41.2747285Z * [new branch] gh/aorenste/147/base -> 
origin/gh/aorenste/147/base 2025-12-04T09:33:41.2748945Z * [new branch] gh/aorenste/147/head -> origin/gh/aorenste/147/head 2025-12-04T09:33:41.2750425Z * [new branch] gh/aorenste/147/orig -> origin/gh/aorenste/147/orig 2025-12-04T09:33:41.2752496Z * [new branch] gh/aorenste/148/base -> origin/gh/aorenste/148/base 2025-12-04T09:33:41.2753986Z * [new branch] gh/aorenste/148/head -> origin/gh/aorenste/148/head 2025-12-04T09:33:41.2755554Z * [new branch] gh/aorenste/148/orig -> origin/gh/aorenste/148/orig 2025-12-04T09:33:41.2757625Z * [new branch] gh/aorenste/149/base -> origin/gh/aorenste/149/base 2025-12-04T09:33:41.2759095Z * [new branch] gh/aorenste/149/head -> origin/gh/aorenste/149/head 2025-12-04T09:33:41.2760523Z * [new branch] gh/aorenste/149/orig -> origin/gh/aorenste/149/orig 2025-12-04T09:33:41.2762658Z * [new branch] gh/aorenste/150/base -> origin/gh/aorenste/150/base 2025-12-04T09:33:41.2763952Z * [new branch] gh/aorenste/150/head -> origin/gh/aorenste/150/head 2025-12-04T09:33:41.2765638Z * [new branch] gh/aorenste/150/orig -> origin/gh/aorenste/150/orig 2025-12-04T09:33:41.2767471Z * [new branch] gh/aorenste/151/base -> origin/gh/aorenste/151/base 2025-12-04T09:33:41.2768961Z * [new branch] gh/aorenste/151/head -> origin/gh/aorenste/151/head 2025-12-04T09:33:41.2770529Z * [new branch] gh/aorenste/151/orig -> origin/gh/aorenste/151/orig 2025-12-04T09:33:41.2772664Z * [new branch] gh/aorenste/152/base -> origin/gh/aorenste/152/base 2025-12-04T09:33:41.2774149Z * [new branch] gh/aorenste/152/head -> origin/gh/aorenste/152/head 2025-12-04T09:33:41.2775658Z * [new branch] gh/aorenste/152/orig -> origin/gh/aorenste/152/orig 2025-12-04T09:33:41.2777579Z * [new branch] gh/aorenste/153/base -> origin/gh/aorenste/153/base 2025-12-04T09:33:41.2779077Z * [new branch] gh/aorenste/153/head -> origin/gh/aorenste/153/head 2025-12-04T09:33:41.2780548Z * [new branch] gh/aorenste/153/orig -> origin/gh/aorenste/153/orig 2025-12-04T09:33:41.2782841Z * [new branch] 
gh/aorenste/154/base -> origin/gh/aorenste/154/base 2025-12-04T09:33:41.2784033Z * [new branch] gh/aorenste/154/head -> origin/gh/aorenste/154/head 2025-12-04T09:33:41.2785334Z * [new branch] gh/aorenste/154/orig -> origin/gh/aorenste/154/orig 2025-12-04T09:33:41.2787162Z * [new branch] gh/aorenste/155/base -> origin/gh/aorenste/155/base 2025-12-04T09:33:41.2788721Z * [new branch] gh/aorenste/155/head -> origin/gh/aorenste/155/head 2025-12-04T09:33:41.2789947Z * [new branch] gh/aorenste/155/orig -> origin/gh/aorenste/155/orig 2025-12-04T09:33:41.2792023Z * [new branch] gh/aorenste/156/base -> origin/gh/aorenste/156/base 2025-12-04T09:33:41.2793224Z * [new branch] gh/aorenste/156/head -> origin/gh/aorenste/156/head 2025-12-04T09:33:41.2794721Z * [new branch] gh/aorenste/156/orig -> origin/gh/aorenste/156/orig 2025-12-04T09:33:41.2797035Z * [new branch] gh/aorenste/157/base -> origin/gh/aorenste/157/base 2025-12-04T09:33:41.2798531Z * [new branch] gh/aorenste/157/head -> origin/gh/aorenste/157/head 2025-12-04T09:33:41.2799853Z * [new branch] gh/aorenste/157/orig -> origin/gh/aorenste/157/orig 2025-12-04T09:33:41.2801951Z * [new branch] gh/aorenste/158/base -> origin/gh/aorenste/158/base 2025-12-04T09:33:41.2803478Z * [new branch] gh/aorenste/158/head -> origin/gh/aorenste/158/head 2025-12-04T09:33:41.2804715Z * [new branch] gh/aorenste/158/orig -> origin/gh/aorenste/158/orig 2025-12-04T09:33:41.2806645Z * [new branch] gh/aorenste/159/base -> origin/gh/aorenste/159/base 2025-12-04T09:33:41.2808259Z * [new branch] gh/aorenste/159/head -> origin/gh/aorenste/159/head 2025-12-04T09:33:41.2809538Z * [new branch] gh/aorenste/159/orig -> origin/gh/aorenste/159/orig 2025-12-04T09:33:41.2812004Z * [new branch] gh/avikchaudhuri/1/base -> origin/gh/avikchaudhuri/1/base 2025-12-04T09:33:41.2813396Z * [new branch] gh/avikchaudhuri/1/head -> origin/gh/avikchaudhuri/1/head 2025-12-04T09:33:41.2815303Z * [new branch] gh/avikchaudhuri/2/base -> origin/gh/avikchaudhuri/2/base 
2025-12-04T09:33:41.2816720Z * [new branch] gh/avikchaudhuri/2/head -> origin/gh/avikchaudhuri/2/head 2025-12-04T09:33:41.2818165Z * [new branch] gh/avikchaudhuri/2/orig -> origin/gh/avikchaudhuri/2/orig 2025-12-04T09:33:41.2821204Z * [new branch] gh/bdhirsh/666/base -> origin/gh/bdhirsh/666/base 2025-12-04T09:33:41.2822442Z * [new branch] gh/bdhirsh/666/head -> origin/gh/bdhirsh/666/head 2025-12-04T09:33:41.2824075Z * [new branch] gh/bdhirsh/666/orig -> origin/gh/bdhirsh/666/orig 2025-12-04T09:33:41.2826104Z * [new branch] gh/bdhirsh/668/base -> origin/gh/bdhirsh/668/base 2025-12-04T09:33:41.2827558Z * [new branch] gh/bdhirsh/668/head -> origin/gh/bdhirsh/668/head 2025-12-04T09:33:41.2828974Z * [new branch] gh/bdhirsh/668/orig -> origin/gh/bdhirsh/668/orig 2025-12-04T09:33:41.2831199Z * [new branch] gh/bdhirsh/669/base -> origin/gh/bdhirsh/669/base 2025-12-04T09:33:41.2832481Z * [new branch] gh/bdhirsh/669/head -> origin/gh/bdhirsh/669/head 2025-12-04T09:33:41.2834021Z * [new branch] gh/bdhirsh/669/orig -> origin/gh/bdhirsh/669/orig 2025-12-04T09:33:41.2836212Z * [new branch] gh/bdhirsh/670/base -> origin/gh/bdhirsh/670/base 2025-12-04T09:33:41.2837830Z * [new branch] gh/bdhirsh/670/head -> origin/gh/bdhirsh/670/head 2025-12-04T09:33:41.2839397Z * [new branch] gh/bdhirsh/670/orig -> origin/gh/bdhirsh/670/orig 2025-12-04T09:33:41.2841487Z * [new branch] gh/bdhirsh/672/base -> origin/gh/bdhirsh/672/base 2025-12-04T09:33:41.2843008Z * [new branch] gh/bdhirsh/672/head -> origin/gh/bdhirsh/672/head 2025-12-04T09:33:41.2844459Z * [new branch] gh/bdhirsh/672/orig -> origin/gh/bdhirsh/672/orig 2025-12-04T09:33:41.2846771Z * [new branch] gh/bdhirsh/675/base -> origin/gh/bdhirsh/675/base 2025-12-04T09:33:41.2848506Z * [new branch] gh/bdhirsh/675/head -> origin/gh/bdhirsh/675/head 2025-12-04T09:33:41.2850018Z * [new branch] gh/bdhirsh/675/orig -> origin/gh/bdhirsh/675/orig 2025-12-04T09:33:41.2852030Z * [new branch] gh/bdhirsh/676/base -> origin/gh/bdhirsh/676/base 
2025-12-04T09:33:41.2853743Z * [new branch] gh/bdhirsh/676/head -> origin/gh/bdhirsh/676/head 2025-12-04T09:33:41.2854980Z * [new branch] gh/bdhirsh/676/orig -> origin/gh/bdhirsh/676/orig 2025-12-04T09:33:41.2857067Z * [new branch] gh/bdhirsh/677/base -> origin/gh/bdhirsh/677/base 2025-12-04T09:33:41.2859158Z * [new branch] gh/bdhirsh/677/head -> origin/gh/bdhirsh/677/head 2025-12-04T09:33:41.2860613Z * [new branch] gh/bdhirsh/677/orig -> origin/gh/bdhirsh/677/orig 2025-12-04T09:33:41.2862971Z * [new branch] gh/bdhirsh/678/base -> origin/gh/bdhirsh/678/base 2025-12-04T09:33:41.2864583Z * [new branch] gh/bdhirsh/678/head -> origin/gh/bdhirsh/678/head 2025-12-04T09:33:41.2866074Z * [new branch] gh/bdhirsh/678/orig -> origin/gh/bdhirsh/678/orig 2025-12-04T09:33:41.2868253Z * [new branch] gh/bdhirsh/679/base -> origin/gh/bdhirsh/679/base 2025-12-04T09:33:41.2869851Z * [new branch] gh/bdhirsh/679/head -> origin/gh/bdhirsh/679/head 2025-12-04T09:33:41.2871491Z * [new branch] gh/bdhirsh/679/orig -> origin/gh/bdhirsh/679/orig 2025-12-04T09:33:41.2873595Z * [new branch] gh/bdhirsh/680/base -> origin/gh/bdhirsh/680/base 2025-12-04T09:33:41.2875142Z * [new branch] gh/bdhirsh/680/head -> origin/gh/bdhirsh/680/head 2025-12-04T09:33:41.2876627Z * [new branch] gh/bdhirsh/680/orig -> origin/gh/bdhirsh/680/orig 2025-12-04T09:33:41.2878505Z * [new branch] gh/bdhirsh/681/base -> origin/gh/bdhirsh/681/base 2025-12-04T09:33:41.2880130Z * [new branch] gh/bdhirsh/681/head -> origin/gh/bdhirsh/681/head 2025-12-04T09:33:41.2881745Z * [new branch] gh/bdhirsh/681/orig -> origin/gh/bdhirsh/681/orig 2025-12-04T09:33:41.2884301Z * [new branch] gh/benjaminglass1/101/base -> origin/gh/benjaminglass1/101/base 2025-12-04T09:33:41.2885873Z * [new branch] gh/benjaminglass1/101/head -> origin/gh/benjaminglass1/101/head 2025-12-04T09:33:41.2887503Z * [new branch] gh/benjaminglass1/101/orig -> origin/gh/benjaminglass1/101/orig 2025-12-04T09:33:41.2889673Z * [new branch] gh/benjaminglass1/102/base -> 
origin/gh/benjaminglass1/102/base 2025-12-04T09:33:41.2891064Z * [new branch] gh/benjaminglass1/102/head -> origin/gh/benjaminglass1/102/head 2025-12-04T09:33:41.2892568Z * [new branch] gh/benjaminglass1/102/orig -> origin/gh/benjaminglass1/102/orig 2025-12-04T09:33:41.2894524Z * [new branch] gh/benjaminglass1/106/base -> origin/gh/benjaminglass1/106/base 2025-12-04T09:33:41.2896014Z * [new branch] gh/benjaminglass1/106/head -> origin/gh/benjaminglass1/106/head 2025-12-04T09:33:41.2897635Z * [new branch] gh/benjaminglass1/106/orig -> origin/gh/benjaminglass1/106/orig 2025-12-04T09:33:41.2899560Z * [new branch] gh/benjaminglass1/107/base -> origin/gh/benjaminglass1/107/base 2025-12-04T09:33:41.2901069Z * [new branch] gh/benjaminglass1/107/head -> origin/gh/benjaminglass1/107/head 2025-12-04T09:33:41.2902684Z * [new branch] gh/benjaminglass1/107/orig -> origin/gh/benjaminglass1/107/orig 2025-12-04T09:33:41.2904655Z * [new branch] gh/benjaminglass1/108/base -> origin/gh/benjaminglass1/108/base 2025-12-04T09:33:41.2906140Z * [new branch] gh/benjaminglass1/108/head -> origin/gh/benjaminglass1/108/head 2025-12-04T09:33:41.2907604Z * [new branch] gh/benjaminglass1/108/orig -> origin/gh/benjaminglass1/108/orig 2025-12-04T09:33:41.2909572Z * [new branch] gh/benjaminglass1/109/base -> origin/gh/benjaminglass1/109/base 2025-12-04T09:33:41.2911057Z * [new branch] gh/benjaminglass1/109/head -> origin/gh/benjaminglass1/109/head 2025-12-04T09:33:41.2912600Z * [new branch] gh/benjaminglass1/109/orig -> origin/gh/benjaminglass1/109/orig 2025-12-04T09:33:41.2914555Z * [new branch] gh/benjaminglass1/97/base -> origin/gh/benjaminglass1/97/base 2025-12-04T09:33:41.2916130Z * [new branch] gh/benjaminglass1/97/head -> origin/gh/benjaminglass1/97/head 2025-12-04T09:33:41.2917666Z * [new branch] gh/benjaminglass1/97/orig -> origin/gh/benjaminglass1/97/orig 2025-12-04T09:33:41.2920003Z * [new branch] gh/bobrenjc93/570/base -> origin/gh/bobrenjc93/570/base 2025-12-04T09:33:41.2921651Z * [new 
branch] gh/bobrenjc93/570/head -> origin/gh/bobrenjc93/570/head 2025-12-04T09:33:41.2923127Z * [new branch] gh/bobrenjc93/570/orig -> origin/gh/bobrenjc93/570/orig 2025-12-04T09:33:41.2925085Z * [new branch] gh/bobrenjc93/604/base -> origin/gh/bobrenjc93/604/base 2025-12-04T09:33:41.2926616Z * [new branch] gh/bobrenjc93/604/head -> origin/gh/bobrenjc93/604/head 2025-12-04T09:33:41.2928170Z * [new branch] gh/bobrenjc93/604/orig -> origin/gh/bobrenjc93/604/orig 2025-12-04T09:33:41.2930075Z * [new branch] gh/bobrenjc93/638/base -> origin/gh/bobrenjc93/638/base 2025-12-04T09:33:41.2931553Z * [new branch] gh/bobrenjc93/638/head -> origin/gh/bobrenjc93/638/head 2025-12-04T09:33:41.2933020Z * [new branch] gh/bobrenjc93/638/orig -> origin/gh/bobrenjc93/638/orig 2025-12-04T09:33:41.2934964Z * [new branch] gh/bobrenjc93/653/base -> origin/gh/bobrenjc93/653/base 2025-12-04T09:33:41.2936581Z * [new branch] gh/bobrenjc93/653/head -> origin/gh/bobrenjc93/653/head 2025-12-04T09:33:41.2938086Z * [new branch] gh/bobrenjc93/653/orig -> origin/gh/bobrenjc93/653/orig 2025-12-04T09:33:41.2940244Z * [new branch] gh/bobrenjc93/654/base -> origin/gh/bobrenjc93/654/base 2025-12-04T09:33:41.2941770Z * [new branch] gh/bobrenjc93/654/head -> origin/gh/bobrenjc93/654/head 2025-12-04T09:33:41.2943188Z * [new branch] gh/bobrenjc93/654/orig -> origin/gh/bobrenjc93/654/orig 2025-12-04T09:33:41.2945133Z * [new branch] gh/bobrenjc93/657/base -> origin/gh/bobrenjc93/657/base 2025-12-04T09:33:41.2946570Z * [new branch] gh/bobrenjc93/657/head -> origin/gh/bobrenjc93/657/head 2025-12-04T09:33:41.2948028Z * [new branch] gh/bobrenjc93/657/orig -> origin/gh/bobrenjc93/657/orig 2025-12-04T09:33:41.2950020Z * [new branch] gh/bobrenjc93/672/base -> origin/gh/bobrenjc93/672/base 2025-12-04T09:33:41.2951430Z * [new branch] gh/bobrenjc93/672/head -> origin/gh/bobrenjc93/672/head 2025-12-04T09:33:41.2952891Z * [new branch] gh/bobrenjc93/672/orig -> origin/gh/bobrenjc93/672/orig 2025-12-04T09:33:41.2954859Z * [new 
branch] gh/bobrenjc93/679/base -> origin/gh/bobrenjc93/679/base 2025-12-04T09:33:41.2956741Z * [new branch] gh/bobrenjc93/679/head -> origin/gh/bobrenjc93/679/head 2025-12-04T09:33:41.2958195Z * [new branch] gh/bobrenjc93/679/orig -> origin/gh/bobrenjc93/679/orig 2025-12-04T09:33:41.2960228Z * [new branch] gh/bobrenjc93/680/base -> origin/gh/bobrenjc93/680/base 2025-12-04T09:33:41.2961809Z * [new branch] gh/bobrenjc93/680/head -> origin/gh/bobrenjc93/680/head 2025-12-04T09:33:41.2963609Z * [new branch] gh/bobrenjc93/680/orig -> origin/gh/bobrenjc93/680/orig 2025-12-04T09:33:41.2965413Z * [new branch] gh/bobrenjc93/681/base -> origin/gh/bobrenjc93/681/base 2025-12-04T09:33:41.2966957Z * [new branch] gh/bobrenjc93/681/head -> origin/gh/bobrenjc93/681/head 2025-12-04T09:33:41.2968540Z * [new branch] gh/bobrenjc93/681/orig -> origin/gh/bobrenjc93/681/orig 2025-12-04T09:33:41.2970429Z * [new branch] gh/bobrenjc93/682/base -> origin/gh/bobrenjc93/682/base 2025-12-04T09:33:41.2972158Z * [new branch] gh/bobrenjc93/682/head -> origin/gh/bobrenjc93/682/head 2025-12-04T09:33:41.2973646Z * [new branch] gh/bobrenjc93/682/orig -> origin/gh/bobrenjc93/682/orig 2025-12-04T09:33:41.2975677Z * [new branch] gh/bobrenjc93/683/base -> origin/gh/bobrenjc93/683/base 2025-12-04T09:33:41.2977322Z * [new branch] gh/bobrenjc93/683/head -> origin/gh/bobrenjc93/683/head 2025-12-04T09:33:41.2978877Z * [new branch] gh/bobrenjc93/683/orig -> origin/gh/bobrenjc93/683/orig 2025-12-04T09:33:41.2980825Z * [new branch] gh/bobrenjc93/684/base -> origin/gh/bobrenjc93/684/base 2025-12-04T09:33:41.2982629Z * [new branch] gh/bobrenjc93/684/head -> origin/gh/bobrenjc93/684/head 2025-12-04T09:33:41.2984351Z * [new branch] gh/bobrenjc93/684/orig -> origin/gh/bobrenjc93/684/orig 2025-12-04T09:33:41.2986117Z * [new branch] gh/bobrenjc93/685/base -> origin/gh/bobrenjc93/685/base 2025-12-04T09:33:41.2988465Z * [new branch] gh/bobrenjc93/685/head -> origin/gh/bobrenjc93/685/head 2025-12-04T09:33:41.2991933Z * [new 
branch] gh/bobrenjc93/685/orig -> origin/gh/bobrenjc93/685/orig 2025-12-04T09:33:41.2992339Z * [new branch] gh/bobrenjc93/686/base -> origin/gh/bobrenjc93/686/base 2025-12-04T09:33:41.2993187Z * [new branch] gh/bobrenjc93/686/head -> origin/gh/bobrenjc93/686/head 2025-12-04T09:33:41.2994804Z * [new branch] gh/bobrenjc93/686/orig -> origin/gh/bobrenjc93/686/orig 2025-12-04T09:33:41.2996698Z * [new branch] gh/bobrenjc93/687/base -> origin/gh/bobrenjc93/687/base 2025-12-04T09:33:41.2998631Z * [new branch] gh/bobrenjc93/687/head -> origin/gh/bobrenjc93/687/head 2025-12-04T09:33:41.3000010Z * [new branch] gh/bobrenjc93/687/orig -> origin/gh/bobrenjc93/687/orig 2025-12-04T09:33:41.3002550Z * [new branch] gh/bobrenjc93/688/base -> origin/gh/bobrenjc93/688/base 2025-12-04T09:33:41.3004180Z * [new branch] gh/bobrenjc93/688/head -> origin/gh/bobrenjc93/688/head 2025-12-04T09:33:41.3005717Z * [new branch] gh/bobrenjc93/688/orig -> origin/gh/bobrenjc93/688/orig 2025-12-04T09:33:41.3007568Z * [new branch] gh/bobrenjc93/689/base -> origin/gh/bobrenjc93/689/base 2025-12-04T09:33:41.3009148Z * [new branch] gh/bobrenjc93/689/head -> origin/gh/bobrenjc93/689/head 2025-12-04T09:33:41.3010663Z * [new branch] gh/bobrenjc93/689/orig -> origin/gh/bobrenjc93/689/orig 2025-12-04T09:33:41.3012521Z * [new branch] gh/bobrenjc93/690/base -> origin/gh/bobrenjc93/690/base 2025-12-04T09:33:41.3013995Z * [new branch] gh/bobrenjc93/690/head -> origin/gh/bobrenjc93/690/head 2025-12-04T09:33:41.3015511Z * [new branch] gh/bobrenjc93/690/orig -> origin/gh/bobrenjc93/690/orig 2025-12-04T09:33:41.3018552Z * [new branch] gh/bobrenjc93/691/base -> origin/gh/bobrenjc93/691/base 2025-12-04T09:33:41.3020417Z * [new branch] gh/bobrenjc93/691/head -> origin/gh/bobrenjc93/691/head 2025-12-04T09:33:41.3022475Z * [new branch] gh/bobrenjc93/691/orig -> origin/gh/bobrenjc93/691/orig 2025-12-04T09:33:41.3025351Z * [new branch] gh/bobrenjc93/692/base -> origin/gh/bobrenjc93/692/base 2025-12-04T09:33:41.3026989Z * [new 
branch] gh/bobrenjc93/692/head -> origin/gh/bobrenjc93/692/head 2025-12-04T09:33:41.3028524Z * [new branch] gh/bobrenjc93/692/orig -> origin/gh/bobrenjc93/692/orig 2025-12-04T09:33:41.3030439Z * [new branch] gh/bobrenjc93/693/base -> origin/gh/bobrenjc93/693/base 2025-12-04T09:33:41.3031918Z * [new branch] gh/bobrenjc93/693/head -> origin/gh/bobrenjc93/693/head 2025-12-04T09:33:41.3033517Z * [new branch] gh/bobrenjc93/693/orig -> origin/gh/bobrenjc93/693/orig 2025-12-04T09:33:41.3035557Z * [new branch] gh/bobrenjc93/694/base -> origin/gh/bobrenjc93/694/base 2025-12-04T09:33:41.3037105Z * [new branch] gh/bobrenjc93/694/head -> origin/gh/bobrenjc93/694/head 2025-12-04T09:33:41.3038653Z * [new branch] gh/bobrenjc93/694/orig -> origin/gh/bobrenjc93/694/orig 2025-12-04T09:33:41.3040542Z * [new branch] gh/bobrenjc93/695/base -> origin/gh/bobrenjc93/695/base 2025-12-04T09:33:41.3042024Z * [new branch] gh/bobrenjc93/695/head -> origin/gh/bobrenjc93/695/head 2025-12-04T09:33:41.3043481Z * [new branch] gh/bobrenjc93/695/orig -> origin/gh/bobrenjc93/695/orig 2025-12-04T09:33:41.3046089Z * [new branch] gh/c00w/23/base -> origin/gh/c00w/23/base 2025-12-04T09:33:41.3047653Z * [new branch] gh/c00w/23/head -> origin/gh/c00w/23/head 2025-12-04T09:33:41.3049744Z * [new branch] gh/c00w/53/base -> origin/gh/c00w/53/base 2025-12-04T09:33:41.3051150Z * [new branch] gh/c00w/53/head -> origin/gh/c00w/53/head 2025-12-04T09:33:41.3052629Z * [new branch] gh/c00w/53/orig -> origin/gh/c00w/53/orig 2025-12-04T09:33:41.3054413Z * [new branch] gh/c00w/54/base -> origin/gh/c00w/54/base 2025-12-04T09:33:41.3056605Z * [new branch] gh/c00w/54/head -> origin/gh/c00w/54/head 2025-12-04T09:33:41.3058283Z * [new branch] gh/c00w/54/orig -> origin/gh/c00w/54/orig 2025-12-04T09:33:41.3060778Z * [new branch] gh/c00w/56/base -> origin/gh/c00w/56/base 2025-12-04T09:33:41.3062344Z * [new branch] gh/c00w/56/head -> origin/gh/c00w/56/head 2025-12-04T09:33:41.3063762Z * [new branch] gh/c00w/56/orig -> 
origin/gh/c00w/56/orig 2025-12-04T09:33:41.3065684Z * [new branch] gh/c00w/57/base -> origin/gh/c00w/57/base 2025-12-04T09:33:41.3067285Z * [new branch] gh/c00w/57/head -> origin/gh/c00w/57/head 2025-12-04T09:33:41.3068884Z * [new branch] gh/c00w/57/orig -> origin/gh/c00w/57/orig 2025-12-04T09:33:41.3070760Z * [new branch] gh/c00w/58/base -> origin/gh/c00w/58/base 2025-12-04T09:33:41.3075812Z * [new branch] gh/c00w/58/head -> origin/gh/c00w/58/head 2025-12-04T09:33:41.3077238Z * [new branch] gh/c00w/58/orig -> origin/gh/c00w/58/orig 2025-12-04T09:33:41.3079617Z * [new branch] gh/clee2000/1/base -> origin/gh/clee2000/1/base 2025-12-04T09:33:41.3081292Z * [new branch] gh/clee2000/1/head -> origin/gh/clee2000/1/head 2025-12-04T09:33:41.3082823Z * [new branch] gh/clee2000/1/orig -> origin/gh/clee2000/1/orig 2025-12-04T09:33:41.3085360Z * [new branch] gh/coconutruben/1/base -> origin/gh/coconutruben/1/base 2025-12-04T09:33:41.3087010Z * [new branch] gh/coconutruben/1/head -> origin/gh/coconutruben/1/head 2025-12-04T09:33:41.3089496Z * [new branch] gh/coconutruben/55/base -> origin/gh/coconutruben/55/base 2025-12-04T09:33:41.3090977Z * [new branch] gh/coconutruben/55/head -> origin/gh/coconutruben/55/head 2025-12-04T09:33:41.3092547Z * [new branch] gh/coconutruben/55/orig -> origin/gh/coconutruben/55/orig 2025-12-04T09:33:41.3094751Z * [new branch] gh/coconutruben/57/base -> origin/gh/coconutruben/57/base 2025-12-04T09:33:41.3096526Z * [new branch] gh/coconutruben/57/head -> origin/gh/coconutruben/57/head 2025-12-04T09:33:41.3098245Z * [new branch] gh/coconutruben/57/orig -> origin/gh/coconutruben/57/orig 2025-12-04T09:33:41.3100295Z * [new branch] gh/coconutruben/70/base -> origin/gh/coconutruben/70/base 2025-12-04T09:33:41.3101874Z * [new branch] gh/coconutruben/70/head -> origin/gh/coconutruben/70/head 2025-12-04T09:33:41.3103568Z * [new branch] gh/coconutruben/70/orig -> origin/gh/coconutruben/70/orig 2025-12-04T09:33:41.3105410Z * [new branch] 
gh/coconutruben/71/base -> origin/gh/coconutruben/71/base 2025-12-04T09:33:41.3106989Z * [new branch] gh/coconutruben/71/head -> origin/gh/coconutruben/71/head 2025-12-04T09:33:41.3108488Z * [new branch] gh/coconutruben/71/orig -> origin/gh/coconutruben/71/orig 2025-12-04T09:33:41.3110873Z * [new branch] gh/coconutruben/72/base -> origin/gh/coconutruben/72/base 2025-12-04T09:33:41.3112064Z * [new branch] gh/coconutruben/72/head -> origin/gh/coconutruben/72/head 2025-12-04T09:33:41.3113591Z * [new branch] gh/coconutruben/72/orig -> origin/gh/coconutruben/72/orig 2025-12-04T09:33:41.3115427Z * [new branch] gh/coconutruben/73/base -> origin/gh/coconutruben/73/base 2025-12-04T09:33:41.3116999Z * [new branch] gh/coconutruben/73/head -> origin/gh/coconutruben/73/head 2025-12-04T09:33:41.3118581Z * [new branch] gh/coconutruben/73/orig -> origin/gh/coconutruben/73/orig 2025-12-04T09:33:41.3120733Z * [new branch] gh/coconutruben/74/base -> origin/gh/coconutruben/74/base 2025-12-04T09:33:41.3122530Z * [new branch] gh/coconutruben/74/head -> origin/gh/coconutruben/74/head 2025-12-04T09:33:41.3123929Z * [new branch] gh/coconutruben/74/orig -> origin/gh/coconutruben/74/orig 2025-12-04T09:33:41.3126095Z * [new branch] gh/coconutruben/79/base -> origin/gh/coconutruben/79/base 2025-12-04T09:33:41.3127820Z * [new branch] gh/coconutruben/79/head -> origin/gh/coconutruben/79/head 2025-12-04T09:33:41.3129231Z * [new branch] gh/coconutruben/79/orig -> origin/gh/coconutruben/79/orig 2025-12-04T09:33:41.3131460Z * [new branch] gh/coconutruben/80/base -> origin/gh/coconutruben/80/base 2025-12-04T09:33:41.3132938Z * [new branch] gh/coconutruben/80/head -> origin/gh/coconutruben/80/head 2025-12-04T09:33:41.3134592Z * [new branch] gh/coconutruben/80/orig -> origin/gh/coconutruben/80/orig 2025-12-04T09:33:41.3136750Z * [new branch] gh/coconutruben/82/base -> origin/gh/coconutruben/82/base 2025-12-04T09:33:41.3138313Z * [new branch] gh/coconutruben/82/head -> origin/gh/coconutruben/82/head 
2025-12-04T09:33:41.3139769Z * [new branch] gh/coconutruben/82/orig -> origin/gh/coconutruben/82/orig 2025-12-04T09:33:41.3142499Z * [new branch] gh/coconutruben/83/base -> origin/gh/coconutruben/83/base 2025-12-04T09:33:41.3143335Z * [new branch] gh/coconutruben/83/head -> origin/gh/coconutruben/83/head 2025-12-04T09:33:41.3144931Z * [new branch] gh/coconutruben/83/orig -> origin/gh/coconutruben/83/orig 2025-12-04T09:33:41.3147038Z * [new branch] gh/coconutruben/84/base -> origin/gh/coconutruben/84/base 2025-12-04T09:33:41.3148674Z * [new branch] gh/coconutruben/84/head -> origin/gh/coconutruben/84/head 2025-12-04T09:33:41.3150162Z * [new branch] gh/coconutruben/84/orig -> origin/gh/coconutruben/84/orig 2025-12-04T09:33:41.3152538Z * [new branch] gh/coconutruben/85/base -> origin/gh/coconutruben/85/base 2025-12-04T09:33:41.3153922Z * [new branch] gh/coconutruben/85/head -> origin/gh/coconutruben/85/head 2025-12-04T09:33:41.3155537Z * [new branch] gh/coconutruben/85/orig -> origin/gh/coconutruben/85/orig 2025-12-04T09:33:41.3157496Z * [new branch] gh/coconutruben/86/base -> origin/gh/coconutruben/86/base 2025-12-04T09:33:41.3159017Z * [new branch] gh/coconutruben/86/head -> origin/gh/coconutruben/86/head 2025-12-04T09:33:41.3160550Z * [new branch] gh/coconutruben/86/orig -> origin/gh/coconutruben/86/orig 2025-12-04T09:33:41.3163075Z * [new branch] gh/colinchan15/1/base -> origin/gh/colinchan15/1/base 2025-12-04T09:33:41.3164686Z * [new branch] gh/colinchan15/1/head -> origin/gh/colinchan15/1/head 2025-12-04T09:33:41.3166502Z * [new branch] gh/colinchan15/2/base -> origin/gh/colinchan15/2/base 2025-12-04T09:33:41.3167938Z * [new branch] gh/colinchan15/2/head -> origin/gh/colinchan15/2/head 2025-12-04T09:33:41.3169773Z * [new branch] gh/colinchan15/3/base -> origin/gh/colinchan15/3/base 2025-12-04T09:33:41.3171557Z * [new branch] gh/colinchan15/3/head -> origin/gh/colinchan15/3/head 2025-12-04T09:33:41.3173399Z * [new branch] gh/colinchan15/6/base -> 
origin/gh/colinchan15/6/base 2025-12-04T09:33:41.3174903Z * [new branch] gh/colinchan15/6/head -> origin/gh/colinchan15/6/head 2025-12-04T09:33:41.3177452Z * [new branch] gh/d4l3k/1/base -> origin/gh/d4l3k/1/base 2025-12-04T09:33:41.3178918Z * [new branch] gh/d4l3k/1/head -> origin/gh/d4l3k/1/head 2025-12-04T09:33:41.3180954Z * [new branch] gh/d4l3k/2/base -> origin/gh/d4l3k/2/base 2025-12-04T09:33:41.3182415Z * [new branch] gh/d4l3k/2/head -> origin/gh/d4l3k/2/head 2025-12-04T09:33:41.3183897Z * [new branch] gh/d4l3k/2/orig -> origin/gh/d4l3k/2/orig 2025-12-04T09:33:41.3185799Z * [new branch] gh/d4l3k/3/base -> origin/gh/d4l3k/3/base 2025-12-04T09:33:41.3187308Z * [new branch] gh/d4l3k/3/head -> origin/gh/d4l3k/3/head 2025-12-04T09:33:41.3188967Z * [new branch] gh/d4l3k/3/orig -> origin/gh/d4l3k/3/orig 2025-12-04T09:33:41.3190880Z * [new branch] gh/d4l3k/4/base -> origin/gh/d4l3k/4/base 2025-12-04T09:33:41.3192441Z * [new branch] gh/d4l3k/4/head -> origin/gh/d4l3k/4/head 2025-12-04T09:33:41.3194006Z * [new branch] gh/d4l3k/4/orig -> origin/gh/d4l3k/4/orig 2025-12-04T09:33:41.3195880Z * [new branch] gh/d4l3k/5/base -> origin/gh/d4l3k/5/base 2025-12-04T09:33:41.3197335Z * [new branch] gh/d4l3k/5/orig -> origin/gh/d4l3k/5/orig 2025-12-04T09:33:41.3199859Z * [new branch] gh/davidberard98/392/base -> origin/gh/davidberard98/392/base 2025-12-04T09:33:41.3201360Z * [new branch] gh/davidberard98/392/head -> origin/gh/davidberard98/392/head 2025-12-04T09:33:41.3202806Z * [new branch] gh/davidberard98/392/orig -> origin/gh/davidberard98/392/orig 2025-12-04T09:33:41.3204901Z * [new branch] gh/davidberard98/399/base -> origin/gh/davidberard98/399/base 2025-12-04T09:33:41.3206481Z * [new branch] gh/davidberard98/399/head -> origin/gh/davidberard98/399/head 2025-12-04T09:33:41.3207987Z * [new branch] gh/davidberard98/399/orig -> origin/gh/davidberard98/399/orig 2025-12-04T09:33:41.3210415Z * [new branch] gh/desertfire/605/base -> origin/gh/desertfire/605/base 
2025-12-04T09:33:41.3211893Z * [new branch] gh/desertfire/605/head -> origin/gh/desertfire/605/head 2025-12-04T09:33:41.3213444Z * [new branch] gh/desertfire/605/orig -> origin/gh/desertfire/605/orig 2025-12-04T09:33:41.3215338Z * [new branch] gh/desertfire/606/base -> origin/gh/desertfire/606/base 2025-12-04T09:33:41.3216847Z * [new branch] gh/desertfire/606/head -> origin/gh/desertfire/606/head 2025-12-04T09:33:41.3218624Z * [new branch] gh/desertfire/606/orig -> origin/gh/desertfire/606/orig 2025-12-04T09:33:41.3220490Z * [new branch] gh/desertfire/607/base -> origin/gh/desertfire/607/base 2025-12-04T09:33:41.3221942Z * [new branch] gh/desertfire/607/head -> origin/gh/desertfire/607/head 2025-12-04T09:33:41.3223529Z * [new branch] gh/desertfire/607/orig -> origin/gh/desertfire/607/orig 2025-12-04T09:33:41.3225433Z * [new branch] gh/desertfire/608/base -> origin/gh/desertfire/608/base 2025-12-04T09:33:41.3226874Z * [new branch] gh/desertfire/608/head -> origin/gh/desertfire/608/head 2025-12-04T09:33:41.3228555Z * [new branch] gh/desertfire/608/orig -> origin/gh/desertfire/608/orig 2025-12-04T09:33:41.3230544Z * [new branch] gh/desertfire/609/base -> origin/gh/desertfire/609/base 2025-12-04T09:33:41.3232075Z * [new branch] gh/desertfire/609/head -> origin/gh/desertfire/609/head 2025-12-04T09:33:41.3233596Z * [new branch] gh/desertfire/609/orig -> origin/gh/desertfire/609/orig 2025-12-04T09:33:41.3235805Z * [new branch] gh/desertfire/610/base -> origin/gh/desertfire/610/base 2025-12-04T09:33:41.3237853Z * [new branch] gh/desertfire/610/head -> origin/gh/desertfire/610/head 2025-12-04T09:33:41.3239429Z * [new branch] gh/desertfire/610/orig -> origin/gh/desertfire/610/orig 2025-12-04T09:33:41.3257826Z * [new branch] gh/desertfire/611/base -> origin/gh/desertfire/611/base 2025-12-04T09:33:41.3258688Z * [new branch] gh/desertfire/611/head -> origin/gh/desertfire/611/head 2025-12-04T09:33:41.3258990Z * [new branch] gh/desertfire/611/orig -> origin/gh/desertfire/611/orig 
2025-12-04T09:33:41.3259278Z * [new branch] gh/desertfire/612/base -> origin/gh/desertfire/612/base 2025-12-04T09:33:41.3259971Z * [new branch] gh/desertfire/612/head -> origin/gh/desertfire/612/head 2025-12-04T09:33:41.3260243Z * [new branch] gh/desertfire/612/orig -> origin/gh/desertfire/612/orig 2025-12-04T09:33:41.3260523Z * [new branch] gh/desertfire/613/base -> origin/gh/desertfire/613/base 2025-12-04T09:33:41.3260799Z * [new branch] gh/desertfire/613/head -> origin/gh/desertfire/613/head 2025-12-04T09:33:41.3261077Z * [new branch] gh/desertfire/613/orig -> origin/gh/desertfire/613/orig 2025-12-04T09:33:41.3261341Z * [new branch] gh/desertfire/614/base -> origin/gh/desertfire/614/base 2025-12-04T09:33:41.3261605Z * [new branch] gh/desertfire/614/head -> origin/gh/desertfire/614/head 2025-12-04T09:33:41.3261955Z * [new branch] gh/desertfire/614/orig -> origin/gh/desertfire/614/orig 2025-12-04T09:33:41.3262218Z * [new branch] gh/desertfire/615/base -> origin/gh/desertfire/615/base 2025-12-04T09:33:41.3262505Z * [new branch] gh/desertfire/615/head -> origin/gh/desertfire/615/head 2025-12-04T09:33:41.3263916Z * [new branch] gh/desertfire/615/orig -> origin/gh/desertfire/615/orig 2025-12-04T09:33:41.3265726Z * [new branch] gh/desertfire/616/base -> origin/gh/desertfire/616/base 2025-12-04T09:33:41.3267345Z * [new branch] gh/desertfire/616/head -> origin/gh/desertfire/616/head 2025-12-04T09:33:41.3268854Z * [new branch] gh/desertfire/616/orig -> origin/gh/desertfire/616/orig 2025-12-04T09:33:41.3270664Z * [new branch] gh/desertfire/617/base -> origin/gh/desertfire/617/base 2025-12-04T09:33:41.3273948Z * [new branch] gh/desertfire/617/head -> origin/gh/desertfire/617/head 2025-12-04T09:33:41.3275342Z * [new branch] gh/desertfire/617/orig -> origin/gh/desertfire/617/orig 2025-12-04T09:33:41.3277718Z * [new branch] gh/dharakk/1/base -> origin/gh/dharakk/1/base 2025-12-04T09:33:41.3279408Z * [new branch] gh/dharakk/1/head -> origin/gh/dharakk/1/head 
2025-12-04T09:33:41.3281716Z * [new branch] gh/drisspg/170/base -> origin/gh/drisspg/170/base 2025-12-04T09:33:41.3283189Z * [new branch] gh/drisspg/170/head -> origin/gh/drisspg/170/head 2025-12-04T09:33:41.3284671Z * [new branch] gh/drisspg/170/orig -> origin/gh/drisspg/170/orig 2025-12-04T09:33:41.3286608Z * [new branch] gh/drisspg/182/base -> origin/gh/drisspg/182/base 2025-12-04T09:33:41.3288251Z * [new branch] gh/drisspg/182/head -> origin/gh/drisspg/182/head 2025-12-04T09:33:41.3290557Z * [new branch] gh/drisspg/183/base -> origin/gh/drisspg/183/base 2025-12-04T09:33:41.3291948Z * [new branch] gh/drisspg/183/head -> origin/gh/drisspg/183/head 2025-12-04T09:33:41.3294175Z * [new branch] gh/drisspg/184/base -> origin/gh/drisspg/184/base 2025-12-04T09:33:41.3295582Z * [new branch] gh/drisspg/184/head -> origin/gh/drisspg/184/head 2025-12-04T09:33:41.3297805Z * [new branch] gh/drisspg/185/base -> origin/gh/drisspg/185/base 2025-12-04T09:33:41.3299286Z * [new branch] gh/drisspg/185/head -> origin/gh/drisspg/185/head 2025-12-04T09:33:41.3301222Z * [new branch] gh/drisspg/194/base -> origin/gh/drisspg/194/base 2025-12-04T09:33:41.3302763Z * [new branch] gh/drisspg/194/head -> origin/gh/drisspg/194/head 2025-12-04T09:33:41.3304257Z * [new branch] gh/drisspg/194/orig -> origin/gh/drisspg/194/orig 2025-12-04T09:33:41.3306156Z * [new branch] gh/drisspg/200/base -> origin/gh/drisspg/200/base 2025-12-04T09:33:41.3307748Z * [new branch] gh/drisspg/200/head -> origin/gh/drisspg/200/head 2025-12-04T09:33:41.3309446Z * [new branch] gh/drisspg/200/orig -> origin/gh/drisspg/200/orig 2025-12-04T09:33:41.3311296Z * [new branch] gh/drisspg/218/base -> origin/gh/drisspg/218/base 2025-12-04T09:33:41.3312817Z * [new branch] gh/drisspg/218/head -> origin/gh/drisspg/218/head 2025-12-04T09:33:41.3314314Z * [new branch] gh/drisspg/218/orig -> origin/gh/drisspg/218/orig 2025-12-04T09:33:41.3316222Z * [new branch] gh/drisspg/219/base -> origin/gh/drisspg/219/base 
2025-12-04T09:33:41.3317685Z * [new branch] gh/drisspg/219/head -> origin/gh/drisspg/219/head 2025-12-04T09:33:41.3319220Z * [new branch] gh/drisspg/219/orig -> origin/gh/drisspg/219/orig 2025-12-04T09:33:41.3321098Z * [new branch] gh/drisspg/220/base -> origin/gh/drisspg/220/base 2025-12-04T09:33:41.3322582Z * [new branch] gh/drisspg/220/head -> origin/gh/drisspg/220/head 2025-12-04T09:33:41.3324064Z * [new branch] gh/drisspg/220/orig -> origin/gh/drisspg/220/orig 2025-12-04T09:33:41.3326048Z * [new branch] gh/drisspg/221/base -> origin/gh/drisspg/221/base 2025-12-04T09:33:41.3327629Z * [new branch] gh/drisspg/221/head -> origin/gh/drisspg/221/head 2025-12-04T09:33:41.3329079Z * [new branch] gh/drisspg/221/orig -> origin/gh/drisspg/221/orig 2025-12-04T09:33:41.3330993Z * [new branch] gh/drisspg/222/base -> origin/gh/drisspg/222/base 2025-12-04T09:33:41.3332511Z * [new branch] gh/drisspg/222/head -> origin/gh/drisspg/222/head 2025-12-04T09:33:41.3334016Z * [new branch] gh/drisspg/222/orig -> origin/gh/drisspg/222/orig 2025-12-04T09:33:41.3335946Z * [new branch] gh/drisspg/223/base -> origin/gh/drisspg/223/base 2025-12-04T09:33:41.3337519Z * [new branch] gh/drisspg/223/head -> origin/gh/drisspg/223/head 2025-12-04T09:33:41.3339058Z * [new branch] gh/drisspg/223/orig -> origin/gh/drisspg/223/orig 2025-12-04T09:33:41.3341005Z * [new branch] gh/drisspg/224/base -> origin/gh/drisspg/224/base 2025-12-04T09:33:41.3342485Z * [new branch] gh/drisspg/224/head -> origin/gh/drisspg/224/head 2025-12-04T09:33:41.3343962Z * [new branch] gh/drisspg/224/orig -> origin/gh/drisspg/224/orig 2025-12-04T09:33:41.3345905Z * [new branch] gh/drisspg/225/base -> origin/gh/drisspg/225/base 2025-12-04T09:33:41.3347491Z * [new branch] gh/drisspg/225/head -> origin/gh/drisspg/225/head 2025-12-04T09:33:41.3348993Z * [new branch] gh/drisspg/225/orig -> origin/gh/drisspg/225/orig 2025-12-04T09:33:41.3350872Z * [new branch] gh/drisspg/226/base -> origin/gh/drisspg/226/base 
2025-12-04T09:33:41.3352300Z * [new branch] gh/drisspg/226/head -> origin/gh/drisspg/226/head 2025-12-04T09:33:41.3353890Z * [new branch] gh/drisspg/226/orig -> origin/gh/drisspg/226/orig 2025-12-04T09:33:41.3356378Z * [new branch] gh/drisspg/227/base -> origin/gh/drisspg/227/base 2025-12-04T09:33:41.3357806Z * [new branch] gh/drisspg/227/head -> origin/gh/drisspg/227/head 2025-12-04T09:33:41.3359323Z * [new branch] gh/drisspg/227/orig -> origin/gh/drisspg/227/orig 2025-12-04T09:33:41.3361302Z * [new branch] gh/drisspg/228/base -> origin/gh/drisspg/228/base 2025-12-04T09:33:41.3362809Z * [new branch] gh/drisspg/228/head -> origin/gh/drisspg/228/head 2025-12-04T09:33:41.3364253Z * [new branch] gh/drisspg/228/orig -> origin/gh/drisspg/228/orig 2025-12-04T09:33:41.3366194Z * [new branch] gh/drisspg/229/base -> origin/gh/drisspg/229/base 2025-12-04T09:33:41.3367797Z * [new branch] gh/drisspg/229/head -> origin/gh/drisspg/229/head 2025-12-04T09:33:41.3369365Z * [new branch] gh/drisspg/229/orig -> origin/gh/drisspg/229/orig 2025-12-04T09:33:41.3371579Z * [new branch] gh/drisspg/230/base -> origin/gh/drisspg/230/base 2025-12-04T09:33:41.3373102Z * [new branch] gh/drisspg/230/head -> origin/gh/drisspg/230/head 2025-12-04T09:33:41.3374583Z * [new branch] gh/drisspg/230/orig -> origin/gh/drisspg/230/orig 2025-12-04T09:33:41.3377019Z * [new branch] gh/dsjohns2/1/base -> origin/gh/dsjohns2/1/base 2025-12-04T09:33:41.3378602Z * [new branch] gh/dsjohns2/1/head -> origin/gh/dsjohns2/1/head 2025-12-04T09:33:41.3381090Z * [new branch] gh/dzmitry-huba/1/base -> origin/gh/dzmitry-huba/1/base 2025-12-04T09:33:41.3382577Z * [new branch] gh/dzmitry-huba/1/head -> origin/gh/dzmitry-huba/1/head 2025-12-04T09:33:41.3384829Z * [new branch] gh/dzmitry-huba/12/base -> origin/gh/dzmitry-huba/12/base 2025-12-04T09:33:41.3386798Z * [new branch] gh/dzmitry-huba/12/head -> origin/gh/dzmitry-huba/12/head 2025-12-04T09:33:41.3388072Z * [new branch] gh/dzmitry-huba/12/orig -> 
origin/gh/dzmitry-huba/12/orig 2025-12-04T09:33:41.3390215Z * [new branch] gh/dzmitry-huba/13/base -> origin/gh/dzmitry-huba/13/base 2025-12-04T09:33:41.3391742Z * [new branch] gh/dzmitry-huba/13/head -> origin/gh/dzmitry-huba/13/head 2025-12-04T09:33:41.3393246Z * [new branch] gh/dzmitry-huba/13/orig -> origin/gh/dzmitry-huba/13/orig 2025-12-04T09:33:41.3395146Z * [new branch] gh/dzmitry-huba/14/base -> origin/gh/dzmitry-huba/14/base 2025-12-04T09:33:41.3396717Z * [new branch] gh/dzmitry-huba/14/head -> origin/gh/dzmitry-huba/14/head 2025-12-04T09:33:41.3398215Z * [new branch] gh/dzmitry-huba/14/orig -> origin/gh/dzmitry-huba/14/orig 2025-12-04T09:33:41.3400297Z * [new branch] gh/dzmitry-huba/15/base -> origin/gh/dzmitry-huba/15/base 2025-12-04T09:33:41.3401784Z * [new branch] gh/dzmitry-huba/15/head -> origin/gh/dzmitry-huba/15/head 2025-12-04T09:33:41.3403143Z * [new branch] gh/dzmitry-huba/15/orig -> origin/gh/dzmitry-huba/15/orig 2025-12-04T09:33:41.3405263Z * [new branch] gh/dzmitry-huba/16/base -> origin/gh/dzmitry-huba/16/base 2025-12-04T09:33:41.3406918Z * [new branch] gh/dzmitry-huba/16/head -> origin/gh/dzmitry-huba/16/head 2025-12-04T09:33:41.3408583Z * [new branch] gh/dzmitry-huba/16/orig -> origin/gh/dzmitry-huba/16/orig 2025-12-04T09:33:41.3410533Z * [new branch] gh/dzmitry-huba/17/base -> origin/gh/dzmitry-huba/17/base 2025-12-04T09:33:41.3411991Z * [new branch] gh/dzmitry-huba/17/head -> origin/gh/dzmitry-huba/17/head 2025-12-04T09:33:41.3413452Z * [new branch] gh/dzmitry-huba/17/orig -> origin/gh/dzmitry-huba/17/orig 2025-12-04T09:33:41.3415194Z * [new branch] gh/dzmitry-huba/2/base -> origin/gh/dzmitry-huba/2/base 2025-12-04T09:33:41.3416725Z * [new branch] gh/dzmitry-huba/2/head -> origin/gh/dzmitry-huba/2/head 2025-12-04T09:33:41.3418559Z * [new branch] gh/dzmitry-huba/3/base -> origin/gh/dzmitry-huba/3/base 2025-12-04T09:33:41.3419918Z * [new branch] gh/dzmitry-huba/3/head -> origin/gh/dzmitry-huba/3/head 2025-12-04T09:33:41.3422340Z * [new 
branch] gh/eellison/808/base -> origin/gh/eellison/808/base 2025-12-04T09:33:41.3423900Z * [new branch] gh/eellison/808/head -> origin/gh/eellison/808/head 2025-12-04T09:33:41.3425479Z * [new branch] gh/eellison/808/orig -> origin/gh/eellison/808/orig 2025-12-04T09:33:41.3427766Z * [new branch] gh/eellison/822/base -> origin/gh/eellison/822/base 2025-12-04T09:33:41.3429402Z * [new branch] gh/eellison/822/head -> origin/gh/eellison/822/head 2025-12-04T09:33:41.3430747Z * [new branch] gh/eellison/822/orig -> origin/gh/eellison/822/orig 2025-12-04T09:33:41.3432686Z * [new branch] gh/eellison/823/base -> origin/gh/eellison/823/base 2025-12-04T09:33:41.3434128Z * [new branch] gh/eellison/823/head -> origin/gh/eellison/823/head 2025-12-04T09:33:41.3435602Z * [new branch] gh/eellison/823/orig -> origin/gh/eellison/823/orig 2025-12-04T09:33:41.3437476Z * [new branch] gh/eellison/862/base -> origin/gh/eellison/862/base 2025-12-04T09:33:41.3438931Z * [new branch] gh/eellison/862/head -> origin/gh/eellison/862/head 2025-12-04T09:33:41.3440397Z * [new branch] gh/eellison/862/orig -> origin/gh/eellison/862/orig 2025-12-04T09:33:41.3442294Z * [new branch] gh/eellison/863/base -> origin/gh/eellison/863/base 2025-12-04T09:33:41.3443725Z * [new branch] gh/eellison/863/head -> origin/gh/eellison/863/head 2025-12-04T09:33:41.3445342Z * [new branch] gh/eellison/863/orig -> origin/gh/eellison/863/orig 2025-12-04T09:33:41.3447184Z * [new branch] gh/eellison/864/base -> origin/gh/eellison/864/base 2025-12-04T09:33:41.3448726Z * [new branch] gh/eellison/864/head -> origin/gh/eellison/864/head 2025-12-04T09:33:41.3450777Z * [new branch] gh/eellison/864/orig -> origin/gh/eellison/864/orig 2025-12-04T09:33:41.3452564Z * [new branch] gh/eellison/865/base -> origin/gh/eellison/865/base 2025-12-04T09:33:41.3453851Z * [new branch] gh/eellison/865/head -> origin/gh/eellison/865/head 2025-12-04T09:33:41.3455452Z * [new branch] gh/eellison/865/orig -> origin/gh/eellison/865/orig 
2025-12-04T09:33:41.3457689Z * [new branch] gh/eellison/866/base -> origin/gh/eellison/866/base 2025-12-04T09:33:41.3458828Z * [new branch] gh/eellison/866/head -> origin/gh/eellison/866/head 2025-12-04T09:33:41.3460494Z * [new branch] gh/eellison/866/orig -> origin/gh/eellison/866/orig 2025-12-04T09:33:41.3462672Z * [new branch] gh/eellison/867/base -> origin/gh/eellison/867/base 2025-12-04T09:33:41.3463984Z * [new branch] gh/eellison/867/head -> origin/gh/eellison/867/head 2025-12-04T09:33:41.3465608Z * [new branch] gh/eellison/867/orig -> origin/gh/eellison/867/orig 2025-12-04T09:33:41.3467803Z * [new branch] gh/eellison/868/base -> origin/gh/eellison/868/base 2025-12-04T09:33:41.3469680Z * [new branch] gh/eellison/868/head -> origin/gh/eellison/868/head 2025-12-04T09:33:41.3471198Z * [new branch] gh/eellison/868/orig -> origin/gh/eellison/868/orig 2025-12-04T09:33:41.3473454Z * [new branch] gh/eellison/869/base -> origin/gh/eellison/869/base 2025-12-04T09:33:41.3474699Z * [new branch] gh/eellison/869/head -> origin/gh/eellison/869/head 2025-12-04T09:33:41.3476249Z * [new branch] gh/eellison/869/orig -> origin/gh/eellison/869/orig 2025-12-04T09:33:41.3478295Z * [new branch] gh/eellison/870/base -> origin/gh/eellison/870/base 2025-12-04T09:33:41.3479595Z * [new branch] gh/eellison/870/head -> origin/gh/eellison/870/head 2025-12-04T09:33:41.3480954Z * [new branch] gh/eellison/870/orig -> origin/gh/eellison/870/orig 2025-12-04T09:33:41.3483206Z * [new branch] gh/eellison/871/base -> origin/gh/eellison/871/base 2025-12-04T09:33:41.3484506Z * [new branch] gh/eellison/871/head -> origin/gh/eellison/871/head 2025-12-04T09:33:41.3486797Z * [new branch] gh/eellison/871/orig -> origin/gh/eellison/871/orig 2025-12-04T09:33:41.3489086Z * [new branch] gh/eellison/872/base -> origin/gh/eellison/872/base 2025-12-04T09:33:41.3490240Z * [new branch] gh/eellison/872/head -> origin/gh/eellison/872/head 2025-12-04T09:33:41.3491772Z * [new branch] gh/eellison/872/orig -> 
origin/gh/eellison/872/orig 2025-12-04T09:33:41.3493952Z * [new branch] gh/eellison/873/base -> origin/gh/eellison/873/base 2025-12-04T09:33:41.3495209Z * [new branch] gh/eellison/873/head -> origin/gh/eellison/873/head 2025-12-04T09:33:41.3496897Z * [new branch] gh/eellison/873/orig -> origin/gh/eellison/873/orig 2025-12-04T09:33:41.3498967Z * [new branch] gh/eellison/874/base -> origin/gh/eellison/874/base 2025-12-04T09:33:41.3500566Z * [new branch] gh/eellison/874/head -> origin/gh/eellison/874/head 2025-12-04T09:33:41.3501916Z * [new branch] gh/eellison/874/orig -> origin/gh/eellison/874/orig 2025-12-04T09:33:41.3504557Z * [new branch] gh/eellison/875/base -> origin/gh/eellison/875/base 2025-12-04T09:33:41.3506235Z * [new branch] gh/eellison/875/head -> origin/gh/eellison/875/head 2025-12-04T09:33:41.3507827Z * [new branch] gh/eellison/875/orig -> origin/gh/eellison/875/orig 2025-12-04T09:33:41.3510042Z * [new branch] gh/eellison/876/base -> origin/gh/eellison/876/base 2025-12-04T09:33:41.3511583Z * [new branch] gh/eellison/876/head -> origin/gh/eellison/876/head 2025-12-04T09:33:41.3512966Z * [new branch] gh/eellison/876/orig -> origin/gh/eellison/876/orig 2025-12-04T09:33:41.3515232Z * [new branch] gh/eellison/877/base -> origin/gh/eellison/877/base 2025-12-04T09:33:41.3516549Z * [new branch] gh/eellison/877/head -> origin/gh/eellison/877/head 2025-12-04T09:33:41.3518116Z * [new branch] gh/eellison/877/orig -> origin/gh/eellison/877/orig 2025-12-04T09:33:41.3520080Z * [new branch] gh/eellison/878/base -> origin/gh/eellison/878/base 2025-12-04T09:33:41.3521966Z * [new branch] gh/eellison/878/head -> origin/gh/eellison/878/head 2025-12-04T09:33:41.3522793Z * [new branch] gh/eellison/878/orig -> origin/gh/eellison/878/orig 2025-12-04T09:33:41.3524936Z * [new branch] gh/eellison/879/base -> origin/gh/eellison/879/base 2025-12-04T09:33:41.3526416Z * [new branch] gh/eellison/879/head -> origin/gh/eellison/879/head 2025-12-04T09:33:41.3527996Z * [new branch] 
gh/eellison/879/orig -> origin/gh/eellison/879/orig 2025-12-04T09:33:41.3529839Z * [new branch] gh/eellison/880/base -> origin/gh/eellison/880/base 2025-12-04T09:33:41.3531339Z * [new branch] gh/eellison/880/head -> origin/gh/eellison/880/head 2025-12-04T09:33:41.3532885Z * [new branch] gh/eellison/880/orig -> origin/gh/eellison/880/orig 2025-12-04T09:33:41.3534978Z * [new branch] gh/eellison/881/base -> origin/gh/eellison/881/base 2025-12-04T09:33:41.3536498Z * [new branch] gh/eellison/881/head -> origin/gh/eellison/881/head 2025-12-04T09:33:41.3538027Z * [new branch] gh/eellison/881/orig -> origin/gh/eellison/881/orig 2025-12-04T09:33:41.3539947Z * [new branch] gh/eellison/882/base -> origin/gh/eellison/882/base 2025-12-04T09:33:41.3541420Z * [new branch] gh/eellison/882/head -> origin/gh/eellison/882/head 2025-12-04T09:33:41.3543107Z * [new branch] gh/eellison/882/orig -> origin/gh/eellison/882/orig 2025-12-04T09:33:41.3544996Z * [new branch] gh/eellison/883/base -> origin/gh/eellison/883/base 2025-12-04T09:33:41.3546508Z * [new branch] gh/eellison/883/head -> origin/gh/eellison/883/head 2025-12-04T09:33:41.3548137Z * [new branch] gh/eellison/883/orig -> origin/gh/eellison/883/orig 2025-12-04T09:33:41.3549925Z * [new branch] gh/eellison/884/base -> origin/gh/eellison/884/base 2025-12-04T09:33:41.3551387Z * [new branch] gh/eellison/884/head -> origin/gh/eellison/884/head 2025-12-04T09:33:41.3552762Z * [new branch] gh/eellison/884/orig -> origin/gh/eellison/884/orig 2025-12-04T09:33:41.3555164Z * [new branch] gh/etaf/147/base -> origin/gh/etaf/147/base 2025-12-04T09:33:41.3556748Z * [new branch] gh/etaf/147/head -> origin/gh/etaf/147/head 2025-12-04T09:33:41.3559002Z * [new branch] gh/etaf/154/base -> origin/gh/etaf/154/base 2025-12-04T09:33:41.3560508Z * [new branch] gh/etaf/154/head -> origin/gh/etaf/154/head 2025-12-04T09:33:41.3561970Z * [new branch] gh/etaf/154/orig -> origin/gh/etaf/154/orig 2025-12-04T09:33:41.3564457Z * [new branch] gh/etaf/156/base -> 
origin/gh/etaf/156/base 2025-12-04T09:33:41.3565908Z * [new branch] gh/etaf/156/head -> origin/gh/etaf/156/head 2025-12-04T09:33:41.3567594Z * [new branch] gh/etaf/156/orig -> origin/gh/etaf/156/orig 2025-12-04T09:33:41.3569782Z * [new branch] gh/etaf/157/base -> origin/gh/etaf/157/base 2025-12-04T09:33:41.3571480Z * [new branch] gh/etaf/157/head -> origin/gh/etaf/157/head 2025-12-04T09:33:41.3573042Z * [new branch] gh/etaf/157/orig -> origin/gh/etaf/157/orig 2025-12-04T09:33:41.3575177Z * [new branch] gh/etaf/158/base -> origin/gh/etaf/158/base 2025-12-04T09:33:41.3576884Z * [new branch] gh/etaf/158/head -> origin/gh/etaf/158/head 2025-12-04T09:33:41.3578281Z * [new branch] gh/etaf/158/orig -> origin/gh/etaf/158/orig 2025-12-04T09:33:41.3580273Z * [new branch] gh/etaf/159/base -> origin/gh/etaf/159/base 2025-12-04T09:33:41.3581799Z * [new branch] gh/etaf/159/head -> origin/gh/etaf/159/head 2025-12-04T09:33:41.3583239Z * [new branch] gh/etaf/159/orig -> origin/gh/etaf/159/orig 2025-12-04T09:33:41.3585314Z * [new branch] gh/etaf/160/base -> origin/gh/etaf/160/base 2025-12-04T09:33:41.3586870Z * [new branch] gh/etaf/160/head -> origin/gh/etaf/160/head 2025-12-04T09:33:41.3588464Z * [new branch] gh/etaf/160/orig -> origin/gh/etaf/160/orig 2025-12-04T09:33:41.3590354Z * [new branch] gh/etaf/161/base -> origin/gh/etaf/161/base 2025-12-04T09:33:41.3591971Z * [new branch] gh/etaf/161/head -> origin/gh/etaf/161/head 2025-12-04T09:33:41.3593434Z * [new branch] gh/etaf/161/orig -> origin/gh/etaf/161/orig 2025-12-04T09:33:41.3595366Z * [new branch] gh/etaf/166/base -> origin/gh/etaf/166/base 2025-12-04T09:33:41.3597065Z * [new branch] gh/etaf/166/head -> origin/gh/etaf/166/head 2025-12-04T09:33:41.3598560Z * [new branch] gh/etaf/166/orig -> origin/gh/etaf/166/orig 2025-12-04T09:33:41.3600407Z * [new branch] gh/etaf/167/base -> origin/gh/etaf/167/base 2025-12-04T09:33:41.3601960Z * [new branch] gh/etaf/167/head -> origin/gh/etaf/167/head 2025-12-04T09:33:41.3603409Z * [new 
branch] gh/etaf/167/orig -> origin/gh/etaf/167/orig 2025-12-04T09:33:41.3605492Z * [new branch] gh/etaf/168/base -> origin/gh/etaf/168/base 2025-12-04T09:33:41.3607120Z * [new branch] gh/etaf/168/head -> origin/gh/etaf/168/head 2025-12-04T09:33:41.3608654Z * [new branch] gh/etaf/168/orig -> origin/gh/etaf/168/orig 2025-12-04T09:33:41.3610883Z * [new branch] gh/etaf/172/base -> origin/gh/etaf/172/base 2025-12-04T09:33:41.3612258Z * [new branch] gh/etaf/172/head -> origin/gh/etaf/172/head 2025-12-04T09:33:41.3613797Z * [new branch] gh/etaf/172/orig -> origin/gh/etaf/172/orig 2025-12-04T09:33:41.3615986Z * [new branch] gh/etaf/173/base -> origin/gh/etaf/173/base 2025-12-04T09:33:41.3617739Z * [new branch] gh/etaf/173/head -> origin/gh/etaf/173/head 2025-12-04T09:33:41.3619750Z * [new branch] gh/etaf/173/orig -> origin/gh/etaf/173/orig 2025-12-04T09:33:41.3621839Z * [new branch] gh/etaf/174/base -> origin/gh/etaf/174/base 2025-12-04T09:33:41.3623344Z * [new branch] gh/etaf/174/head -> origin/gh/etaf/174/head 2025-12-04T09:33:41.3625305Z * [new branch] gh/etaf/175/base -> origin/gh/etaf/175/base 2025-12-04T09:33:41.3626851Z * [new branch] gh/etaf/175/head -> origin/gh/etaf/175/head 2025-12-04T09:33:41.3628201Z * [new branch] gh/etaf/175/orig -> origin/gh/etaf/175/orig 2025-12-04T09:33:41.3630417Z * [new branch] gh/etaf/176/base -> origin/gh/etaf/176/base 2025-12-04T09:33:41.3631978Z * [new branch] gh/etaf/176/head -> origin/gh/etaf/176/head 2025-12-04T09:33:41.3633482Z * [new branch] gh/etaf/176/orig -> origin/gh/etaf/176/orig 2025-12-04T09:33:41.3635935Z * [new branch] gh/etaf/177/base -> origin/gh/etaf/177/base 2025-12-04T09:33:41.3637677Z * [new branch] gh/etaf/177/head -> origin/gh/etaf/177/head 2025-12-04T09:33:41.3639202Z * [new branch] gh/etaf/177/orig -> origin/gh/etaf/177/orig 2025-12-04T09:33:41.3641398Z * [new branch] gh/etaf/178/base -> origin/gh/etaf/178/base 2025-12-04T09:33:41.3643079Z * [new branch] gh/etaf/178/head -> origin/gh/etaf/178/head 
2025-12-04T09:33:41.3644518Z * [new branch] gh/etaf/178/orig -> origin/gh/etaf/178/orig 2025-12-04T09:33:41.3646572Z * [new branch] gh/etaf/179/base -> origin/gh/etaf/179/base 2025-12-04T09:33:41.3648043Z * [new branch] gh/etaf/179/head -> origin/gh/etaf/179/head 2025-12-04T09:33:41.3649491Z * [new branch] gh/etaf/179/orig -> origin/gh/etaf/179/orig 2025-12-04T09:33:41.3651475Z * [new branch] gh/etaf/180/base -> origin/gh/etaf/180/base 2025-12-04T09:33:41.3653000Z * [new branch] gh/etaf/180/head -> origin/gh/etaf/180/head 2025-12-04T09:33:41.3654497Z * [new branch] gh/etaf/180/orig -> origin/gh/etaf/180/orig 2025-12-04T09:33:41.3657520Z * [new branch] gh/exclamaforte/1/base -> origin/gh/exclamaforte/1/base 2025-12-04T09:33:41.3658696Z * [new branch] gh/exclamaforte/1/head -> origin/gh/exclamaforte/1/head 2025-12-04T09:33:41.3660600Z * [new branch] gh/exclamaforte/2/base -> origin/gh/exclamaforte/2/base 2025-12-04T09:33:41.3661772Z * [new branch] gh/exclamaforte/2/head -> origin/gh/exclamaforte/2/head 2025-12-04T09:33:41.3663738Z * [new branch] gh/exclamaforte/3/base -> origin/gh/exclamaforte/3/base 2025-12-04T09:33:41.3665242Z * [new branch] gh/exclamaforte/3/head -> origin/gh/exclamaforte/3/head 2025-12-04T09:33:41.3667200Z * [new branch] gh/exclamaforte/4/base -> origin/gh/exclamaforte/4/base 2025-12-04T09:33:41.3668758Z * [new branch] gh/exclamaforte/4/head -> origin/gh/exclamaforte/4/head 2025-12-04T09:33:41.3671322Z * [new branch] gh/ezyang/2374/base -> origin/gh/ezyang/2374/base 2025-12-04T09:33:41.3672941Z * [new branch] gh/ezyang/2374/head -> origin/gh/ezyang/2374/head 2025-12-04T09:33:41.3674582Z * [new branch] gh/ezyang/2374/orig -> origin/gh/ezyang/2374/orig 2025-12-04T09:33:41.3676483Z * [new branch] gh/ezyang/2973/base -> origin/gh/ezyang/2973/base 2025-12-04T09:33:41.3677949Z * [new branch] gh/ezyang/2973/head -> origin/gh/ezyang/2973/head 2025-12-04T09:33:41.3679502Z * [new branch] gh/ezyang/2973/orig -> origin/gh/ezyang/2973/orig 
2025-12-04T09:33:41.3681392Z * [new branch] gh/ezyang/2974/base -> origin/gh/ezyang/2974/base 2025-12-04T09:33:41.3682861Z * [new branch] gh/ezyang/2974/head -> origin/gh/ezyang/2974/head 2025-12-04T09:33:41.3684523Z * [new branch] gh/ezyang/2974/orig -> origin/gh/ezyang/2974/orig 2025-12-04T09:33:41.3686396Z * [new branch] gh/ezyang/3131/base -> origin/gh/ezyang/3131/base 2025-12-04T09:33:41.3688078Z * [new branch] gh/ezyang/3131/head -> origin/gh/ezyang/3131/head 2025-12-04T09:33:41.3689525Z * [new branch] gh/ezyang/3131/orig -> origin/gh/ezyang/3131/orig 2025-12-04T09:33:41.3691450Z * [new branch] gh/ezyang/3139/base -> origin/gh/ezyang/3139/base 2025-12-04T09:33:41.3692891Z * [new branch] gh/ezyang/3139/head -> origin/gh/ezyang/3139/head 2025-12-04T09:33:41.3694373Z * [new branch] gh/ezyang/3139/orig -> origin/gh/ezyang/3139/orig 2025-12-04T09:33:41.3696376Z * [new branch] gh/ezyang/3140/base -> origin/gh/ezyang/3140/base 2025-12-04T09:33:41.3697860Z * [new branch] gh/ezyang/3140/head -> origin/gh/ezyang/3140/head 2025-12-04T09:33:41.3699388Z * [new branch] gh/ezyang/3140/orig -> origin/gh/ezyang/3140/orig 2025-12-04T09:33:41.3701307Z * [new branch] gh/ezyang/3143/base -> origin/gh/ezyang/3143/base 2025-12-04T09:33:41.3702749Z * [new branch] gh/ezyang/3143/head -> origin/gh/ezyang/3143/head 2025-12-04T09:33:41.3704208Z * [new branch] gh/ezyang/3143/orig -> origin/gh/ezyang/3143/orig 2025-12-04T09:33:41.3706203Z * [new branch] gh/ezyang/3144/base -> origin/gh/ezyang/3144/base 2025-12-04T09:33:41.3707977Z * [new branch] gh/ezyang/3144/head -> origin/gh/ezyang/3144/head 2025-12-04T09:33:41.3709362Z * [new branch] gh/ezyang/3144/orig -> origin/gh/ezyang/3144/orig 2025-12-04T09:33:41.3711292Z * [new branch] gh/ezyang/3167/base -> origin/gh/ezyang/3167/base 2025-12-04T09:33:41.3712730Z * [new branch] gh/ezyang/3167/head -> origin/gh/ezyang/3167/head 2025-12-04T09:33:41.3714240Z * [new branch] gh/ezyang/3167/orig -> origin/gh/ezyang/3167/orig 
2025-12-04T09:33:41.3716176Z * [new branch] gh/ezyang/3173/base -> origin/gh/ezyang/3173/base 2025-12-04T09:33:41.3717632Z * [new branch] gh/ezyang/3173/head -> origin/gh/ezyang/3173/head 2025-12-04T09:33:41.3719268Z * [new branch] gh/ezyang/3173/orig -> origin/gh/ezyang/3173/orig 2025-12-04T09:33:41.3721136Z * [new branch] gh/ezyang/3175/base -> origin/gh/ezyang/3175/base 2025-12-04T09:33:41.3722591Z * [new branch] gh/ezyang/3175/head -> origin/gh/ezyang/3175/head 2025-12-04T09:33:41.3724047Z * [new branch] gh/ezyang/3175/orig -> origin/gh/ezyang/3175/orig 2025-12-04T09:33:41.3726006Z * [new branch] gh/ezyang/3182/base -> origin/gh/ezyang/3182/base 2025-12-04T09:33:41.3727658Z * [new branch] gh/ezyang/3182/head -> origin/gh/ezyang/3182/head 2025-12-04T09:33:41.3729155Z * [new branch] gh/ezyang/3182/orig -> origin/gh/ezyang/3182/orig 2025-12-04T09:33:41.3731130Z * [new branch] gh/ezyang/3185/base -> origin/gh/ezyang/3185/base 2025-12-04T09:33:41.3732697Z * [new branch] gh/ezyang/3185/head -> origin/gh/ezyang/3185/head 2025-12-04T09:33:41.3734076Z * [new branch] gh/ezyang/3185/orig -> origin/gh/ezyang/3185/orig 2025-12-04T09:33:41.3735998Z * [new branch] gh/ezyang/3189/base -> origin/gh/ezyang/3189/base 2025-12-04T09:33:41.3737611Z * [new branch] gh/ezyang/3189/head -> origin/gh/ezyang/3189/head 2025-12-04T09:33:41.3739081Z * [new branch] gh/ezyang/3189/orig -> origin/gh/ezyang/3189/orig 2025-12-04T09:33:41.3740993Z * [new branch] gh/ezyang/3191/base -> origin/gh/ezyang/3191/base 2025-12-04T09:33:41.3742478Z * [new branch] gh/ezyang/3191/head -> origin/gh/ezyang/3191/head 2025-12-04T09:33:41.3743987Z * [new branch] gh/ezyang/3191/orig -> origin/gh/ezyang/3191/orig 2025-12-04T09:33:41.3746491Z * [new branch] gh/ezyang/3192/base -> origin/gh/ezyang/3192/base 2025-12-04T09:33:41.3748050Z * [new branch] gh/ezyang/3192/head -> origin/gh/ezyang/3192/head 2025-12-04T09:33:41.3749645Z * [new branch] gh/ezyang/3192/orig -> origin/gh/ezyang/3192/orig 
2025-12-04T09:33:41.3751714Z * [new branch] gh/ezyang/3193/base -> origin/gh/ezyang/3193/base 2025-12-04T09:33:41.3753248Z * [new branch] gh/ezyang/3193/head -> origin/gh/ezyang/3193/head 2025-12-04T09:33:41.3755503Z * [new branch] gh/ezyang/3193/orig -> origin/gh/ezyang/3193/orig 2025-12-04T09:33:41.3757505Z * [new branch] gh/ezyang/3194/base -> origin/gh/ezyang/3194/base 2025-12-04T09:33:41.3758997Z * [new branch] gh/ezyang/3194/head -> origin/gh/ezyang/3194/head 2025-12-04T09:33:41.3760476Z * [new branch] gh/ezyang/3194/orig -> origin/gh/ezyang/3194/orig 2025-12-04T09:33:41.3762387Z * [new branch] gh/ezyang/3195/base -> origin/gh/ezyang/3195/base 2025-12-04T09:33:41.3764264Z * [new branch] gh/ezyang/3195/head -> origin/gh/ezyang/3195/head 2025-12-04T09:33:41.3765758Z * [new branch] gh/ezyang/3195/orig -> origin/gh/ezyang/3195/orig 2025-12-04T09:33:41.3767760Z * [new branch] gh/ezyang/3196/base -> origin/gh/ezyang/3196/base 2025-12-04T09:33:41.3769355Z * [new branch] gh/ezyang/3196/head -> origin/gh/ezyang/3196/head 2025-12-04T09:33:41.3770903Z * [new branch] gh/ezyang/3196/orig -> origin/gh/ezyang/3196/orig 2025-12-04T09:33:41.3777416Z * [new branch] gh/ezyang/3197/base -> origin/gh/ezyang/3197/base 2025-12-04T09:33:41.3778829Z * [new branch] gh/ezyang/3197/head -> origin/gh/ezyang/3197/head 2025-12-04T09:33:41.3780351Z * [new branch] gh/ezyang/3197/orig -> origin/gh/ezyang/3197/orig 2025-12-04T09:33:41.3782378Z * [new branch] gh/ezyang/3198/base -> origin/gh/ezyang/3198/base 2025-12-04T09:33:41.3783895Z * [new branch] gh/ezyang/3198/head -> origin/gh/ezyang/3198/head 2025-12-04T09:33:41.3785465Z * [new branch] gh/ezyang/3198/orig -> origin/gh/ezyang/3198/orig 2025-12-04T09:33:41.3787462Z * [new branch] gh/ezyang/3199/base -> origin/gh/ezyang/3199/base 2025-12-04T09:33:41.3789017Z * [new branch] gh/ezyang/3199/head -> origin/gh/ezyang/3199/head 2025-12-04T09:33:41.3790496Z * [new branch] gh/ezyang/3199/orig -> origin/gh/ezyang/3199/orig 
2025-12-04T09:33:41.3792516Z * [new branch] gh/ezyang/3200/base -> origin/gh/ezyang/3200/base 2025-12-04T09:33:41.3794119Z * [new branch] gh/ezyang/3200/head -> origin/gh/ezyang/3200/head 2025-12-04T09:33:41.3795682Z * [new branch] gh/ezyang/3200/orig -> origin/gh/ezyang/3200/orig 2025-12-04T09:33:41.3797666Z * [new branch] gh/ezyang/3201/base -> origin/gh/ezyang/3201/base 2025-12-04T09:33:41.3799319Z * [new branch] gh/ezyang/3201/head -> origin/gh/ezyang/3201/head 2025-12-04T09:33:41.3800684Z * [new branch] gh/ezyang/3201/orig -> origin/gh/ezyang/3201/orig 2025-12-04T09:33:41.3802641Z * [new branch] gh/ezyang/3202/base -> origin/gh/ezyang/3202/base 2025-12-04T09:33:41.3804082Z * [new branch] gh/ezyang/3202/head -> origin/gh/ezyang/3202/head 2025-12-04T09:33:41.3805598Z * [new branch] gh/ezyang/3202/orig -> origin/gh/ezyang/3202/orig 2025-12-04T09:33:41.3807570Z * [new branch] gh/ezyang/3203/base -> origin/gh/ezyang/3203/base 2025-12-04T09:33:41.3809058Z * [new branch] gh/ezyang/3203/head -> origin/gh/ezyang/3203/head 2025-12-04T09:33:41.3810704Z * [new branch] gh/ezyang/3203/orig -> origin/gh/ezyang/3203/orig 2025-12-04T09:33:41.3812692Z * [new branch] gh/ezyang/3204/base -> origin/gh/ezyang/3204/base 2025-12-04T09:33:41.3814329Z * [new branch] gh/ezyang/3204/head -> origin/gh/ezyang/3204/head 2025-12-04T09:33:41.3815839Z * [new branch] gh/ezyang/3204/orig -> origin/gh/ezyang/3204/orig 2025-12-04T09:33:41.3818026Z * [new branch] gh/ezyang/3205/base -> origin/gh/ezyang/3205/base 2025-12-04T09:33:41.3819481Z * [new branch] gh/ezyang/3205/head -> origin/gh/ezyang/3205/head 2025-12-04T09:33:41.3820978Z * [new branch] gh/ezyang/3205/orig -> origin/gh/ezyang/3205/orig 2025-12-04T09:33:41.3822912Z * [new branch] gh/ezyang/3206/base -> origin/gh/ezyang/3206/base 2025-12-04T09:33:41.3824367Z * [new branch] gh/ezyang/3206/head -> origin/gh/ezyang/3206/head 2025-12-04T09:33:41.3825884Z * [new branch] gh/ezyang/3206/orig -> origin/gh/ezyang/3206/orig 
2025-12-04T09:33:41.3827871Z * [new branch] gh/ezyang/3207/base -> origin/gh/ezyang/3207/base 2025-12-04T09:33:41.3829354Z * [new branch] gh/ezyang/3207/head -> origin/gh/ezyang/3207/head 2025-12-04T09:33:41.3830853Z * [new branch] gh/ezyang/3207/orig -> origin/gh/ezyang/3207/orig 2025-12-04T09:33:41.3832876Z * [new branch] gh/ezyang/3208/base -> origin/gh/ezyang/3208/base 2025-12-04T09:33:41.3834531Z * [new branch] gh/ezyang/3208/head -> origin/gh/ezyang/3208/head 2025-12-04T09:33:41.3836023Z * [new branch] gh/ezyang/3208/orig -> origin/gh/ezyang/3208/orig 2025-12-04T09:33:41.3837990Z * [new branch] gh/ezyang/3209/base -> origin/gh/ezyang/3209/base 2025-12-04T09:33:41.3839649Z * [new branch] gh/ezyang/3209/head -> origin/gh/ezyang/3209/head 2025-12-04T09:33:41.3841120Z * [new branch] gh/ezyang/3209/orig -> origin/gh/ezyang/3209/orig 2025-12-04T09:33:41.3843480Z * [new branch] gh/fadara01/3/base -> origin/gh/fadara01/3/base 2025-12-04T09:33:41.3845020Z * [new branch] gh/fadara01/3/head -> origin/gh/fadara01/3/head 2025-12-04T09:33:41.3846517Z * [new branch] gh/fadara01/3/orig -> origin/gh/fadara01/3/orig 2025-12-04T09:33:41.3848616Z * [new branch] gh/fadara01/5/base -> origin/gh/fadara01/5/base 2025-12-04T09:33:41.3850107Z * [new branch] gh/fadara01/5/head -> origin/gh/fadara01/5/head 2025-12-04T09:33:41.3851677Z * [new branch] gh/fadara01/5/orig -> origin/gh/fadara01/5/orig 2025-12-04T09:33:41.3853647Z * [new branch] gh/fadara01/6/base -> origin/gh/fadara01/6/base 2025-12-04T09:33:41.3855142Z * [new branch] gh/fadara01/6/head -> origin/gh/fadara01/6/head 2025-12-04T09:33:41.3856706Z * [new branch] gh/fadara01/6/orig -> origin/gh/fadara01/6/orig 2025-12-04T09:33:41.3858845Z * [new branch] gh/fadara01/7/base -> origin/gh/fadara01/7/base 2025-12-04T09:33:41.3860165Z * [new branch] gh/fadara01/7/head -> origin/gh/fadara01/7/head 2025-12-04T09:33:41.3861729Z * [new branch] gh/fadara01/7/orig -> origin/gh/fadara01/7/orig 2025-12-04T09:33:41.3863681Z * [new branch] 
gh/fadara01/8/base -> origin/gh/fadara01/8/base 2025-12-04T09:33:41.3865160Z * [new branch] gh/fadara01/8/head -> origin/gh/fadara01/8/head 2025-12-04T09:33:41.3866661Z * [new branch] gh/fadara01/8/orig -> origin/gh/fadara01/8/orig 2025-12-04T09:33:41.3868581Z * [new branch] gh/fadara01/9/base -> origin/gh/fadara01/9/base 2025-12-04T09:33:41.3870101Z * [new branch] gh/fadara01/9/head -> origin/gh/fadara01/9/head 2025-12-04T09:33:41.3871813Z * [new branch] gh/fadara01/9/orig -> origin/gh/fadara01/9/orig 2025-12-04T09:33:41.3874250Z * [new branch] gh/fduwjj/182/base -> origin/gh/fduwjj/182/base 2025-12-04T09:33:41.3875776Z * [new branch] gh/fduwjj/182/head -> origin/gh/fduwjj/182/head 2025-12-04T09:33:41.3877223Z * [new branch] gh/fduwjj/182/orig -> origin/gh/fduwjj/182/orig 2025-12-04T09:33:41.3879274Z * [new branch] gh/fduwjj/211/base -> origin/gh/fduwjj/211/base 2025-12-04T09:33:41.3880847Z * [new branch] gh/fduwjj/211/head -> origin/gh/fduwjj/211/head 2025-12-04T09:33:41.3882377Z * [new branch] gh/fduwjj/211/orig -> origin/gh/fduwjj/211/orig 2025-12-04T09:33:41.3884330Z * [new branch] gh/fduwjj/212/base -> origin/gh/fduwjj/212/base 2025-12-04T09:33:41.3885802Z * [new branch] gh/fduwjj/212/head -> origin/gh/fduwjj/212/head 2025-12-04T09:33:41.3887524Z * [new branch] gh/fduwjj/212/orig -> origin/gh/fduwjj/212/orig 2025-12-04T09:33:41.3889253Z * [new branch] gh/fduwjj/213/base -> origin/gh/fduwjj/213/base 2025-12-04T09:33:41.3890731Z * [new branch] gh/fduwjj/213/head -> origin/gh/fduwjj/213/head 2025-12-04T09:33:41.3892276Z * [new branch] gh/fduwjj/213/orig -> origin/gh/fduwjj/213/orig 2025-12-04T09:33:41.3894465Z * [new branch] gh/fduwjj/226/base -> origin/gh/fduwjj/226/base 2025-12-04T09:33:41.3895850Z * [new branch] gh/fduwjj/226/head -> origin/gh/fduwjj/226/head 2025-12-04T09:33:41.3897389Z * [new branch] gh/fduwjj/226/orig -> origin/gh/fduwjj/226/orig 2025-12-04T09:33:41.3899548Z * [new branch] gh/fduwjj/229/base -> origin/gh/fduwjj/229/base 
2025-12-04T09:33:41.3900984Z * [new branch] gh/fduwjj/229/head -> origin/gh/fduwjj/229/head 2025-12-04T09:33:41.3902429Z * [new branch] gh/fduwjj/229/orig -> origin/gh/fduwjj/229/orig 2025-12-04T09:33:41.3904381Z * [new branch] gh/fduwjj/233/base -> origin/gh/fduwjj/233/base 2025-12-04T09:33:41.3905892Z * [new branch] gh/fduwjj/233/head -> origin/gh/fduwjj/233/head 2025-12-04T09:33:41.3907351Z * [new branch] gh/fduwjj/233/orig -> origin/gh/fduwjj/233/orig 2025-12-04T09:33:41.3909330Z * [new branch] gh/fduwjj/234/base -> origin/gh/fduwjj/234/base 2025-12-04T09:33:41.3910843Z * [new branch] gh/fduwjj/234/head -> origin/gh/fduwjj/234/head 2025-12-04T09:33:41.3912305Z * [new branch] gh/fduwjj/234/orig -> origin/gh/fduwjj/234/orig 2025-12-04T09:33:41.3914351Z * [new branch] gh/fduwjj/235/base -> origin/gh/fduwjj/235/base 2025-12-04T09:33:41.3915853Z * [new branch] gh/fduwjj/235/head -> origin/gh/fduwjj/235/head 2025-12-04T09:33:41.3917338Z * [new branch] gh/fduwjj/235/orig -> origin/gh/fduwjj/235/orig 2025-12-04T09:33:41.3919319Z * [new branch] gh/fduwjj/236/base -> origin/gh/fduwjj/236/base 2025-12-04T09:33:41.3920633Z * [new branch] gh/fduwjj/236/head -> origin/gh/fduwjj/236/head 2025-12-04T09:33:41.3922144Z * [new branch] gh/fduwjj/236/orig -> origin/gh/fduwjj/236/orig 2025-12-04T09:33:41.3923880Z * [new branch] gh/fduwjj/237/base -> origin/gh/fduwjj/237/base 2025-12-04T09:33:41.3925344Z * [new branch] gh/fduwjj/237/head -> origin/gh/fduwjj/237/head 2025-12-04T09:33:41.3926787Z * [new branch] gh/fduwjj/237/orig -> origin/gh/fduwjj/237/orig 2025-12-04T09:33:41.3928730Z * [new branch] gh/fduwjj/238/base -> origin/gh/fduwjj/238/base 2025-12-04T09:33:41.3930366Z * [new branch] gh/fduwjj/238/head -> origin/gh/fduwjj/238/head 2025-12-04T09:33:41.3931781Z * [new branch] gh/fduwjj/238/orig -> origin/gh/fduwjj/238/orig 2025-12-04T09:33:41.3933900Z * [new branch] gh/fduwjj/239/base -> origin/gh/fduwjj/239/base 2025-12-04T09:33:41.3935457Z * [new branch] gh/fduwjj/239/head -> 
origin/gh/fduwjj/239/head 2025-12-04T09:33:41.3936974Z * [new branch] gh/fduwjj/239/orig -> origin/gh/fduwjj/239/orig 2025-12-04T09:33:41.3939943Z * [new branch] gh/fegin/332/base -> origin/gh/fegin/332/base 2025-12-04T09:33:41.3941426Z * [new branch] gh/fegin/332/head -> origin/gh/fegin/332/head 2025-12-04T09:33:41.3942999Z * [new branch] gh/fegin/332/orig -> origin/gh/fegin/332/orig 2025-12-04T09:33:41.3944968Z * [new branch] gh/fegin/333/base -> origin/gh/fegin/333/base 2025-12-04T09:33:41.3946597Z * [new branch] gh/fegin/333/head -> origin/gh/fegin/333/head 2025-12-04T09:33:41.3948036Z * [new branch] gh/fegin/333/orig -> origin/gh/fegin/333/orig 2025-12-04T09:33:41.3949973Z * [new branch] gh/fegin/334/base -> origin/gh/fegin/334/base 2025-12-04T09:33:41.3951396Z * [new branch] gh/fegin/334/head -> origin/gh/fegin/334/head 2025-12-04T09:33:41.3953656Z * [new branch] gh/fegin/334/orig -> origin/gh/fegin/334/orig 2025-12-04T09:33:41.3955625Z * [new branch] gh/fegin/335/base -> origin/gh/fegin/335/base 2025-12-04T09:33:41.3957076Z * [new branch] gh/fegin/335/head -> origin/gh/fegin/335/head 2025-12-04T09:33:41.3958597Z * [new branch] gh/fegin/335/orig -> origin/gh/fegin/335/orig 2025-12-04T09:33:41.3960937Z * [new branch] gh/fffrog/160/base -> origin/gh/fffrog/160/base 2025-12-04T09:33:41.3962391Z * [new branch] gh/fffrog/160/head -> origin/gh/fffrog/160/head 2025-12-04T09:33:41.3964393Z * [new branch] gh/fffrog/177/base -> origin/gh/fffrog/177/base 2025-12-04T09:33:41.3965767Z * [new branch] gh/fffrog/177/head -> origin/gh/fffrog/177/head 2025-12-04T09:33:41.3967303Z * [new branch] gh/fffrog/177/orig -> origin/gh/fffrog/177/orig 2025-12-04T09:33:41.3969231Z * [new branch] gh/fffrog/178/base -> origin/gh/fffrog/178/base 2025-12-04T09:33:41.3970692Z * [new branch] gh/fffrog/178/head -> origin/gh/fffrog/178/head 2025-12-04T09:33:41.3972549Z * [new branch] gh/fffrog/178/orig -> origin/gh/fffrog/178/orig 2025-12-04T09:33:41.3974457Z * [new branch] gh/fffrog/181/base -> 
origin/gh/fffrog/181/base 2025-12-04T09:33:41.3975946Z * [new branch] gh/fffrog/181/head -> origin/gh/fffrog/181/head 2025-12-04T09:33:41.3977600Z * [new branch] gh/fffrog/181/orig -> origin/gh/fffrog/181/orig 2025-12-04T09:33:41.3979637Z * [new branch] gh/fffrog/183/base -> origin/gh/fffrog/183/base 2025-12-04T09:33:41.3981491Z * [new branch] gh/fffrog/183/head -> origin/gh/fffrog/183/head 2025-12-04T09:33:41.3983036Z * [new branch] gh/fffrog/183/orig -> origin/gh/fffrog/183/orig 2025-12-04T09:33:41.3985467Z * [new branch] gh/fxdawnn/10/base -> origin/gh/fxdawnn/10/base 2025-12-04T09:33:41.3987166Z * [new branch] gh/fxdawnn/10/head -> origin/gh/fxdawnn/10/head 2025-12-04T09:33:41.3988559Z * [new branch] gh/fxdawnn/10/orig -> origin/gh/fxdawnn/10/orig 2025-12-04T09:33:41.3991005Z * [new branch] gh/fxdawnn/11/base -> origin/gh/fxdawnn/11/base 2025-12-04T09:33:41.3992152Z * [new branch] gh/fxdawnn/11/head -> origin/gh/fxdawnn/11/head 2025-12-04T09:33:41.3993768Z * [new branch] gh/fxdawnn/11/orig -> origin/gh/fxdawnn/11/orig 2025-12-04T09:33:41.3995814Z * [new branch] gh/fxdawnn/12/base -> origin/gh/fxdawnn/12/base 2025-12-04T09:33:41.3997164Z * [new branch] gh/fxdawnn/12/head -> origin/gh/fxdawnn/12/head 2025-12-04T09:33:41.3998655Z * [new branch] gh/fxdawnn/12/orig -> origin/gh/fxdawnn/12/orig 2025-12-04T09:33:41.4000582Z * [new branch] gh/fxdawnn/13/base -> origin/gh/fxdawnn/13/base 2025-12-04T09:33:41.4002018Z * [new branch] gh/fxdawnn/13/head -> origin/gh/fxdawnn/13/head 2025-12-04T09:33:41.4003622Z * [new branch] gh/fxdawnn/13/orig -> origin/gh/fxdawnn/13/orig 2025-12-04T09:33:41.4005788Z * [new branch] gh/fxdawnn/14/base -> origin/gh/fxdawnn/14/base 2025-12-04T09:33:41.4007149Z * [new branch] gh/fxdawnn/14/head -> origin/gh/fxdawnn/14/head 2025-12-04T09:33:41.4008594Z * [new branch] gh/fxdawnn/14/orig -> origin/gh/fxdawnn/14/orig 2025-12-04T09:33:41.4010585Z * [new branch] gh/fxdawnn/15/base -> origin/gh/fxdawnn/15/base 2025-12-04T09:33:41.4012171Z * [new 
branch] gh/fxdawnn/15/head -> origin/gh/fxdawnn/15/head 2025-12-04T09:33:41.4013622Z * [new branch] gh/fxdawnn/15/orig -> origin/gh/fxdawnn/15/orig 2025-12-04T09:33:41.4015587Z * [new branch] gh/fxdawnn/6/base -> origin/gh/fxdawnn/6/base 2025-12-04T09:33:41.4017237Z * [new branch] gh/fxdawnn/6/head -> origin/gh/fxdawnn/6/head 2025-12-04T09:33:41.4018677Z * [new branch] gh/fxdawnn/6/orig -> origin/gh/fxdawnn/6/orig 2025-12-04T09:33:41.4020655Z * [new branch] gh/fxdawnn/7/base -> origin/gh/fxdawnn/7/base 2025-12-04T09:33:41.4022226Z * [new branch] gh/fxdawnn/7/head -> origin/gh/fxdawnn/7/head 2025-12-04T09:33:41.4023648Z * [new branch] gh/fxdawnn/7/orig -> origin/gh/fxdawnn/7/orig 2025-12-04T09:33:41.4025681Z * [new branch] gh/fxdawnn/9/base -> origin/gh/fxdawnn/9/base 2025-12-04T09:33:41.4027070Z * [new branch] gh/fxdawnn/9/head -> origin/gh/fxdawnn/9/head 2025-12-04T09:33:41.4028925Z * [new branch] gh/fxdawnn/9/orig -> origin/gh/fxdawnn/9/orig 2025-12-04T09:33:41.4031426Z * [new branch] gh/galv/1/base -> origin/gh/galv/1/base 2025-12-04T09:33:41.4032916Z * [new branch] gh/galv/1/head -> origin/gh/galv/1/head 2025-12-04T09:33:41.4034451Z * [new branch] gh/galv/1/orig -> origin/gh/galv/1/orig 2025-12-04T09:33:41.4036426Z * [new branch] gh/galv/2/base -> origin/gh/galv/2/base 2025-12-04T09:33:41.4037855Z * [new branch] gh/galv/2/head -> origin/gh/galv/2/head 2025-12-04T09:33:41.4040015Z * [new branch] gh/galv/2/orig -> origin/gh/galv/2/orig 2025-12-04T09:33:41.4042608Z * [new branch] gh/galv/3/base -> origin/gh/galv/3/base 2025-12-04T09:33:41.4043230Z * [new branch] gh/galv/3/head -> origin/gh/galv/3/head 2025-12-04T09:33:41.4044979Z * [new branch] gh/galv/3/orig -> origin/gh/galv/3/orig 2025-12-04T09:33:41.4047302Z * [new branch] gh/guangyey/134/base -> origin/gh/guangyey/134/base 2025-12-04T09:33:41.4048840Z * [new branch] gh/guangyey/134/head -> origin/gh/guangyey/134/head 2025-12-04T09:33:41.4050860Z * [new branch] gh/guangyey/134/orig -> 
origin/gh/guangyey/134/orig 2025-12-04T09:33:41.4053011Z * [new branch] gh/guangyey/163/base -> origin/gh/guangyey/163/base 2025-12-04T09:33:41.4054439Z * [new branch] gh/guangyey/163/head -> origin/gh/guangyey/163/head 2025-12-04T09:33:41.4055897Z * [new branch] gh/guangyey/163/orig -> origin/gh/guangyey/163/orig 2025-12-04T09:33:41.4057970Z * [new branch] gh/guangyey/168/base -> origin/gh/guangyey/168/base 2025-12-04T09:33:41.4059424Z * [new branch] gh/guangyey/168/head -> origin/gh/guangyey/168/head 2025-12-04T09:33:41.4060905Z * [new branch] gh/guangyey/168/orig -> origin/gh/guangyey/168/orig 2025-12-04T09:33:41.4062834Z * [new branch] gh/guangyey/169/base -> origin/gh/guangyey/169/base 2025-12-04T09:33:41.4064316Z * [new branch] gh/guangyey/169/head -> origin/gh/guangyey/169/head 2025-12-04T09:33:41.4065752Z * [new branch] gh/guangyey/169/orig -> origin/gh/guangyey/169/orig 2025-12-04T09:33:41.4067701Z * [new branch] gh/guangyey/170/base -> origin/gh/guangyey/170/base 2025-12-04T09:33:41.4069201Z * [new branch] gh/guangyey/170/head -> origin/gh/guangyey/170/head 2025-12-04T09:33:41.4070674Z * [new branch] gh/guangyey/170/orig -> origin/gh/guangyey/170/orig 2025-12-04T09:33:41.4072965Z * [new branch] gh/guangyey/171/base -> origin/gh/guangyey/171/base 2025-12-04T09:33:41.4074424Z * [new branch] gh/guangyey/171/head -> origin/gh/guangyey/171/head 2025-12-04T09:33:41.4075921Z * [new branch] gh/guangyey/171/orig -> origin/gh/guangyey/171/orig 2025-12-04T09:33:41.4077827Z * [new branch] gh/guangyey/178/base -> origin/gh/guangyey/178/base 2025-12-04T09:33:41.4079395Z * [new branch] gh/guangyey/178/head -> origin/gh/guangyey/178/head 2025-12-04T09:33:41.4080791Z * [new branch] gh/guangyey/178/orig -> origin/gh/guangyey/178/orig 2025-12-04T09:33:41.4082647Z * [new branch] gh/guangyey/182/base -> origin/gh/guangyey/182/base 2025-12-04T09:33:41.4084170Z * [new branch] gh/guangyey/182/head -> origin/gh/guangyey/182/head 2025-12-04T09:33:41.4085677Z * [new branch] 
gh/guangyey/182/orig -> origin/gh/guangyey/182/orig 2025-12-04T09:33:41.4087626Z * [new branch] gh/guangyey/183/base -> origin/gh/guangyey/183/base 2025-12-04T09:33:41.4089064Z * [new branch] gh/guangyey/183/head -> origin/gh/guangyey/183/head 2025-12-04T09:33:41.4090602Z * [new branch] gh/guangyey/183/orig -> origin/gh/guangyey/183/orig 2025-12-04T09:33:41.4092668Z * [new branch] gh/guangyey/185/base -> origin/gh/guangyey/185/base 2025-12-04T09:33:41.4094144Z * [new branch] gh/guangyey/185/head -> origin/gh/guangyey/185/head 2025-12-04T09:33:41.4095610Z * [new branch] gh/guangyey/185/orig -> origin/gh/guangyey/185/orig 2025-12-04T09:33:41.4097741Z * [new branch] gh/guangyey/186/base -> origin/gh/guangyey/186/base 2025-12-04T09:33:41.4099244Z * [new branch] gh/guangyey/186/head -> origin/gh/guangyey/186/head 2025-12-04T09:33:41.4100752Z * [new branch] gh/guangyey/186/orig -> origin/gh/guangyey/186/orig 2025-12-04T09:33:41.4103113Z * [new branch] gh/guangyey/187/base -> origin/gh/guangyey/187/base 2025-12-04T09:33:41.4104715Z * [new branch] gh/guangyey/187/head -> origin/gh/guangyey/187/head 2025-12-04T09:33:41.4106161Z * [new branch] gh/guangyey/187/orig -> origin/gh/guangyey/187/orig 2025-12-04T09:33:41.4108165Z * [new branch] gh/guangyey/188/base -> origin/gh/guangyey/188/base 2025-12-04T09:33:41.4109667Z * [new branch] gh/guangyey/188/head -> origin/gh/guangyey/188/head 2025-12-04T09:33:41.4111191Z * [new branch] gh/guangyey/188/orig -> origin/gh/guangyey/188/orig 2025-12-04T09:33:41.4113187Z * [new branch] gh/guangyey/190/base -> origin/gh/guangyey/190/base 2025-12-04T09:33:41.4114618Z * [new branch] gh/guangyey/190/head -> origin/gh/guangyey/190/head 2025-12-04T09:33:41.4116125Z * [new branch] gh/guangyey/190/orig -> origin/gh/guangyey/190/orig 2025-12-04T09:33:41.4117935Z * [new branch] gh/guangyey/208/base -> origin/gh/guangyey/208/base 2025-12-04T09:33:41.4119443Z * [new branch] gh/guangyey/208/head -> origin/gh/guangyey/208/head 
2025-12-04T09:33:41.4120926Z * [new branch] gh/guangyey/208/orig -> origin/gh/guangyey/208/orig 2025-12-04T09:33:41.4122820Z * [new branch] gh/guangyey/228/base -> origin/gh/guangyey/228/base 2025-12-04T09:33:41.4124322Z * [new branch] gh/guangyey/228/head -> origin/gh/guangyey/228/head 2025-12-04T09:33:41.4125747Z * [new branch] gh/guangyey/228/orig -> origin/gh/guangyey/228/orig 2025-12-04T09:33:41.4128291Z * [new branch] gh/guangyey/230/base -> origin/gh/guangyey/230/base 2025-12-04T09:33:41.4129732Z * [new branch] gh/guangyey/230/head -> origin/gh/guangyey/230/head 2025-12-04T09:33:41.4131220Z * [new branch] gh/guangyey/230/orig -> origin/gh/guangyey/230/orig 2025-12-04T09:33:41.4133289Z * [new branch] gh/guangyey/231/base -> origin/gh/guangyey/231/base 2025-12-04T09:33:41.4134770Z * [new branch] gh/guangyey/231/head -> origin/gh/guangyey/231/head 2025-12-04T09:33:41.4136334Z * [new branch] gh/guangyey/231/orig -> origin/gh/guangyey/231/orig 2025-12-04T09:33:41.4138429Z * [new branch] gh/guangyey/232/base -> origin/gh/guangyey/232/base 2025-12-04T09:33:41.4140022Z * [new branch] gh/guangyey/232/head -> origin/gh/guangyey/232/head 2025-12-04T09:33:41.4141395Z * [new branch] gh/guangyey/232/orig -> origin/gh/guangyey/232/orig 2025-12-04T09:33:41.4143357Z * [new branch] gh/guangyey/233/base -> origin/gh/guangyey/233/base 2025-12-04T09:33:41.4144834Z * [new branch] gh/guangyey/233/head -> origin/gh/guangyey/233/head 2025-12-04T09:33:41.4146337Z * [new branch] gh/guangyey/233/orig -> origin/gh/guangyey/233/orig 2025-12-04T09:33:41.4148334Z * [new branch] gh/guangyey/234/base -> origin/gh/guangyey/234/base 2025-12-04T09:33:41.4149784Z * [new branch] gh/guangyey/234/head -> origin/gh/guangyey/234/head 2025-12-04T09:33:41.4151288Z * [new branch] gh/guangyey/234/orig -> origin/gh/guangyey/234/orig 2025-12-04T09:33:41.4153294Z * [new branch] gh/guangyey/235/base -> origin/gh/guangyey/235/base 2025-12-04T09:33:41.4154781Z * [new branch] gh/guangyey/235/head -> 
origin/gh/guangyey/235/head 2025-12-04T09:33:41.4156326Z * [new branch] gh/guangyey/235/orig -> origin/gh/guangyey/235/orig 2025-12-04T09:33:41.4158392Z * [new branch] gh/guangyey/236/base -> origin/gh/guangyey/236/base 2025-12-04T09:33:41.4159981Z * [new branch] gh/guangyey/236/head -> origin/gh/guangyey/236/head 2025-12-04T09:33:41.4161340Z * [new branch] gh/guangyey/236/orig -> origin/gh/guangyey/236/orig 2025-12-04T09:33:41.4163397Z * [new branch] gh/guangyey/237/base -> origin/gh/guangyey/237/base 2025-12-04T09:33:41.4164914Z * [new branch] gh/guangyey/237/head -> origin/gh/guangyey/237/head 2025-12-04T09:33:41.4166461Z * [new branch] gh/guangyey/237/orig -> origin/gh/guangyey/237/orig 2025-12-04T09:33:41.4168429Z * [new branch] gh/guangyey/238/base -> origin/gh/guangyey/238/base 2025-12-04T09:33:41.4169906Z * [new branch] gh/guangyey/238/head -> origin/gh/guangyey/238/head 2025-12-04T09:33:41.4173694Z * [new branch] gh/guangyey/239/base -> origin/gh/guangyey/239/base 2025-12-04T09:33:41.4175215Z * [new branch] gh/guangyey/239/head -> origin/gh/guangyey/239/head 2025-12-04T09:33:41.4176822Z * [new branch] gh/guangyey/239/orig -> origin/gh/guangyey/239/orig 2025-12-04T09:33:41.4178883Z * [new branch] gh/guangyey/240/base -> origin/gh/guangyey/240/base 2025-12-04T09:33:41.4180361Z * [new branch] gh/guangyey/240/head -> origin/gh/guangyey/240/head 2025-12-04T09:33:41.4181920Z * [new branch] gh/guangyey/240/orig -> origin/gh/guangyey/240/orig 2025-12-04T09:33:41.4183942Z * [new branch] gh/guangyey/241/base -> origin/gh/guangyey/241/base 2025-12-04T09:33:41.4185460Z * [new branch] gh/guangyey/241/head -> origin/gh/guangyey/241/head 2025-12-04T09:33:41.4186950Z * [new branch] gh/guangyey/241/orig -> origin/gh/guangyey/241/orig 2025-12-04T09:33:41.4189005Z * [new branch] gh/guangyey/242/base -> origin/gh/guangyey/242/base 2025-12-04T09:33:41.4190459Z * [new branch] gh/guangyey/242/head -> origin/gh/guangyey/242/head 2025-12-04T09:33:41.4191921Z * [new branch] 
gh/guangyey/242/orig -> origin/gh/guangyey/242/orig 2025-12-04T09:33:41.4194092Z * [new branch] gh/guangyey/243/base -> origin/gh/guangyey/243/base 2025-12-04T09:33:41.4195519Z * [new branch] gh/guangyey/243/head -> origin/gh/guangyey/243/head 2025-12-04T09:33:41.4197032Z * [new branch] gh/guangyey/243/orig -> origin/gh/guangyey/243/orig 2025-12-04T09:33:41.4199222Z * [new branch] gh/guangyey/244/base -> origin/gh/guangyey/244/base 2025-12-04T09:33:41.4200715Z * [new branch] gh/guangyey/244/head -> origin/gh/guangyey/244/head 2025-12-04T09:33:41.4202231Z * [new branch] gh/guangyey/244/orig -> origin/gh/guangyey/244/orig 2025-12-04T09:33:41.4204308Z * [new branch] gh/guangyey/245/base -> origin/gh/guangyey/245/base 2025-12-04T09:33:41.4205862Z * [new branch] gh/guangyey/245/head -> origin/gh/guangyey/245/head 2025-12-04T09:33:41.4207363Z * [new branch] gh/guangyey/245/orig -> origin/gh/guangyey/245/orig 2025-12-04T09:33:41.4209477Z * [new branch] gh/guangyey/246/base -> origin/gh/guangyey/246/base 2025-12-04T09:33:41.4211220Z * [new branch] gh/guangyey/246/head -> origin/gh/guangyey/246/head 2025-12-04T09:33:41.4212566Z * [new branch] gh/guangyey/246/orig -> origin/gh/guangyey/246/orig 2025-12-04T09:33:41.4214716Z * [new branch] gh/guangyey/247/base -> origin/gh/guangyey/247/base 2025-12-04T09:33:41.4216284Z * [new branch] gh/guangyey/247/head -> origin/gh/guangyey/247/head 2025-12-04T09:33:41.4217850Z * [new branch] gh/guangyey/247/orig -> origin/gh/guangyey/247/orig 2025-12-04T09:33:41.4219857Z * [new branch] gh/guangyey/248/base -> origin/gh/guangyey/248/base 2025-12-04T09:33:41.4221463Z * [new branch] gh/guangyey/248/head -> origin/gh/guangyey/248/head 2025-12-04T09:33:41.4222780Z * [new branch] gh/guangyey/248/orig -> origin/gh/guangyey/248/orig 2025-12-04T09:33:41.4224821Z * [new branch] gh/guangyey/249/base -> origin/gh/guangyey/249/base 2025-12-04T09:33:41.4226440Z * [new branch] gh/guangyey/249/head -> origin/gh/guangyey/249/head 
2025-12-04T09:33:41.4227906Z * [new branch] gh/guangyey/249/orig -> origin/gh/guangyey/249/orig 2025-12-04T09:33:41.4230002Z * [new branch] gh/guangyey/250/base -> origin/gh/guangyey/250/base 2025-12-04T09:33:41.4231502Z * [new branch] gh/guangyey/250/head -> origin/gh/guangyey/250/head 2025-12-04T09:33:41.4233063Z * [new branch] gh/guangyey/250/orig -> origin/gh/guangyey/250/orig 2025-12-04T09:33:41.4235654Z * [new branch] gh/guangyey/251/base -> origin/gh/guangyey/251/base 2025-12-04T09:33:41.4237183Z * [new branch] gh/guangyey/251/head -> origin/gh/guangyey/251/head 2025-12-04T09:33:41.4238718Z * [new branch] gh/guangyey/251/orig -> origin/gh/guangyey/251/orig 2025-12-04T09:33:41.4240686Z * [new branch] gh/guangyey/252/base -> origin/gh/guangyey/252/base 2025-12-04T09:33:41.4242237Z * [new branch] gh/guangyey/252/head -> origin/gh/guangyey/252/head 2025-12-04T09:33:41.4243727Z * [new branch] gh/guangyey/252/orig -> origin/gh/guangyey/252/orig 2025-12-04T09:33:41.4245723Z * [new branch] gh/guangyey/253/base -> origin/gh/guangyey/253/base 2025-12-04T09:33:41.4247229Z * [new branch] gh/guangyey/253/head -> origin/gh/guangyey/253/head 2025-12-04T09:33:41.4248689Z * [new branch] gh/guangyey/253/orig -> origin/gh/guangyey/253/orig 2025-12-04T09:33:41.4250660Z * [new branch] gh/guangyey/254/base -> origin/gh/guangyey/254/base 2025-12-04T09:33:41.4252232Z * [new branch] gh/guangyey/254/head -> origin/gh/guangyey/254/head 2025-12-04T09:33:41.4254300Z * [new branch] gh/guangyey/254/orig -> origin/gh/guangyey/254/orig 2025-12-04T09:33:41.4256432Z * [new branch] gh/guangyey/255/base -> origin/gh/guangyey/255/base 2025-12-04T09:33:41.4258107Z * [new branch] gh/guangyey/255/head -> origin/gh/guangyey/255/head 2025-12-04T09:33:41.4259612Z * [new branch] gh/guangyey/255/orig -> origin/gh/guangyey/255/orig 2025-12-04T09:33:41.4262294Z * [new branch] gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base 2025-12-04T09:33:41.4264111Z * [new branch] 
gh/guilhermeleobas/107/head -> origin/gh/guilhermeleobas/107/head 2025-12-04T09:33:41.4265400Z * [new branch] gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig 2025-12-04T09:33:41.4267310Z * [new branch] gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base 2025-12-04T09:33:41.4268791Z * [new branch] gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head 2025-12-04T09:33:41.4270728Z * [new branch] gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig 2025-12-04T09:33:41.4272947Z * [new branch] gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base 2025-12-04T09:33:41.4276127Z * [new branch] gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head 2025-12-04T09:33:41.4277403Z * [new branch] gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig 2025-12-04T09:33:41.4279615Z * [new branch] gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base 2025-12-04T09:33:41.4280988Z * [new branch] gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head 2025-12-04T09:33:41.4282457Z * [new branch] gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig 2025-12-04T09:33:41.4284828Z * [new branch] gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base 2025-12-04T09:33:41.4285965Z * [new branch] gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head 2025-12-04T09:33:41.4288642Z * [new branch] gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig 2025-12-04T09:33:41.4289458Z * [new branch] gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base 2025-12-04T09:33:41.4290814Z * [new branch] gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head 2025-12-04T09:33:41.4292273Z * [new branch] gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig 2025-12-04T09:33:41.4294645Z * [new branch] gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base 2025-12-04T09:33:41.4296125Z * [new branch] 
gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head 2025-12-04T09:33:41.4297792Z * [new branch] gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig 2025-12-04T09:33:41.4299880Z * [new branch] gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base 2025-12-04T09:33:41.4301376Z * [new branch] gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head 2025-12-04T09:33:41.4302885Z * [new branch] gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig 2025-12-04T09:33:41.4304881Z * [new branch] gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base 2025-12-04T09:33:41.4306383Z * [new branch] gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head 2025-12-04T09:33:41.4308087Z * [new branch] gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig 2025-12-04T09:33:41.4310567Z * [new branch] gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base 2025-12-04T09:33:41.4312077Z * [new branch] gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head 2025-12-04T09:33:41.4313543Z * [new branch] gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig 2025-12-04T09:33:41.4315501Z * [new branch] gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base 2025-12-04T09:33:41.4316993Z * [new branch] gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head 2025-12-04T09:33:41.4318493Z * [new branch] gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig 2025-12-04T09:33:41.4320659Z * [new branch] gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base 2025-12-04T09:33:41.4322097Z * [new branch] gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head 2025-12-04T09:33:41.4323528Z * [new branch] gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig 2025-12-04T09:33:41.4325474Z * [new branch] gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base 2025-12-04T09:33:41.4328961Z * [new branch] 
gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head 2025-12-04T09:33:41.4329270Z * [new branch] gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig 2025-12-04T09:33:41.4330794Z * [new branch] gh/guilhermeleobas/247/base -> origin/gh/guilhermeleobas/247/base 2025-12-04T09:33:41.4331539Z * [new branch] gh/guilhermeleobas/247/head -> origin/gh/guilhermeleobas/247/head 2025-12-04T09:33:41.4333249Z * [new branch] gh/guilhermeleobas/247/orig -> origin/gh/guilhermeleobas/247/orig 2025-12-04T09:33:41.4335176Z * [new branch] gh/guilhermeleobas/248/base -> origin/gh/guilhermeleobas/248/base 2025-12-04T09:33:41.4336736Z * [new branch] gh/guilhermeleobas/248/head -> origin/gh/guilhermeleobas/248/head 2025-12-04T09:33:41.4338305Z * [new branch] gh/guilhermeleobas/248/orig -> origin/gh/guilhermeleobas/248/orig 2025-12-04T09:33:41.4340484Z * [new branch] gh/guilhermeleobas/250/base -> origin/gh/guilhermeleobas/250/base 2025-12-04T09:33:41.4341867Z * [new branch] gh/guilhermeleobas/250/head -> origin/gh/guilhermeleobas/250/head 2025-12-04T09:33:41.4343419Z * [new branch] gh/guilhermeleobas/250/orig -> origin/gh/guilhermeleobas/250/orig 2025-12-04T09:33:41.4345892Z * [new branch] gh/guilhermeleobas/253/base -> origin/gh/guilhermeleobas/253/base 2025-12-04T09:33:41.4347411Z * [new branch] gh/guilhermeleobas/253/head -> origin/gh/guilhermeleobas/253/head 2025-12-04T09:33:41.4349016Z * [new branch] gh/guilhermeleobas/253/orig -> origin/gh/guilhermeleobas/253/orig 2025-12-04T09:33:41.4351068Z * [new branch] gh/guilhermeleobas/254/base -> origin/gh/guilhermeleobas/254/base 2025-12-04T09:33:41.4352538Z * [new branch] gh/guilhermeleobas/254/head -> origin/gh/guilhermeleobas/254/head 2025-12-04T09:33:41.4354082Z * [new branch] gh/guilhermeleobas/254/orig -> origin/gh/guilhermeleobas/254/orig 2025-12-04T09:33:41.4356058Z * [new branch] gh/guilhermeleobas/255/base -> origin/gh/guilhermeleobas/255/base 2025-12-04T09:33:41.4357576Z * [new branch] 
gh/guilhermeleobas/255/head -> origin/gh/guilhermeleobas/255/head 2025-12-04T09:33:41.4359069Z * [new branch] gh/guilhermeleobas/255/orig -> origin/gh/guilhermeleobas/255/orig 2025-12-04T09:33:41.4361992Z * [new branch] gh/guilhermeleobas/256/base -> origin/gh/guilhermeleobas/256/base 2025-12-04T09:33:41.4363382Z * [new branch] gh/guilhermeleobas/256/head -> origin/gh/guilhermeleobas/256/head 2025-12-04T09:33:41.4364891Z * [new branch] gh/guilhermeleobas/256/orig -> origin/gh/guilhermeleobas/256/orig 2025-12-04T09:33:41.4366934Z * [new branch] gh/guilhermeleobas/257/base -> origin/gh/guilhermeleobas/257/base 2025-12-04T09:33:41.4368415Z * [new branch] gh/guilhermeleobas/257/head -> origin/gh/guilhermeleobas/257/head 2025-12-04T09:33:41.4370099Z * [new branch] gh/guilhermeleobas/257/orig -> origin/gh/guilhermeleobas/257/orig 2025-12-04T09:33:41.4372629Z * [new branch] gh/guilhermeleobas/258/base -> origin/gh/guilhermeleobas/258/base 2025-12-04T09:33:41.4373830Z * [new branch] gh/guilhermeleobas/258/head -> origin/gh/guilhermeleobas/258/head 2025-12-04T09:33:41.4375366Z * [new branch] gh/guilhermeleobas/258/orig -> origin/gh/guilhermeleobas/258/orig 2025-12-04T09:33:41.4377551Z * [new branch] gh/guilhermeleobas/259/base -> origin/gh/guilhermeleobas/259/base 2025-12-04T09:33:41.4379056Z * [new branch] gh/guilhermeleobas/259/head -> origin/gh/guilhermeleobas/259/head 2025-12-04T09:33:41.4380551Z * [new branch] gh/guilhermeleobas/259/orig -> origin/gh/guilhermeleobas/259/orig 2025-12-04T09:33:41.4382782Z * [new branch] gh/guilhermeleobas/260/base -> origin/gh/guilhermeleobas/260/base 2025-12-04T09:33:41.4384256Z * [new branch] gh/guilhermeleobas/260/head -> origin/gh/guilhermeleobas/260/head 2025-12-04T09:33:41.4385715Z * [new branch] gh/guilhermeleobas/260/orig -> origin/gh/guilhermeleobas/260/orig 2025-12-04T09:33:41.4387751Z * [new branch] gh/guilhermeleobas/261/base -> origin/gh/guilhermeleobas/261/base 2025-12-04T09:33:41.4389244Z * [new branch] 
gh/guilhermeleobas/261/head -> origin/gh/guilhermeleobas/261/head 2025-12-04T09:33:41.4390711Z * [new branch] gh/guilhermeleobas/261/orig -> origin/gh/guilhermeleobas/261/orig 2025-12-04T09:33:41.4392734Z * [new branch] gh/guilhermeleobas/262/base -> origin/gh/guilhermeleobas/262/base 2025-12-04T09:33:41.4394384Z * [new branch] gh/guilhermeleobas/262/head -> origin/gh/guilhermeleobas/262/head 2025-12-04T09:33:41.4395803Z * [new branch] gh/guilhermeleobas/262/orig -> origin/gh/guilhermeleobas/262/orig 2025-12-04T09:33:41.4398136Z * [new branch] gh/guilhermeleobas/263/base -> origin/gh/guilhermeleobas/263/base 2025-12-04T09:33:41.4399478Z * [new branch] gh/guilhermeleobas/263/head -> origin/gh/guilhermeleobas/263/head 2025-12-04T09:33:41.4401011Z * [new branch] gh/guilhermeleobas/263/orig -> origin/gh/guilhermeleobas/263/orig 2025-12-04T09:33:41.4403143Z * [new branch] gh/guilhermeleobas/264/base -> origin/gh/guilhermeleobas/264/base 2025-12-04T09:33:41.4404650Z * [new branch] gh/guilhermeleobas/264/head -> origin/gh/guilhermeleobas/264/head 2025-12-04T09:33:41.4406154Z * [new branch] gh/guilhermeleobas/264/orig -> origin/gh/guilhermeleobas/264/orig 2025-12-04T09:33:41.4408162Z * [new branch] gh/guilhermeleobas/265/base -> origin/gh/guilhermeleobas/265/base 2025-12-04T09:33:41.4409668Z * [new branch] gh/guilhermeleobas/265/head -> origin/gh/guilhermeleobas/265/head 2025-12-04T09:33:41.4411209Z * [new branch] gh/guilhermeleobas/265/orig -> origin/gh/guilhermeleobas/265/orig 2025-12-04T09:33:41.4413289Z * [new branch] gh/guilhermeleobas/266/base -> origin/gh/guilhermeleobas/266/base 2025-12-04T09:33:41.4414773Z * [new branch] gh/guilhermeleobas/266/head -> origin/gh/guilhermeleobas/266/head 2025-12-04T09:33:41.4416904Z * [new branch] gh/guilhermeleobas/266/orig -> origin/gh/guilhermeleobas/266/orig 2025-12-04T09:33:41.4418488Z * [new branch] gh/guilhermeleobas/267/base -> origin/gh/guilhermeleobas/267/base 2025-12-04T09:33:41.4419939Z * [new branch] 
gh/guilhermeleobas/267/head -> origin/gh/guilhermeleobas/267/head 2025-12-04T09:33:41.4421425Z * [new branch] gh/guilhermeleobas/267/orig -> origin/gh/guilhermeleobas/267/orig 2025-12-04T09:33:41.4424106Z * [new branch] gh/hameerabbasi/1/base -> origin/gh/hameerabbasi/1/base 2025-12-04T09:33:41.4425626Z * [new branch] gh/hameerabbasi/1/head -> origin/gh/hameerabbasi/1/head 2025-12-04T09:33:41.4427552Z * [new branch] gh/hameerabbasi/2/base -> origin/gh/hameerabbasi/2/base 2025-12-04T09:33:41.4429028Z * [new branch] gh/hameerabbasi/2/head -> origin/gh/hameerabbasi/2/head 2025-12-04T09:33:41.4430652Z * [new branch] gh/hameerabbasi/2/orig -> origin/gh/hameerabbasi/2/orig 2025-12-04T09:33:41.4432507Z * [new branch] gh/hameerabbasi/3/base -> origin/gh/hameerabbasi/3/base 2025-12-04T09:33:41.4433988Z * [new branch] gh/hameerabbasi/3/head -> origin/gh/hameerabbasi/3/head 2025-12-04T09:33:41.4435763Z * [new branch] gh/hameerabbasi/3/orig -> origin/gh/hameerabbasi/3/orig 2025-12-04T09:33:41.4437630Z * [new branch] gh/hameerabbasi/4/base -> origin/gh/hameerabbasi/4/base 2025-12-04T09:33:41.4439193Z * [new branch] gh/hameerabbasi/4/head -> origin/gh/hameerabbasi/4/head 2025-12-04T09:33:41.4440555Z * [new branch] gh/hameerabbasi/4/orig -> origin/gh/hameerabbasi/4/orig 2025-12-04T09:33:41.4443043Z * [new branch] gh/huydhn/1/next -> origin/gh/huydhn/1/next 2025-12-04T09:33:41.4444881Z * [new branch] gh/huydhn/2/next -> origin/gh/huydhn/2/next 2025-12-04T09:33:41.4446789Z * [new branch] gh/huydhn/3/next -> origin/gh/huydhn/3/next 2025-12-04T09:33:41.4448780Z * [new branch] gh/huydhn/4/next -> origin/gh/huydhn/4/next 2025-12-04T09:33:41.4450659Z * [new branch] gh/huydhn/5/next -> origin/gh/huydhn/5/next 2025-12-04T09:33:41.4452540Z * [new branch] gh/huydhn/6/next -> origin/gh/huydhn/6/next 2025-12-04T09:33:41.4454862Z * [new branch] gh/int3/97/base -> origin/gh/int3/97/base 2025-12-04T09:33:41.4456532Z * [new branch] gh/int3/97/head -> origin/gh/int3/97/head 
2025-12-04T09:33:41.4459135Z * [new branch] gh/isuruf/101/base -> origin/gh/isuruf/101/base 2025-12-04T09:33:41.4460476Z * [new branch] gh/isuruf/101/head -> origin/gh/isuruf/101/head 2025-12-04T09:33:41.4462419Z * [new branch] gh/isuruf/146/base -> origin/gh/isuruf/146/base 2025-12-04T09:33:41.4463893Z * [new branch] gh/isuruf/146/head -> origin/gh/isuruf/146/head 2025-12-04T09:33:41.4465354Z * [new branch] gh/isuruf/146/orig -> origin/gh/isuruf/146/orig 2025-12-04T09:33:41.4467289Z * [new branch] gh/isuruf/158/base -> origin/gh/isuruf/158/base 2025-12-04T09:33:41.4468752Z * [new branch] gh/isuruf/158/head -> origin/gh/isuruf/158/head 2025-12-04T09:33:41.4470567Z * [new branch] gh/isuruf/159/base -> origin/gh/isuruf/159/base 2025-12-04T09:33:41.4472217Z * [new branch] gh/isuruf/159/head -> origin/gh/isuruf/159/head 2025-12-04T09:33:41.4474403Z * [new branch] gh/isuruf/160/base -> origin/gh/isuruf/160/base 2025-12-04T09:33:41.4475837Z * [new branch] gh/isuruf/160/head -> origin/gh/isuruf/160/head 2025-12-04T09:33:41.4477386Z * [new branch] gh/isuruf/160/orig -> origin/gh/isuruf/160/orig 2025-12-04T09:33:41.4479305Z * [new branch] gh/isuruf/81/base -> origin/gh/isuruf/81/base 2025-12-04T09:33:41.4480757Z * [new branch] gh/isuruf/81/head -> origin/gh/isuruf/81/head 2025-12-04T09:33:41.4482415Z * [new branch] gh/isuruf/81/orig -> origin/gh/isuruf/81/orig 2025-12-04T09:33:41.4484640Z * [new branch] gh/jamesjwu/176/base -> origin/gh/jamesjwu/176/base 2025-12-04T09:33:41.4486187Z * [new branch] gh/jamesjwu/176/head -> origin/gh/jamesjwu/176/head 2025-12-04T09:33:41.4487617Z * [new branch] gh/jamesjwu/176/orig -> origin/gh/jamesjwu/176/orig 2025-12-04T09:33:41.4489589Z * [new branch] gh/jamesjwu/187/base -> origin/gh/jamesjwu/187/base 2025-12-04T09:33:41.4490992Z * [new branch] gh/jamesjwu/187/head -> origin/gh/jamesjwu/187/head 2025-12-04T09:33:41.4492441Z * [new branch] gh/jamesjwu/187/orig -> origin/gh/jamesjwu/187/orig 2025-12-04T09:33:41.4494557Z * [new branch] 
gh/jamesjwu/196/base -> origin/gh/jamesjwu/196/base 2025-12-04T09:33:41.4496014Z * [new branch] gh/jamesjwu/196/head -> origin/gh/jamesjwu/196/head 2025-12-04T09:33:41.4497640Z * [new branch] gh/jamesjwu/196/orig -> origin/gh/jamesjwu/196/orig 2025-12-04T09:33:41.4499516Z * [new branch] gh/jamesjwu/198/base -> origin/gh/jamesjwu/198/base 2025-12-04T09:33:41.4500988Z * [new branch] gh/jamesjwu/198/head -> origin/gh/jamesjwu/198/head 2025-12-04T09:33:41.4502420Z * [new branch] gh/jamesjwu/198/orig -> origin/gh/jamesjwu/198/orig 2025-12-04T09:33:41.4504384Z * [new branch] gh/jamesjwu/207/base -> origin/gh/jamesjwu/207/base 2025-12-04T09:33:41.4506133Z * [new branch] gh/jamesjwu/207/head -> origin/gh/jamesjwu/207/head 2025-12-04T09:33:41.4507555Z * [new branch] gh/jamesjwu/207/orig -> origin/gh/jamesjwu/207/orig 2025-12-04T09:33:41.4509654Z * [new branch] gh/jamesjwu/208/base -> origin/gh/jamesjwu/208/base 2025-12-04T09:33:41.4511105Z * [new branch] gh/jamesjwu/208/head -> origin/gh/jamesjwu/208/head 2025-12-04T09:33:41.4512553Z * [new branch] gh/jamesjwu/208/orig -> origin/gh/jamesjwu/208/orig 2025-12-04T09:33:41.4514630Z * [new branch] gh/jamesjwu/52/base -> origin/gh/jamesjwu/52/base 2025-12-04T09:33:41.4516110Z * [new branch] gh/jamesjwu/52/head -> origin/gh/jamesjwu/52/head 2025-12-04T09:33:41.4518050Z * [new branch] gh/jamesjwu/53/base -> origin/gh/jamesjwu/53/base 2025-12-04T09:33:41.4519348Z * [new branch] gh/jamesjwu/53/head -> origin/gh/jamesjwu/53/head 2025-12-04T09:33:41.4521113Z * [new branch] gh/jamesjwu/54/base -> origin/gh/jamesjwu/54/base 2025-12-04T09:33:41.4522544Z * [new branch] gh/jamesjwu/54/head -> origin/gh/jamesjwu/54/head 2025-12-04T09:33:41.4524285Z * [new branch] gh/jamesjwu/55/base -> origin/gh/jamesjwu/55/base 2025-12-04T09:33:41.4525687Z * [new branch] gh/jamesjwu/55/head -> origin/gh/jamesjwu/55/head 2025-12-04T09:33:41.4527443Z * [new branch] gh/jamesjwu/56/base -> origin/gh/jamesjwu/56/base 2025-12-04T09:33:41.4528893Z * [new branch] 
gh/jamesjwu/56/head -> origin/gh/jamesjwu/56/head 2025-12-04T09:33:41.4530789Z * [new branch] gh/jamesjwu/57/base -> origin/gh/jamesjwu/57/base 2025-12-04T09:33:41.4532268Z * [new branch] gh/jamesjwu/57/head -> origin/gh/jamesjwu/57/head 2025-12-04T09:33:41.4533979Z * [new branch] gh/jamesjwu/58/base -> origin/gh/jamesjwu/58/base 2025-12-04T09:33:41.4535520Z * [new branch] gh/jamesjwu/58/head -> origin/gh/jamesjwu/58/head 2025-12-04T09:33:41.4537408Z * [new branch] gh/jamesjwu/59/base -> origin/gh/jamesjwu/59/base 2025-12-04T09:33:41.4538862Z * [new branch] gh/jamesjwu/59/head -> origin/gh/jamesjwu/59/head 2025-12-04T09:33:41.4540678Z * [new branch] gh/jamesjwu/60/base -> origin/gh/jamesjwu/60/base 2025-12-04T09:33:41.4542094Z * [new branch] gh/jamesjwu/60/head -> origin/gh/jamesjwu/60/head 2025-12-04T09:33:41.4543881Z * [new branch] gh/jamesjwu/61/base -> origin/gh/jamesjwu/61/base 2025-12-04T09:33:41.4545271Z * [new branch] gh/jamesjwu/61/head -> origin/gh/jamesjwu/61/head 2025-12-04T09:33:41.4547068Z * [new branch] gh/jamesjwu/62/base -> origin/gh/jamesjwu/62/base 2025-12-04T09:33:41.4548580Z * [new branch] gh/jamesjwu/62/head -> origin/gh/jamesjwu/62/head 2025-12-04T09:33:41.4550385Z * [new branch] gh/jamesjwu/63/base -> origin/gh/jamesjwu/63/base 2025-12-04T09:33:41.4551975Z * [new branch] gh/jamesjwu/63/head -> origin/gh/jamesjwu/63/head 2025-12-04T09:33:41.4554350Z * [new branch] gh/jamesjwu/64/base -> origin/gh/jamesjwu/64/base 2025-12-04T09:33:41.4555829Z * [new branch] gh/jamesjwu/64/head -> origin/gh/jamesjwu/64/head 2025-12-04T09:33:41.4558593Z * [new branch] gh/jamesjwu/65/base -> origin/gh/jamesjwu/65/base 2025-12-04T09:33:41.4559941Z * [new branch] gh/jamesjwu/65/head -> origin/gh/jamesjwu/65/head 2025-12-04T09:33:41.4562947Z * [new branch] gh/janeyx99/165/base -> origin/gh/janeyx99/165/base 2025-12-04T09:33:41.4564151Z * [new branch] gh/janeyx99/165/head -> origin/gh/janeyx99/165/head 2025-12-04T09:33:41.4565488Z * [new branch] gh/janeyx99/165/orig 
-> origin/gh/janeyx99/165/orig 2025-12-04T09:33:41.4567333Z * [new branch] gh/janeyx99/201/base -> origin/gh/janeyx99/201/base 2025-12-04T09:33:41.4568798Z * [new branch] gh/janeyx99/201/head -> origin/gh/janeyx99/201/head 2025-12-04T09:33:41.4570228Z * [new branch] gh/janeyx99/201/orig -> origin/gh/janeyx99/201/orig 2025-12-04T09:33:41.4573375Z * [new branch] gh/janeyx99/225/base -> origin/gh/janeyx99/225/base 2025-12-04T09:33:41.4574317Z * [new branch] gh/janeyx99/225/head -> origin/gh/janeyx99/225/head 2025-12-04T09:33:41.4575888Z * [new branch] gh/janeyx99/225/orig -> origin/gh/janeyx99/225/orig 2025-12-04T09:33:41.4578022Z * [new branch] gh/janeyx99/299/base -> origin/gh/janeyx99/299/base 2025-12-04T09:33:41.4579628Z * [new branch] gh/janeyx99/299/head -> origin/gh/janeyx99/299/head 2025-12-04T09:33:41.4580913Z * [new branch] gh/janeyx99/299/orig -> origin/gh/janeyx99/299/orig 2025-12-04T09:33:41.4583268Z * [new branch] gh/janeyx99/302/base -> origin/gh/janeyx99/302/base 2025-12-04T09:33:41.4584824Z * [new branch] gh/janeyx99/302/head -> origin/gh/janeyx99/302/head 2025-12-04T09:33:41.4586603Z * [new branch] gh/janeyx99/303/base -> origin/gh/janeyx99/303/base 2025-12-04T09:33:41.4588123Z * [new branch] gh/janeyx99/303/head -> origin/gh/janeyx99/303/head 2025-12-04T09:33:41.4590633Z * [new branch] gh/janeyx99/305/base -> origin/gh/janeyx99/305/base 2025-12-04T09:33:41.4592173Z * [new branch] gh/janeyx99/305/head -> origin/gh/janeyx99/305/head 2025-12-04T09:33:41.4593955Z * [new branch] gh/janeyx99/306/base -> origin/gh/janeyx99/306/base 2025-12-04T09:33:41.4595799Z * [new branch] gh/janeyx99/306/head -> origin/gh/janeyx99/306/head 2025-12-04T09:33:41.4597738Z * [new branch] gh/janeyx99/314/base -> origin/gh/janeyx99/314/base 2025-12-04T09:33:41.4599309Z * [new branch] gh/janeyx99/314/head -> origin/gh/janeyx99/314/head 2025-12-04T09:33:41.4600840Z * [new branch] gh/janeyx99/314/orig -> origin/gh/janeyx99/314/orig 2025-12-04T09:33:41.4602841Z * [new branch] 
gh/janeyx99/315/base -> origin/gh/janeyx99/315/base 2025-12-04T09:33:41.4604347Z * [new branch] gh/janeyx99/315/head -> origin/gh/janeyx99/315/head 2025-12-04T09:33:41.4605887Z * [new branch] gh/janeyx99/315/orig -> origin/gh/janeyx99/315/orig 2025-12-04T09:33:41.4607961Z * [new branch] gh/janeyx99/316/base -> origin/gh/janeyx99/316/base 2025-12-04T09:33:41.4609480Z * [new branch] gh/janeyx99/316/head -> origin/gh/janeyx99/316/head 2025-12-04T09:33:41.4610975Z * [new branch] gh/janeyx99/316/orig -> origin/gh/janeyx99/316/orig 2025-12-04T09:33:41.4613111Z * [new branch] gh/janeyx99/317/base -> origin/gh/janeyx99/317/base 2025-12-04T09:33:41.4614546Z * [new branch] gh/janeyx99/317/head -> origin/gh/janeyx99/317/head 2025-12-04T09:33:41.4615949Z * [new branch] gh/janeyx99/317/orig -> origin/gh/janeyx99/317/orig 2025-12-04T09:33:41.4618122Z * [new branch] gh/janeyx99/325/base -> origin/gh/janeyx99/325/base 2025-12-04T09:33:41.4619649Z * [new branch] gh/janeyx99/325/head -> origin/gh/janeyx99/325/head 2025-12-04T09:33:41.4621049Z * [new branch] gh/janeyx99/325/orig -> origin/gh/janeyx99/325/orig 2025-12-04T09:33:41.4623031Z * [new branch] gh/janeyx99/327/base -> origin/gh/janeyx99/327/base 2025-12-04T09:33:41.4624502Z * [new branch] gh/janeyx99/327/head -> origin/gh/janeyx99/327/head 2025-12-04T09:33:41.4625977Z * [new branch] gh/janeyx99/327/orig -> origin/gh/janeyx99/327/orig 2025-12-04T09:33:41.4628055Z * [new branch] gh/janeyx99/328/base -> origin/gh/janeyx99/328/base 2025-12-04T09:33:41.4629659Z * [new branch] gh/janeyx99/328/head -> origin/gh/janeyx99/328/head 2025-12-04T09:33:41.4631173Z * [new branch] gh/janeyx99/328/orig -> origin/gh/janeyx99/328/orig 2025-12-04T09:33:41.4632967Z * [new branch] gh/janeyx99/329/base -> origin/gh/janeyx99/329/base 2025-12-04T09:33:41.4634523Z * [new branch] gh/janeyx99/329/head -> origin/gh/janeyx99/329/head 2025-12-04T09:33:41.4635957Z * [new branch] gh/janeyx99/329/orig -> origin/gh/janeyx99/329/orig 
2025-12-04T09:33:41.4638509Z * [new branch] gh/janeyx99/330/base -> origin/gh/janeyx99/330/base 2025-12-04T09:33:41.4640144Z * [new branch] gh/janeyx99/330/head -> origin/gh/janeyx99/330/head 2025-12-04T09:33:41.4641877Z * [new branch] gh/janeyx99/330/orig -> origin/gh/janeyx99/330/orig 2025-12-04T09:33:41.4643672Z * [new branch] gh/janeyx99/331/base -> origin/gh/janeyx99/331/base 2025-12-04T09:33:41.4645138Z * [new branch] gh/janeyx99/331/head -> origin/gh/janeyx99/331/head 2025-12-04T09:33:41.4646594Z * [new branch] gh/janeyx99/331/orig -> origin/gh/janeyx99/331/orig 2025-12-04T09:33:41.4648797Z * [new branch] gh/janeyx99/332/base -> origin/gh/janeyx99/332/base 2025-12-04T09:33:41.4650202Z * [new branch] gh/janeyx99/332/head -> origin/gh/janeyx99/332/head 2025-12-04T09:33:41.4651689Z * [new branch] gh/janeyx99/332/orig -> origin/gh/janeyx99/332/orig 2025-12-04T09:33:41.4653515Z * [new branch] gh/janeyx99/333/base -> origin/gh/janeyx99/333/base 2025-12-04T09:33:41.4655359Z * [new branch] gh/janeyx99/333/head -> origin/gh/janeyx99/333/head 2025-12-04T09:33:41.4656535Z * [new branch] gh/janeyx99/333/orig -> origin/gh/janeyx99/333/orig 2025-12-04T09:33:41.4658773Z * [new branch] gh/janeyx99/88/base -> origin/gh/janeyx99/88/base 2025-12-04T09:33:41.4660289Z * [new branch] gh/janeyx99/88/head -> origin/gh/janeyx99/88/head 2025-12-04T09:33:41.4661824Z * [new branch] gh/janeyx99/88/orig -> origin/gh/janeyx99/88/orig 2025-12-04T09:33:41.4664739Z * [new branch] gh/jansel/360/base -> origin/gh/jansel/360/base 2025-12-04T09:33:41.4666178Z * [new branch] gh/jansel/360/head -> origin/gh/jansel/360/head 2025-12-04T09:33:41.4668249Z * [new branch] gh/jansel/451/base -> origin/gh/jansel/451/base 2025-12-04T09:33:41.4669726Z * [new branch] gh/jansel/451/head -> origin/gh/jansel/451/head 2025-12-04T09:33:41.4671411Z * [new branch] gh/jansel/451/orig -> origin/gh/jansel/451/orig 2025-12-04T09:33:41.4676608Z * [new branch] gh/jansel/462/base -> origin/gh/jansel/462/base 
2025-12-04T09:33:41.4678020Z * [new branch] gh/jansel/462/head -> origin/gh/jansel/462/head 2025-12-04T09:33:41.4679517Z * [new branch] gh/jansel/462/orig -> origin/gh/jansel/462/orig 2025-12-04T09:33:41.4681999Z * [new branch] gh/jansel/533/base -> origin/gh/jansel/533/base 2025-12-04T09:33:41.4683462Z * [new branch] gh/jansel/533/head -> origin/gh/jansel/533/head 2025-12-04T09:33:41.4684911Z * [new branch] gh/jansel/533/orig -> origin/gh/jansel/533/orig 2025-12-04T09:33:41.4686816Z * [new branch] gh/jansel/552/base -> origin/gh/jansel/552/base 2025-12-04T09:33:41.4688264Z * [new branch] gh/jansel/552/head -> origin/gh/jansel/552/head 2025-12-04T09:33:41.4689755Z * [new branch] gh/jansel/552/orig -> origin/gh/jansel/552/orig 2025-12-04T09:33:41.4691791Z * [new branch] gh/jansel/553/base -> origin/gh/jansel/553/base 2025-12-04T09:33:41.4693221Z * [new branch] gh/jansel/553/head -> origin/gh/jansel/553/head 2025-12-04T09:33:41.4694750Z * [new branch] gh/jansel/553/orig -> origin/gh/jansel/553/orig 2025-12-04T09:33:41.4696733Z * [new branch] gh/jansel/554/base -> origin/gh/jansel/554/base 2025-12-04T09:33:41.4698251Z * [new branch] gh/jansel/554/head -> origin/gh/jansel/554/head 2025-12-04T09:33:41.4699796Z * [new branch] gh/jansel/554/orig -> origin/gh/jansel/554/orig 2025-12-04T09:33:41.4701748Z * [new branch] gh/jansel/555/base -> origin/gh/jansel/555/base 2025-12-04T09:33:41.4703352Z * [new branch] gh/jansel/555/head -> origin/gh/jansel/555/head 2025-12-04T09:33:41.4704684Z * [new branch] gh/jansel/555/orig -> origin/gh/jansel/555/orig 2025-12-04T09:33:41.4706559Z * [new branch] gh/jansel/556/base -> origin/gh/jansel/556/base 2025-12-04T09:33:41.4708007Z * [new branch] gh/jansel/556/head -> origin/gh/jansel/556/head 2025-12-04T09:33:41.4709794Z * [new branch] gh/jansel/556/orig -> origin/gh/jansel/556/orig 2025-12-04T09:33:41.4711540Z * [new branch] gh/jansel/557/base -> origin/gh/jansel/557/base 2025-12-04T09:33:41.4713036Z * [new branch] gh/jansel/557/head -> 
origin/gh/jansel/557/head 2025-12-04T09:33:41.4714529Z * [new branch] gh/jansel/557/orig -> origin/gh/jansel/557/orig 2025-12-04T09:33:41.4717063Z * [new branch] gh/jansel/558/base -> origin/gh/jansel/558/base 2025-12-04T09:33:41.4718558Z * [new branch] gh/jansel/558/head -> origin/gh/jansel/558/head 2025-12-04T09:33:41.4720001Z * [new branch] gh/jansel/558/orig -> origin/gh/jansel/558/orig 2025-12-04T09:33:41.4721938Z * [new branch] gh/jansel/559/base -> origin/gh/jansel/559/base 2025-12-04T09:33:41.4723437Z * [new branch] gh/jansel/559/head -> origin/gh/jansel/559/head 2025-12-04T09:33:41.4724877Z * [new branch] gh/jansel/559/orig -> origin/gh/jansel/559/orig 2025-12-04T09:33:41.4726801Z * [new branch] gh/jansel/560/base -> origin/gh/jansel/560/base 2025-12-04T09:33:41.4728248Z * [new branch] gh/jansel/560/head -> origin/gh/jansel/560/head 2025-12-04T09:33:41.4729834Z * [new branch] gh/jansel/560/orig -> origin/gh/jansel/560/orig 2025-12-04T09:33:41.4731837Z * [new branch] gh/jansel/561/base -> origin/gh/jansel/561/base 2025-12-04T09:33:41.4733338Z * [new branch] gh/jansel/561/head -> origin/gh/jansel/561/head 2025-12-04T09:33:41.4734772Z * [new branch] gh/jansel/561/orig -> origin/gh/jansel/561/orig 2025-12-04T09:33:41.4737166Z * [new branch] gh/jansel/562/base -> origin/gh/jansel/562/base 2025-12-04T09:33:41.4738307Z * [new branch] gh/jansel/562/head -> origin/gh/jansel/562/head 2025-12-04T09:33:41.4739751Z * [new branch] gh/jansel/562/orig -> origin/gh/jansel/562/orig 2025-12-04T09:33:41.4741726Z * [new branch] gh/jansel/563/base -> origin/gh/jansel/563/base 2025-12-04T09:33:41.4743145Z * [new branch] gh/jansel/563/head -> origin/gh/jansel/563/head 2025-12-04T09:33:41.4744628Z * [new branch] gh/jansel/563/orig -> origin/gh/jansel/563/orig 2025-12-04T09:33:41.4747215Z * [new branch] gh/jansel/564/base -> origin/gh/jansel/564/base 2025-12-04T09:33:41.4748890Z * [new branch] gh/jansel/564/head -> origin/gh/jansel/564/head 2025-12-04T09:33:41.4750244Z * [new 
branch] gh/jansel/564/orig -> origin/gh/jansel/564/orig 2025-12-04T09:33:41.4752384Z * [new branch] gh/jansel/565/base -> origin/gh/jansel/565/base 2025-12-04T09:33:41.4753898Z * [new branch] gh/jansel/565/head -> origin/gh/jansel/565/head 2025-12-04T09:33:41.4755417Z * [new branch] gh/jansel/565/orig -> origin/gh/jansel/565/orig 2025-12-04T09:33:41.4757452Z * [new branch] gh/jansel/566/base -> origin/gh/jansel/566/base 2025-12-04T09:33:41.4758915Z * [new branch] gh/jansel/566/head -> origin/gh/jansel/566/head 2025-12-04T09:33:41.4760357Z * [new branch] gh/jansel/566/orig -> origin/gh/jansel/566/orig 2025-12-04T09:33:41.4762465Z * [new branch] gh/jansel/567/base -> origin/gh/jansel/567/base 2025-12-04T09:33:41.4763990Z * [new branch] gh/jansel/567/head -> origin/gh/jansel/567/head 2025-12-04T09:33:41.4765417Z * [new branch] gh/jansel/567/orig -> origin/gh/jansel/567/orig 2025-12-04T09:33:41.4767482Z * [new branch] gh/jansel/568/base -> origin/gh/jansel/568/base 2025-12-04T09:33:41.4769042Z * [new branch] gh/jansel/568/head -> origin/gh/jansel/568/head 2025-12-04T09:33:41.4770455Z * [new branch] gh/jansel/568/orig -> origin/gh/jansel/568/orig 2025-12-04T09:33:41.4772752Z * [new branch] gh/jansel/569/base -> origin/gh/jansel/569/base 2025-12-04T09:33:41.4774178Z * [new branch] gh/jansel/569/head -> origin/gh/jansel/569/head 2025-12-04T09:33:41.4775676Z * [new branch] gh/jansel/569/orig -> origin/gh/jansel/569/orig 2025-12-04T09:33:41.4777773Z * [new branch] gh/jansel/570/base -> origin/gh/jansel/570/base 2025-12-04T09:33:41.4779238Z * [new branch] gh/jansel/570/head -> origin/gh/jansel/570/head 2025-12-04T09:33:41.4780722Z * [new branch] gh/jansel/570/orig -> origin/gh/jansel/570/orig 2025-12-04T09:33:41.4782665Z * [new branch] gh/jansel/571/base -> origin/gh/jansel/571/base 2025-12-04T09:33:41.4784190Z * [new branch] gh/jansel/571/head -> origin/gh/jansel/571/head 2025-12-04T09:33:41.4785701Z * [new branch] gh/jansel/571/orig -> origin/gh/jansel/571/orig 
2025-12-04T09:33:41.4787600Z * [new branch] gh/jansel/572/base -> origin/gh/jansel/572/base 2025-12-04T09:33:41.4789103Z * [new branch] gh/jansel/572/head -> origin/gh/jansel/572/head 2025-12-04T09:33:41.4790573Z * [new branch] gh/jansel/572/orig -> origin/gh/jansel/572/orig 2025-12-04T09:33:41.4792799Z * [new branch] gh/jansel/573/base -> origin/gh/jansel/573/base 2025-12-04T09:33:41.4794291Z * [new branch] gh/jansel/573/head -> origin/gh/jansel/573/head 2025-12-04T09:33:41.4795818Z * [new branch] gh/jansel/573/orig -> origin/gh/jansel/573/orig 2025-12-04T09:33:41.4797891Z * [new branch] gh/jansel/574/base -> origin/gh/jansel/574/base 2025-12-04T09:33:41.4799375Z * [new branch] gh/jansel/574/head -> origin/gh/jansel/574/head 2025-12-04T09:33:41.4800914Z * [new branch] gh/jansel/574/orig -> origin/gh/jansel/574/orig 2025-12-04T09:33:41.4802903Z * [new branch] gh/jansel/575/base -> origin/gh/jansel/575/base 2025-12-04T09:33:41.4804465Z * [new branch] gh/jansel/575/head -> origin/gh/jansel/575/head 2025-12-04T09:33:41.4805962Z * [new branch] gh/jansel/575/orig -> origin/gh/jansel/575/orig 2025-12-04T09:33:41.4808048Z * [new branch] gh/jansel/576/base -> origin/gh/jansel/576/base 2025-12-04T09:33:41.4809564Z * [new branch] gh/jansel/576/head -> origin/gh/jansel/576/head 2025-12-04T09:33:41.4811526Z * [new branch] gh/jansel/576/orig -> origin/gh/jansel/576/orig 2025-12-04T09:33:41.4814179Z * [new branch] gh/jbschlosser/247/base -> origin/gh/jbschlosser/247/base 2025-12-04T09:33:41.4815706Z * [new branch] gh/jbschlosser/247/head -> origin/gh/jbschlosser/247/head 2025-12-04T09:33:41.4817347Z * [new branch] gh/jbschlosser/247/orig -> origin/gh/jbschlosser/247/orig 2025-12-04T09:33:41.4819973Z * [new branch] gh/jbschlosser/250/base -> origin/gh/jbschlosser/250/base 2025-12-04T09:33:41.4821340Z * [new branch] gh/jbschlosser/250/head -> origin/gh/jbschlosser/250/head 2025-12-04T09:33:41.4822868Z * [new branch] gh/jbschlosser/250/orig -> origin/gh/jbschlosser/250/orig 
2025-12-04T09:33:41.4826163Z * [new branch] gh/jerryzh168/1/base -> origin/gh/jerryzh168/1/base 2025-12-04T09:33:41.4827487Z * [new branch] gh/jerryzh168/1/head -> origin/gh/jerryzh168/1/head 2025-12-04T09:33:41.4828937Z * [new branch] gh/jerryzh168/1/orig -> origin/gh/jerryzh168/1/orig 2025-12-04T09:33:41.4831336Z * [new branch] gh/jiayisunx/59/base -> origin/gh/jiayisunx/59/base 2025-12-04T09:33:41.4833064Z * [new branch] gh/jiayisunx/59/head -> origin/gh/jiayisunx/59/head 2025-12-04T09:33:41.4834569Z * [new branch] gh/jiayisunx/59/orig -> origin/gh/jiayisunx/59/orig 2025-12-04T09:33:41.4836434Z * [new branch] gh/jiayisunx/61/base -> origin/gh/jiayisunx/61/base 2025-12-04T09:33:41.4837922Z * [new branch] gh/jiayisunx/61/head -> origin/gh/jiayisunx/61/head 2025-12-04T09:33:41.4839447Z * [new branch] gh/jiayisunx/61/orig -> origin/gh/jiayisunx/61/orig 2025-12-04T09:33:41.4841441Z * [new branch] gh/jiayisunx/68/base -> origin/gh/jiayisunx/68/base 2025-12-04T09:33:41.4842857Z * [new branch] gh/jiayisunx/68/head -> origin/gh/jiayisunx/68/head 2025-12-04T09:33:41.4844380Z * [new branch] gh/jiayisunx/68/orig -> origin/gh/jiayisunx/68/orig 2025-12-04T09:33:41.4846340Z * [new branch] gh/jiayisunx/77/base -> origin/gh/jiayisunx/77/base 2025-12-04T09:33:41.4847840Z * [new branch] gh/jiayisunx/77/head -> origin/gh/jiayisunx/77/head 2025-12-04T09:33:41.4849287Z * [new branch] gh/jiayisunx/77/orig -> origin/gh/jiayisunx/77/orig 2025-12-04T09:33:41.4851833Z * [new branch] gh/jiayisunx/78/base -> origin/gh/jiayisunx/78/base 2025-12-04T09:33:41.4853470Z * [new branch] gh/jiayisunx/78/head -> origin/gh/jiayisunx/78/head 2025-12-04T09:33:41.4854986Z * [new branch] gh/jiayisunx/78/orig -> origin/gh/jiayisunx/78/orig 2025-12-04T09:33:41.4857094Z * [new branch] gh/jiayisunx/79/base -> origin/gh/jiayisunx/79/base 2025-12-04T09:33:41.4858737Z * [new branch] gh/jiayisunx/79/head -> origin/gh/jiayisunx/79/head 2025-12-04T09:33:41.4860751Z * [new branch] gh/jiayisunx/79/orig -> 
origin/gh/jiayisunx/79/orig 2025-12-04T09:33:41.4862763Z * [new branch] gh/jiayisunx/82/base -> origin/gh/jiayisunx/82/base 2025-12-04T09:33:41.4864266Z * [new branch] gh/jiayisunx/82/head -> origin/gh/jiayisunx/82/head 2025-12-04T09:33:41.4865794Z * [new branch] gh/jiayisunx/82/orig -> origin/gh/jiayisunx/82/orig 2025-12-04T09:33:41.4867693Z * [new branch] gh/jiayisunx/83/base -> origin/gh/jiayisunx/83/base 2025-12-04T09:33:41.4869267Z * [new branch] gh/jiayisunx/83/head -> origin/gh/jiayisunx/83/head 2025-12-04T09:33:41.4870760Z * [new branch] gh/jiayisunx/83/orig -> origin/gh/jiayisunx/83/orig 2025-12-04T09:33:41.4872896Z * [new branch] gh/jiayisunx/84/base -> origin/gh/jiayisunx/84/base 2025-12-04T09:33:41.4874484Z * [new branch] gh/jiayisunx/84/head -> origin/gh/jiayisunx/84/head 2025-12-04T09:33:41.4875984Z * [new branch] gh/jiayisunx/84/orig -> origin/gh/jiayisunx/84/orig 2025-12-04T09:33:41.4877985Z * [new branch] gh/jiayisunx/85/base -> origin/gh/jiayisunx/85/base 2025-12-04T09:33:41.4879484Z * [new branch] gh/jiayisunx/85/head -> origin/gh/jiayisunx/85/head 2025-12-04T09:33:41.4880976Z * [new branch] gh/jiayisunx/85/orig -> origin/gh/jiayisunx/85/orig 2025-12-04T09:33:41.4882825Z * [new branch] gh/jiayisunx/86/base -> origin/gh/jiayisunx/86/base 2025-12-04T09:33:41.4884324Z * [new branch] gh/jiayisunx/86/head -> origin/gh/jiayisunx/86/head 2025-12-04T09:33:41.4886183Z * [new branch] gh/jiayisunx/86/orig -> origin/gh/jiayisunx/86/orig 2025-12-04T09:33:41.4887814Z * [new branch] gh/jiayisunx/87/base -> origin/gh/jiayisunx/87/base 2025-12-04T09:33:41.4889261Z * [new branch] gh/jiayisunx/87/head -> origin/gh/jiayisunx/87/head 2025-12-04T09:33:41.4890719Z * [new branch] gh/jiayisunx/87/orig -> origin/gh/jiayisunx/87/orig 2025-12-04T09:33:41.4892693Z * [new branch] gh/jiayisunx/88/base -> origin/gh/jiayisunx/88/base 2025-12-04T09:33:41.4894278Z * [new branch] gh/jiayisunx/88/head -> origin/gh/jiayisunx/88/head 2025-12-04T09:33:41.4895774Z * [new branch] 
gh/jiayisunx/88/orig -> origin/gh/jiayisunx/88/orig 2025-12-04T09:33:41.4897869Z * [new branch] gh/jiayisunx/89/base -> origin/gh/jiayisunx/89/base 2025-12-04T09:33:41.4899308Z * [new branch] gh/jiayisunx/89/head -> origin/gh/jiayisunx/89/head 2025-12-04T09:33:41.4900789Z * [new branch] gh/jiayisunx/89/orig -> origin/gh/jiayisunx/89/orig 2025-12-04T09:33:41.4902771Z * [new branch] gh/jiayisunx/90/base -> origin/gh/jiayisunx/90/base 2025-12-04T09:33:41.4904321Z * [new branch] gh/jiayisunx/90/head -> origin/gh/jiayisunx/90/head 2025-12-04T09:33:41.4905791Z * [new branch] gh/jiayisunx/90/orig -> origin/gh/jiayisunx/90/orig 2025-12-04T09:33:41.4908138Z * [new branch] gh/jjwu@meta.com/1/base -> origin/gh/jjwu@meta.com/1/base 2025-12-04T09:33:41.4909580Z * [new branch] gh/jjwu@meta.com/1/head -> origin/gh/jjwu@meta.com/1/head 2025-12-04T09:33:41.4912064Z * [new branch] gh/jturney/1/base -> origin/gh/jturney/1/base 2025-12-04T09:33:41.4913664Z * [new branch] gh/jturney/1/head -> origin/gh/jturney/1/head 2025-12-04T09:33:41.4915158Z * [new branch] gh/jturney/1/orig -> origin/gh/jturney/1/orig 2025-12-04T09:33:41.4917085Z * [new branch] gh/jturney/2/base -> origin/gh/jturney/2/base 2025-12-04T09:33:41.4918547Z * [new branch] gh/jturney/2/head -> origin/gh/jturney/2/head 2025-12-04T09:33:41.4920031Z * [new branch] gh/jturney/2/orig -> origin/gh/jturney/2/orig 2025-12-04T09:33:41.4922651Z * [new branch] gh/karthickai/10/base -> origin/gh/karthickai/10/base 2025-12-04T09:33:41.4924285Z * [new branch] gh/karthickai/10/head -> origin/gh/karthickai/10/head 2025-12-04T09:33:41.4925778Z * [new branch] gh/karthickai/10/orig -> origin/gh/karthickai/10/orig 2025-12-04T09:33:41.4927734Z * [new branch] gh/karthickai/11/base -> origin/gh/karthickai/11/base 2025-12-04T09:33:41.4929388Z * [new branch] gh/karthickai/11/head -> origin/gh/karthickai/11/head 2025-12-04T09:33:41.4930945Z * [new branch] gh/karthickai/11/orig -> origin/gh/karthickai/11/orig 2025-12-04T09:33:41.4933495Z * [new 
branch] gh/karthickai/12/base -> origin/gh/karthickai/12/base 2025-12-04T09:33:41.4935058Z * [new branch] gh/karthickai/12/head -> origin/gh/karthickai/12/head 2025-12-04T09:33:41.4936669Z * [new branch] gh/karthickai/12/orig -> origin/gh/karthickai/12/orig 2025-12-04T09:33:41.4938705Z * [new branch] gh/karthickai/13/base -> origin/gh/karthickai/13/base 2025-12-04T09:33:41.4940261Z * [new branch] gh/karthickai/13/head -> origin/gh/karthickai/13/head 2025-12-04T09:33:41.4941752Z * [new branch] gh/karthickai/13/orig -> origin/gh/karthickai/13/orig 2025-12-04T09:33:41.4943989Z * [new branch] gh/karthickai/14/base -> origin/gh/karthickai/14/base 2025-12-04T09:33:41.4945636Z * [new branch] gh/karthickai/14/head -> origin/gh/karthickai/14/head 2025-12-04T09:33:41.4947291Z * [new branch] gh/karthickai/14/orig -> origin/gh/karthickai/14/orig 2025-12-04T09:33:41.4949494Z * [new branch] gh/karthickai/15/base -> origin/gh/karthickai/15/base 2025-12-04T09:33:41.4951057Z * [new branch] gh/karthickai/15/head -> origin/gh/karthickai/15/head 2025-12-04T09:33:41.4952572Z * [new branch] gh/karthickai/15/orig -> origin/gh/karthickai/15/orig 2025-12-04T09:33:41.4954571Z * [new branch] gh/karthickai/16/base -> origin/gh/karthickai/16/base 2025-12-04T09:33:41.4956083Z * [new branch] gh/karthickai/16/head -> origin/gh/karthickai/16/head 2025-12-04T09:33:41.4957839Z * [new branch] gh/karthickai/16/orig -> origin/gh/karthickai/16/orig 2025-12-04T09:33:41.4959799Z * [new branch] gh/karthickai/17/base -> origin/gh/karthickai/17/base 2025-12-04T09:33:41.4961862Z * [new branch] gh/karthickai/17/head -> origin/gh/karthickai/17/head 2025-12-04T09:33:41.4963333Z * [new branch] gh/karthickai/17/orig -> origin/gh/karthickai/17/orig 2025-12-04T09:33:41.4965412Z * [new branch] gh/karthickai/18/base -> origin/gh/karthickai/18/base 2025-12-04T09:33:41.4967285Z * [new branch] gh/karthickai/18/head -> origin/gh/karthickai/18/head 2025-12-04T09:33:41.4968981Z * [new branch] gh/karthickai/18/orig -> 
origin/gh/karthickai/18/orig 2025-12-04T09:33:41.4971213Z * [new branch] gh/karthickai/19/base -> origin/gh/karthickai/19/base 2025-12-04T09:33:41.4972770Z * [new branch] gh/karthickai/19/head -> origin/gh/karthickai/19/head 2025-12-04T09:33:41.4974226Z * [new branch] gh/karthickai/19/orig -> origin/gh/karthickai/19/orig 2025-12-04T09:33:41.4977347Z * [new branch] gh/karthickai/20/base -> origin/gh/karthickai/20/base 2025-12-04T09:33:41.4979449Z * [new branch] gh/karthickai/20/head -> origin/gh/karthickai/20/head 2025-12-04T09:33:41.4980957Z * [new branch] gh/karthickai/20/orig -> origin/gh/karthickai/20/orig 2025-12-04T09:33:41.4983059Z * [new branch] gh/karthickai/21/base -> origin/gh/karthickai/21/base 2025-12-04T09:33:41.4984857Z * [new branch] gh/karthickai/21/head -> origin/gh/karthickai/21/head 2025-12-04T09:33:41.4986361Z * [new branch] gh/karthickai/21/orig -> origin/gh/karthickai/21/orig 2025-12-04T09:33:41.4988568Z * [new branch] gh/karthickai/22/base -> origin/gh/karthickai/22/base 2025-12-04T09:33:41.4990013Z * [new branch] gh/karthickai/22/head -> origin/gh/karthickai/22/head 2025-12-04T09:33:41.4991461Z * [new branch] gh/karthickai/22/orig -> origin/gh/karthickai/22/orig 2025-12-04T09:33:41.4993669Z * [new branch] gh/karthickai/23/base -> origin/gh/karthickai/23/base 2025-12-04T09:33:41.4995350Z * [new branch] gh/karthickai/23/head -> origin/gh/karthickai/23/head 2025-12-04T09:33:41.4996826Z * [new branch] gh/karthickai/23/orig -> origin/gh/karthickai/23/orig 2025-12-04T09:33:41.4998962Z * [new branch] gh/karthickai/24/base -> origin/gh/karthickai/24/base 2025-12-04T09:33:41.5000498Z * [new branch] gh/karthickai/24/head -> origin/gh/karthickai/24/head 2025-12-04T09:33:41.5001985Z * [new branch] gh/karthickai/24/orig -> origin/gh/karthickai/24/orig 2025-12-04T09:33:41.5004529Z * [new branch] gh/karthickai/25/base -> origin/gh/karthickai/25/base 2025-12-04T09:33:41.5006191Z * [new branch] gh/karthickai/25/head -> origin/gh/karthickai/25/head 
2025-12-04T09:33:41.5007659Z * [new branch] gh/karthickai/25/orig -> origin/gh/karthickai/25/orig 2025-12-04T09:33:41.5009516Z * [new branch] gh/karthickai/26/base -> origin/gh/karthickai/26/base 2025-12-04T09:33:41.5011338Z * [new branch] gh/karthickai/26/head -> origin/gh/karthickai/26/head 2025-12-04T09:33:41.5012761Z * [new branch] gh/karthickai/26/orig -> origin/gh/karthickai/26/orig 2025-12-04T09:33:41.5016547Z * [new branch] gh/karthickai/6/base -> origin/gh/karthickai/6/base 2025-12-04T09:33:41.5018846Z * [new branch] gh/karthickai/6/head -> origin/gh/karthickai/6/head 2025-12-04T09:33:41.5020349Z * [new branch] gh/karthickai/6/orig -> origin/gh/karthickai/6/orig 2025-12-04T09:33:41.5022839Z * [new branch] gh/krocki/1/base -> origin/gh/krocki/1/base 2025-12-04T09:33:41.5024322Z * [new branch] gh/krocki/1/head -> origin/gh/krocki/1/head 2025-12-04T09:33:41.5025830Z * [new branch] gh/krocki/1/orig -> origin/gh/krocki/1/orig 2025-12-04T09:33:41.5027852Z * [new branch] gh/krocki/2/base -> origin/gh/krocki/2/base 2025-12-04T09:33:41.5029408Z * [new branch] gh/krocki/2/head -> origin/gh/krocki/2/head 2025-12-04T09:33:41.5030854Z * [new branch] gh/krocki/2/orig -> origin/gh/krocki/2/orig 2025-12-04T09:33:41.5033246Z * [new branch] gh/kurtamohler/60/base -> origin/gh/kurtamohler/60/base 2025-12-04T09:33:41.5034740Z * [new branch] gh/kurtamohler/60/head -> origin/gh/kurtamohler/60/head 2025-12-04T09:33:41.5036190Z * [new branch] gh/kurtamohler/60/orig -> origin/gh/kurtamohler/60/orig 2025-12-04T09:33:41.5038151Z * [new branch] gh/kurtamohler/61/base -> origin/gh/kurtamohler/61/base 2025-12-04T09:33:41.5039721Z * [new branch] gh/kurtamohler/61/head -> origin/gh/kurtamohler/61/head 2025-12-04T09:33:41.5041328Z * [new branch] gh/kurtamohler/61/orig -> origin/gh/kurtamohler/61/orig 2025-12-04T09:33:41.5043723Z * [new branch] gh/kurtamohler/62/base -> origin/gh/kurtamohler/62/base 2025-12-04T09:33:41.5045226Z * [new branch] gh/kurtamohler/62/head -> 
origin/gh/kurtamohler/62/head 2025-12-04T09:33:41.5046722Z * [new branch] gh/kurtamohler/62/orig -> origin/gh/kurtamohler/62/orig 2025-12-04T09:33:41.5049256Z * [new branch] gh/kurtamohler/63/base -> origin/gh/kurtamohler/63/base 2025-12-04T09:33:41.5050780Z * [new branch] gh/kurtamohler/63/head -> origin/gh/kurtamohler/63/head 2025-12-04T09:33:41.5052285Z * [new branch] gh/kurtamohler/63/orig -> origin/gh/kurtamohler/63/orig 2025-12-04T09:33:41.5054440Z * [new branch] gh/kurtamohler/64/base -> origin/gh/kurtamohler/64/base 2025-12-04T09:33:41.5055912Z * [new branch] gh/kurtamohler/64/head -> origin/gh/kurtamohler/64/head 2025-12-04T09:33:41.5057596Z * [new branch] gh/kurtamohler/64/orig -> origin/gh/kurtamohler/64/orig 2025-12-04T09:33:41.5059553Z * [new branch] gh/kurtamohler/65/base -> origin/gh/kurtamohler/65/base 2025-12-04T09:33:41.5061141Z * [new branch] gh/kurtamohler/65/head -> origin/gh/kurtamohler/65/head 2025-12-04T09:33:41.5062650Z * [new branch] gh/kurtamohler/65/orig -> origin/gh/kurtamohler/65/orig 2025-12-04T09:33:41.5064522Z * [new branch] gh/kurtamohler/66/base -> origin/gh/kurtamohler/66/base 2025-12-04T09:33:41.5066059Z * [new branch] gh/kurtamohler/66/head -> origin/gh/kurtamohler/66/head 2025-12-04T09:33:41.5067543Z * [new branch] gh/kurtamohler/66/orig -> origin/gh/kurtamohler/66/orig 2025-12-04T09:33:41.5069454Z * [new branch] gh/kurtamohler/67/base -> origin/gh/kurtamohler/67/base 2025-12-04T09:33:41.5070897Z * [new branch] gh/kurtamohler/67/head -> origin/gh/kurtamohler/67/head 2025-12-04T09:33:41.5075494Z * [new branch] gh/kurtamohler/67/orig -> origin/gh/kurtamohler/67/orig 2025-12-04T09:33:41.5077858Z * [new branch] gh/kwen2501/130/base -> origin/gh/kwen2501/130/base 2025-12-04T09:33:41.5079535Z * [new branch] gh/kwen2501/130/head -> origin/gh/kwen2501/130/head 2025-12-04T09:33:41.5081178Z * [new branch] gh/kwen2501/130/orig -> origin/gh/kwen2501/130/orig 2025-12-04T09:33:41.5083299Z * [new branch] gh/kwen2501/170/base -> 
origin/gh/kwen2501/170/base 2025-12-04T09:33:41.5084770Z * [new branch] gh/kwen2501/170/head -> origin/gh/kwen2501/170/head 2025-12-04T09:33:41.5086890Z * [new branch] gh/kwen2501/187/base -> origin/gh/kwen2501/187/base 2025-12-04T09:33:41.5088459Z * [new branch] gh/kwen2501/187/head -> origin/gh/kwen2501/187/head 2025-12-04T09:33:41.5090023Z * [new branch] gh/kwen2501/187/orig -> origin/gh/kwen2501/187/orig 2025-12-04T09:33:41.5092494Z * [new branch] gh/kwen2501/188/base -> origin/gh/kwen2501/188/base 2025-12-04T09:33:41.5094014Z * [new branch] gh/kwen2501/188/head -> origin/gh/kwen2501/188/head 2025-12-04T09:33:41.5095587Z * [new branch] gh/kwen2501/188/orig -> origin/gh/kwen2501/188/orig 2025-12-04T09:33:41.5097670Z * [new branch] gh/kwen2501/211/base -> origin/gh/kwen2501/211/base 2025-12-04T09:33:41.5099163Z * [new branch] gh/kwen2501/211/head -> origin/gh/kwen2501/211/head 2025-12-04T09:33:41.5101454Z * [new branch] gh/kwen2501/224/base -> origin/gh/kwen2501/224/base 2025-12-04T09:33:41.5102927Z * [new branch] gh/kwen2501/224/head -> origin/gh/kwen2501/224/head 2025-12-04T09:33:41.5105035Z * [new branch] gh/kwen2501/224/orig -> origin/gh/kwen2501/224/orig 2025-12-04T09:33:41.5107641Z * [new branch] gh/kwen2501/228/base -> origin/gh/kwen2501/228/base 2025-12-04T09:33:41.5109156Z * [new branch] gh/kwen2501/228/head -> origin/gh/kwen2501/228/head 2025-12-04T09:33:41.5110637Z * [new branch] gh/kwen2501/228/orig -> origin/gh/kwen2501/228/orig 2025-12-04T09:33:41.5112791Z * [new branch] gh/kwen2501/234/base -> origin/gh/kwen2501/234/base 2025-12-04T09:33:41.5114290Z * [new branch] gh/kwen2501/234/head -> origin/gh/kwen2501/234/head 2025-12-04T09:33:41.5115756Z * [new branch] gh/kwen2501/234/orig -> origin/gh/kwen2501/234/orig 2025-12-04T09:33:41.5117684Z * [new branch] gh/kwen2501/235/base -> origin/gh/kwen2501/235/base 2025-12-04T09:33:41.5119269Z * [new branch] gh/kwen2501/235/head -> origin/gh/kwen2501/235/head 2025-12-04T09:33:41.5120743Z * [new branch] 
gh/kwen2501/235/orig -> origin/gh/kwen2501/235/orig 2025-12-04T09:33:41.5122603Z * [new branch] gh/kwen2501/236/base -> origin/gh/kwen2501/236/base 2025-12-04T09:33:41.5124132Z * [new branch] gh/kwen2501/236/head -> origin/gh/kwen2501/236/head 2025-12-04T09:33:41.5125730Z * [new branch] gh/kwen2501/236/orig -> origin/gh/kwen2501/236/orig 2025-12-04T09:33:41.5127700Z * [new branch] gh/kwen2501/237/base -> origin/gh/kwen2501/237/base 2025-12-04T09:33:41.5129121Z * [new branch] gh/kwen2501/237/head -> origin/gh/kwen2501/237/head 2025-12-04T09:33:41.5130628Z * [new branch] gh/kwen2501/237/orig -> origin/gh/kwen2501/237/orig 2025-12-04T09:33:41.5133229Z * [new branch] gh/kwen2501/238/base -> origin/gh/kwen2501/238/base 2025-12-04T09:33:41.5134125Z * [new branch] gh/kwen2501/238/head -> origin/gh/kwen2501/238/head 2025-12-04T09:33:41.5135840Z * [new branch] gh/kwen2501/238/orig -> origin/gh/kwen2501/238/orig 2025-12-04T09:33:41.5138110Z * [new branch] gh/kwen2501/240/base -> origin/gh/kwen2501/240/base 2025-12-04T09:33:41.5139271Z * [new branch] gh/kwen2501/240/head -> origin/gh/kwen2501/240/head 2025-12-04T09:33:41.5140918Z * [new branch] gh/kwen2501/240/orig -> origin/gh/kwen2501/240/orig 2025-12-04T09:33:41.5142922Z * [new branch] gh/kwen2501/241/base -> origin/gh/kwen2501/241/base 2025-12-04T09:33:41.5144260Z * [new branch] gh/kwen2501/241/head -> origin/gh/kwen2501/241/head 2025-12-04T09:33:41.5145994Z * [new branch] gh/kwen2501/241/orig -> origin/gh/kwen2501/241/orig 2025-12-04T09:33:41.5148168Z * [new branch] gh/kwen2501/247/base -> origin/gh/kwen2501/247/base 2025-12-04T09:33:41.5149446Z * [new branch] gh/kwen2501/247/head -> origin/gh/kwen2501/247/head 2025-12-04T09:33:41.5151070Z * [new branch] gh/kwen2501/247/orig -> origin/gh/kwen2501/247/orig 2025-12-04T09:33:41.5153257Z * [new branch] gh/kwen2501/252/base -> origin/gh/kwen2501/252/base 2025-12-04T09:33:41.5154091Z * [new branch] gh/kwen2501/252/head -> origin/gh/kwen2501/252/head 
2025-12-04T09:33:41.5156288Z * [new branch] gh/kwen2501/252/orig -> origin/gh/kwen2501/252/orig 2025-12-04T09:33:41.5158855Z * [new branch] gh/kwen2501/259/base -> origin/gh/kwen2501/259/base 2025-12-04T09:33:41.5160492Z * [new branch] gh/kwen2501/259/head -> origin/gh/kwen2501/259/head 2025-12-04T09:33:41.5162042Z * [new branch] gh/kwen2501/259/orig -> origin/gh/kwen2501/259/orig 2025-12-04T09:33:41.5164128Z * [new branch] gh/kwen2501/260/base -> origin/gh/kwen2501/260/base 2025-12-04T09:33:41.5165750Z * [new branch] gh/kwen2501/260/head -> origin/gh/kwen2501/260/head 2025-12-04T09:33:41.5167439Z * [new branch] gh/kwen2501/260/orig -> origin/gh/kwen2501/260/orig 2025-12-04T09:33:41.5169461Z * [new branch] gh/kwen2501/268/base -> origin/gh/kwen2501/268/base 2025-12-04T09:33:41.5171111Z * [new branch] gh/kwen2501/268/head -> origin/gh/kwen2501/268/head 2025-12-04T09:33:41.5172714Z * [new branch] gh/kwen2501/268/orig -> origin/gh/kwen2501/268/orig 2025-12-04T09:33:41.5174717Z * [new branch] gh/kwen2501/269/base -> origin/gh/kwen2501/269/base 2025-12-04T09:33:41.5176453Z * [new branch] gh/kwen2501/269/head -> origin/gh/kwen2501/269/head 2025-12-04T09:33:41.5177987Z * [new branch] gh/kwen2501/269/orig -> origin/gh/kwen2501/269/orig 2025-12-04T09:33:41.5180157Z * [new branch] gh/kwen2501/270/base -> origin/gh/kwen2501/270/base 2025-12-04T09:33:41.5181819Z * [new branch] gh/kwen2501/270/head -> origin/gh/kwen2501/270/head 2025-12-04T09:33:41.5183242Z * [new branch] gh/kwen2501/270/orig -> origin/gh/kwen2501/270/orig 2025-12-04T09:33:41.5185879Z * [new branch] gh/kwen2501/271/base -> origin/gh/kwen2501/271/base 2025-12-04T09:33:41.5187439Z * [new branch] gh/kwen2501/271/head -> origin/gh/kwen2501/271/head 2025-12-04T09:33:41.5189164Z * [new branch] gh/kwen2501/271/orig -> origin/gh/kwen2501/271/orig 2025-12-04T09:33:41.5191255Z * [new branch] gh/kwen2501/274/base -> origin/gh/kwen2501/274/base 2025-12-04T09:33:41.5192926Z * [new branch] gh/kwen2501/274/head -> 
origin/gh/kwen2501/274/head 2025-12-04T09:33:41.5194444Z * [new branch] gh/kwen2501/274/orig -> origin/gh/kwen2501/274/orig 2025-12-04T09:33:41.5196602Z * [new branch] gh/kwen2501/275/base -> origin/gh/kwen2501/275/base 2025-12-04T09:33:41.5198284Z * [new branch] gh/kwen2501/275/head -> origin/gh/kwen2501/275/head 2025-12-04T09:33:41.5200063Z * [new branch] gh/kwen2501/275/orig -> origin/gh/kwen2501/275/orig 2025-12-04T09:33:41.5201916Z * [new branch] gh/kwen2501/276/base -> origin/gh/kwen2501/276/base 2025-12-04T09:33:41.5203400Z * [new branch] gh/kwen2501/276/head -> origin/gh/kwen2501/276/head 2025-12-04T09:33:41.5204854Z * [new branch] gh/kwen2501/276/orig -> origin/gh/kwen2501/276/orig 2025-12-04T09:33:41.5206923Z * [new branch] gh/kwen2501/277/base -> origin/gh/kwen2501/277/base 2025-12-04T09:33:41.5208369Z * [new branch] gh/kwen2501/277/head -> origin/gh/kwen2501/277/head 2025-12-04T09:33:41.5209930Z * [new branch] gh/kwen2501/277/orig -> origin/gh/kwen2501/277/orig 2025-12-04T09:33:41.5211930Z * [new branch] gh/kwen2501/278/base -> origin/gh/kwen2501/278/base 2025-12-04T09:33:41.5213416Z * [new branch] gh/kwen2501/278/head -> origin/gh/kwen2501/278/head 2025-12-04T09:33:41.5214909Z * [new branch] gh/kwen2501/278/orig -> origin/gh/kwen2501/278/orig 2025-12-04T09:33:41.5217127Z * [new branch] gh/kwen2501/279/base -> origin/gh/kwen2501/279/base 2025-12-04T09:33:41.5218798Z * [new branch] gh/kwen2501/279/head -> origin/gh/kwen2501/279/head 2025-12-04T09:33:41.5220317Z * [new branch] gh/kwen2501/279/orig -> origin/gh/kwen2501/279/orig 2025-12-04T09:33:41.5222433Z * [new branch] gh/kwen2501/280/base -> origin/gh/kwen2501/280/base 2025-12-04T09:33:41.5223945Z * [new branch] gh/kwen2501/280/head -> origin/gh/kwen2501/280/head 2025-12-04T09:33:41.5226037Z * [new branch] gh/kwen2501/280/orig -> origin/gh/kwen2501/280/orig 2025-12-04T09:33:41.5228131Z * [new branch] gh/kwen2501/281/base -> origin/gh/kwen2501/281/base 2025-12-04T09:33:41.5229634Z * [new branch] 
gh/kwen2501/281/head -> origin/gh/kwen2501/281/head 2025-12-04T09:33:41.5231199Z * [new branch] gh/kwen2501/281/orig -> origin/gh/kwen2501/281/orig 2025-12-04T09:33:41.5233216Z * [new branch] gh/kwen2501/282/base -> origin/gh/kwen2501/282/base 2025-12-04T09:33:41.5234850Z * [new branch] gh/kwen2501/282/head -> origin/gh/kwen2501/282/head 2025-12-04T09:33:41.5236304Z * [new branch] gh/kwen2501/282/orig -> origin/gh/kwen2501/282/orig 2025-12-04T09:33:41.5238314Z * [new branch] gh/kwen2501/283/base -> origin/gh/kwen2501/283/base 2025-12-04T09:33:41.5239913Z * [new branch] gh/kwen2501/283/head -> origin/gh/kwen2501/283/head 2025-12-04T09:33:41.5241433Z * [new branch] gh/kwen2501/283/orig -> origin/gh/kwen2501/283/orig 2025-12-04T09:33:41.5243497Z * [new branch] gh/kwen2501/284/base -> origin/gh/kwen2501/284/base 2025-12-04T09:33:41.5245044Z * [new branch] gh/kwen2501/284/head -> origin/gh/kwen2501/284/head 2025-12-04T09:33:41.5246597Z * [new branch] gh/kwen2501/284/orig -> origin/gh/kwen2501/284/orig 2025-12-04T09:33:41.5248759Z * [new branch] gh/kwen2501/285/base -> origin/gh/kwen2501/285/base 2025-12-04T09:33:41.5250254Z * [new branch] gh/kwen2501/285/head -> origin/gh/kwen2501/285/head 2025-12-04T09:33:41.5251902Z * [new branch] gh/kwen2501/285/orig -> origin/gh/kwen2501/285/orig 2025-12-04T09:33:41.5253902Z * [new branch] gh/kwen2501/286/base -> origin/gh/kwen2501/286/base 2025-12-04T09:33:41.5255522Z * [new branch] gh/kwen2501/286/head -> origin/gh/kwen2501/286/head 2025-12-04T09:33:41.5257077Z * [new branch] gh/kwen2501/286/orig -> origin/gh/kwen2501/286/orig 2025-12-04T09:33:41.5258967Z * [new branch] gh/kwen2501/287/base -> origin/gh/kwen2501/287/base 2025-12-04T09:33:41.5260552Z * [new branch] gh/kwen2501/287/head -> origin/gh/kwen2501/287/head 2025-12-04T09:33:41.5261927Z * [new branch] gh/kwen2501/287/orig -> origin/gh/kwen2501/287/orig 2025-12-04T09:33:41.5264015Z * [new branch] gh/kwen2501/288/base -> origin/gh/kwen2501/288/base 
2025-12-04T09:33:41.5265606Z * [new branch] gh/kwen2501/288/head -> origin/gh/kwen2501/288/head 2025-12-04T09:33:41.5267162Z * [new branch] gh/kwen2501/288/orig -> origin/gh/kwen2501/288/orig 2025-12-04T09:33:41.5269539Z * [new branch] gh/laithsakka/251/base -> origin/gh/laithsakka/251/base 2025-12-04T09:33:41.5271368Z * [new branch] gh/laithsakka/251/head -> origin/gh/laithsakka/251/head 2025-12-04T09:33:41.5272878Z * [new branch] gh/laithsakka/251/orig -> origin/gh/laithsakka/251/orig 2025-12-04T09:33:41.5274854Z * [new branch] gh/laithsakka/276/base -> origin/gh/laithsakka/276/base 2025-12-04T09:33:41.5276328Z * [new branch] gh/laithsakka/276/head -> origin/gh/laithsakka/276/head 2025-12-04T09:33:41.5277802Z * [new branch] gh/laithsakka/276/orig -> origin/gh/laithsakka/276/orig 2025-12-04T09:33:41.5279859Z * [new branch] gh/laithsakka/28/base -> origin/gh/laithsakka/28/base 2025-12-04T09:33:41.5281624Z * [new branch] gh/laithsakka/29/base -> origin/gh/laithsakka/29/base 2025-12-04T09:33:41.5283858Z * [new branch] gh/laithsakka/30/base -> origin/gh/laithsakka/30/base 2025-12-04T09:33:41.5285483Z * [new branch] gh/laithsakka/30/head -> origin/gh/laithsakka/30/head 2025-12-04T09:33:41.5287238Z * [new branch] gh/laithsakka/31/base -> origin/gh/laithsakka/31/base 2025-12-04T09:33:41.5288646Z * [new branch] gh/laithsakka/31/head -> origin/gh/laithsakka/31/head 2025-12-04T09:33:41.5290949Z * [new branch] gh/laithsakka/313/base -> origin/gh/laithsakka/313/base 2025-12-04T09:33:41.5292434Z * [new branch] gh/laithsakka/313/head -> origin/gh/laithsakka/313/head 2025-12-04T09:33:41.5293873Z * [new branch] gh/laithsakka/313/orig -> origin/gh/laithsakka/313/orig 2025-12-04T09:33:41.5296218Z * [new branch] gh/laithsakka/316/base -> origin/gh/laithsakka/316/base 2025-12-04T09:33:41.5297811Z * [new branch] gh/laithsakka/316/head -> origin/gh/laithsakka/316/head 2025-12-04T09:33:41.5299317Z * [new branch] gh/laithsakka/316/orig -> origin/gh/laithsakka/316/orig 
2025-12-04T09:33:41.5301549Z * [new branch] gh/laithsakka/317/base -> origin/gh/laithsakka/317/base 2025-12-04T09:33:41.5302940Z * [new branch] gh/laithsakka/317/head -> origin/gh/laithsakka/317/head 2025-12-04T09:33:41.5304411Z * [new branch] gh/laithsakka/317/orig -> origin/gh/laithsakka/317/orig 2025-12-04T09:33:41.5306426Z * [new branch] gh/laithsakka/319/base -> origin/gh/laithsakka/319/base 2025-12-04T09:33:41.5307954Z * [new branch] gh/laithsakka/319/head -> origin/gh/laithsakka/319/head 2025-12-04T09:33:41.5309432Z * [new branch] gh/laithsakka/319/orig -> origin/gh/laithsakka/319/orig 2025-12-04T09:33:41.5311895Z * [new branch] gh/laithsakka/32/base -> origin/gh/laithsakka/32/base 2025-12-04T09:33:41.5313309Z * [new branch] gh/laithsakka/32/head -> origin/gh/laithsakka/32/head 2025-12-04T09:33:41.5315434Z * [new branch] gh/laithsakka/320/base -> origin/gh/laithsakka/320/base 2025-12-04T09:33:41.5316849Z * [new branch] gh/laithsakka/320/head -> origin/gh/laithsakka/320/head 2025-12-04T09:33:41.5318284Z * [new branch] gh/laithsakka/320/orig -> origin/gh/laithsakka/320/orig 2025-12-04T09:33:41.5320203Z * [new branch] gh/laithsakka/321/base -> origin/gh/laithsakka/321/base 2025-12-04T09:33:41.5321852Z * [new branch] gh/laithsakka/321/head -> origin/gh/laithsakka/321/head 2025-12-04T09:33:41.5323231Z * [new branch] gh/laithsakka/321/orig -> origin/gh/laithsakka/321/orig 2025-12-04T09:33:41.5325471Z * [new branch] gh/laithsakka/322/base -> origin/gh/laithsakka/322/base 2025-12-04T09:33:41.5326999Z * [new branch] gh/laithsakka/322/head -> origin/gh/laithsakka/322/head 2025-12-04T09:33:41.5328536Z * [new branch] gh/laithsakka/322/orig -> origin/gh/laithsakka/322/orig 2025-12-04T09:33:41.5330747Z * [new branch] gh/laithsakka/323/base -> origin/gh/laithsakka/323/base 2025-12-04T09:33:41.5332928Z * [new branch] gh/laithsakka/323/head -> origin/gh/laithsakka/323/head 2025-12-04T09:33:41.5334541Z * [new branch] gh/laithsakka/323/orig -> origin/gh/laithsakka/323/orig 
2025-12-04T09:33:41.5336689Z * [new branch] gh/laithsakka/324/base -> origin/gh/laithsakka/324/base 2025-12-04T09:33:41.5338660Z * [new branch] gh/laithsakka/324/head -> origin/gh/laithsakka/324/head 2025-12-04T09:33:41.5340067Z * [new branch] gh/laithsakka/324/orig -> origin/gh/laithsakka/324/orig 2025-12-04T09:33:41.5342124Z * [new branch] gh/laithsakka/325/base -> origin/gh/laithsakka/325/base 2025-12-04T09:33:41.5343598Z * [new branch] gh/laithsakka/325/head -> origin/gh/laithsakka/325/head 2025-12-04T09:33:41.5345080Z * [new branch] gh/laithsakka/325/orig -> origin/gh/laithsakka/325/orig 2025-12-04T09:33:41.5347547Z * [new branch] gh/laithsakka/326/base -> origin/gh/laithsakka/326/base 2025-12-04T09:33:41.5349014Z * [new branch] gh/laithsakka/326/head -> origin/gh/laithsakka/326/head 2025-12-04T09:33:41.5350544Z * [new branch] gh/laithsakka/326/orig -> origin/gh/laithsakka/326/orig 2025-12-04T09:33:41.5352742Z * [new branch] gh/laithsakka/327/base -> origin/gh/laithsakka/327/base 2025-12-04T09:33:41.5354352Z * [new branch] gh/laithsakka/327/head -> origin/gh/laithsakka/327/head 2025-12-04T09:33:41.5355864Z * [new branch] gh/laithsakka/327/orig -> origin/gh/laithsakka/327/orig 2025-12-04T09:33:41.5357894Z * [new branch] gh/laithsakka/328/base -> origin/gh/laithsakka/328/base 2025-12-04T09:33:41.5359402Z * [new branch] gh/laithsakka/328/head -> origin/gh/laithsakka/328/head 2025-12-04T09:33:41.5360957Z * [new branch] gh/laithsakka/328/orig -> origin/gh/laithsakka/328/orig 2025-12-04T09:33:41.5363288Z * [new branch] gh/liangel/4/base -> origin/gh/liangel/4/base 2025-12-04T09:33:41.5364785Z * [new branch] gh/liangel/4/head -> origin/gh/liangel/4/head 2025-12-04T09:33:41.5366266Z * [new branch] gh/liangel/4/orig -> origin/gh/liangel/4/orig 2025-12-04T09:33:41.5370804Z * [new branch] gh/lucaskabela/1/base -> origin/gh/lucaskabela/1/base 2025-12-04T09:33:41.5375889Z * [new branch] gh/lucaskabela/1/head -> origin/gh/lucaskabela/1/head 2025-12-04T09:33:41.5378467Z * 
[new branch] gh/lw/4/base -> origin/gh/lw/4/base 2025-12-04T09:33:41.5379943Z * [new branch] gh/lw/4/head -> origin/gh/lw/4/head 2025-12-04T09:33:41.5381453Z * [new branch] gh/lw/4/orig -> origin/gh/lw/4/orig 2025-12-04T09:33:41.5383354Z * [new branch] gh/lw/5/base -> origin/gh/lw/5/base 2025-12-04T09:33:41.5384852Z * [new branch] gh/lw/5/head -> origin/gh/lw/5/head 2025-12-04T09:33:41.5386290Z * [new branch] gh/lw/5/orig -> origin/gh/lw/5/orig 2025-12-04T09:33:41.5388243Z * [new branch] gh/lw/6/base -> origin/gh/lw/6/base 2025-12-04T09:33:41.5389875Z * [new branch] gh/lw/6/head -> origin/gh/lw/6/head 2025-12-04T09:33:41.5391198Z * [new branch] gh/lw/6/orig -> origin/gh/lw/6/orig 2025-12-04T09:33:41.5393534Z * [new branch] gh/malfet/14/base -> origin/gh/malfet/14/base 2025-12-04T09:33:41.5395508Z * [new branch] gh/malfet/417/base -> origin/gh/malfet/417/base 2025-12-04T09:33:41.5396978Z * [new branch] gh/malfet/417/head -> origin/gh/malfet/417/head 2025-12-04T09:33:41.5398451Z * [new branch] gh/malfet/417/orig -> origin/gh/malfet/417/orig 2025-12-04T09:33:41.5400367Z * [new branch] gh/malfet/506/base -> origin/gh/malfet/506/base 2025-12-04T09:33:41.5401977Z * [new branch] gh/malfet/506/head -> origin/gh/malfet/506/head 2025-12-04T09:33:41.5403424Z * [new branch] gh/malfet/506/orig -> origin/gh/malfet/506/orig 2025-12-04T09:33:41.5405381Z * [new branch] gh/malfet/517/base -> origin/gh/malfet/517/base 2025-12-04T09:33:41.5406902Z * [new branch] gh/malfet/517/head -> origin/gh/malfet/517/head 2025-12-04T09:33:41.5408855Z * [new branch] gh/malfet/528/base -> origin/gh/malfet/528/base 2025-12-04T09:33:41.5410337Z * [new branch] gh/malfet/528/head -> origin/gh/malfet/528/head 2025-12-04T09:33:41.5411815Z * [new branch] gh/malfet/528/orig -> origin/gh/malfet/528/orig 2025-12-04T09:33:41.5413818Z * [new branch] gh/malfet/537/base -> origin/gh/malfet/537/base 2025-12-04T09:33:41.5415241Z * [new branch] gh/malfet/537/head -> origin/gh/malfet/537/head 
2025-12-04T09:33:41.5416840Z * [new branch] gh/malfet/537/orig -> origin/gh/malfet/537/orig 2025-12-04T09:33:41.5418789Z * [new branch] gh/malfet/546/base -> origin/gh/malfet/546/base 2025-12-04T09:33:41.5420413Z * [new branch] gh/malfet/546/head -> origin/gh/malfet/546/head 2025-12-04T09:33:41.5421754Z * [new branch] gh/malfet/546/orig -> origin/gh/malfet/546/orig 2025-12-04T09:33:41.5424081Z * [new branch] gh/malfet/565/base -> origin/gh/malfet/565/base 2025-12-04T09:33:41.5425631Z * [new branch] gh/malfet/565/head -> origin/gh/malfet/565/head 2025-12-04T09:33:41.5427110Z * [new branch] gh/malfet/565/orig -> origin/gh/malfet/565/orig 2025-12-04T09:33:41.5429131Z * [new branch] gh/malfet/575/base -> origin/gh/malfet/575/base 2025-12-04T09:33:41.5430576Z * [new branch] gh/malfet/575/head -> origin/gh/malfet/575/head 2025-12-04T09:33:41.5432083Z * [new branch] gh/malfet/575/orig -> origin/gh/malfet/575/orig 2025-12-04T09:33:41.5434132Z * [new branch] gh/malfet/580/base -> origin/gh/malfet/580/base 2025-12-04T09:33:41.5435613Z * [new branch] gh/malfet/580/head -> origin/gh/malfet/580/head 2025-12-04T09:33:41.5437115Z * [new branch] gh/malfet/580/orig -> origin/gh/malfet/580/orig 2025-12-04T09:33:41.5439031Z * [new branch] gh/malfet/581/base -> origin/gh/malfet/581/base 2025-12-04T09:33:41.5440499Z * [new branch] gh/malfet/581/head -> origin/gh/malfet/581/head 2025-12-04T09:33:41.5442034Z * [new branch] gh/malfet/581/orig -> origin/gh/malfet/581/orig 2025-12-04T09:33:41.5443904Z * [new branch] gh/malfet/583/base -> origin/gh/malfet/583/base 2025-12-04T09:33:41.5445408Z * [new branch] gh/malfet/583/head -> origin/gh/malfet/583/head 2025-12-04T09:33:41.5446830Z * [new branch] gh/malfet/583/orig -> origin/gh/malfet/583/orig 2025-12-04T09:33:41.5448712Z * [new branch] gh/malfet/586/base -> origin/gh/malfet/586/base 2025-12-04T09:33:41.5450251Z * [new branch] gh/malfet/586/head -> origin/gh/malfet/586/head 2025-12-04T09:33:41.5451708Z * [new branch] gh/malfet/586/orig -> 
origin/gh/malfet/586/orig 2025-12-04T09:33:41.5453714Z * [new branch] gh/malfet/587/base -> origin/gh/malfet/587/base 2025-12-04T09:33:41.5455142Z * [new branch] gh/malfet/587/head -> origin/gh/malfet/587/head 2025-12-04T09:33:41.5456691Z * [new branch] gh/malfet/587/orig -> origin/gh/malfet/587/orig 2025-12-04T09:33:41.5458683Z * [new branch] gh/malfet/588/base -> origin/gh/malfet/588/base 2025-12-04T09:33:41.5460159Z * [new branch] gh/malfet/588/head -> origin/gh/malfet/588/head 2025-12-04T09:33:41.5461796Z * [new branch] gh/malfet/588/orig -> origin/gh/malfet/588/orig 2025-12-04T09:33:41.5463779Z * [new branch] gh/malfet/589/base -> origin/gh/malfet/589/base 2025-12-04T09:33:41.5465239Z * [new branch] gh/malfet/589/head -> origin/gh/malfet/589/head 2025-12-04T09:33:41.5467314Z * [new branch] gh/malfet/589/orig -> origin/gh/malfet/589/orig 2025-12-04T09:33:41.5469147Z * [new branch] gh/malfet/590/base -> origin/gh/malfet/590/base 2025-12-04T09:33:41.5470633Z * [new branch] gh/malfet/590/head -> origin/gh/malfet/590/head 2025-12-04T09:33:41.5472326Z * [new branch] gh/malfet/590/orig -> origin/gh/malfet/590/orig 2025-12-04T09:33:41.5474844Z * [new branch] gh/malfet/591/base -> origin/gh/malfet/591/base 2025-12-04T09:33:41.5476297Z * [new branch] gh/malfet/591/head -> origin/gh/malfet/591/head 2025-12-04T09:33:41.5477834Z * [new branch] gh/malfet/591/orig -> origin/gh/malfet/591/orig 2025-12-04T09:33:41.5479840Z * [new branch] gh/malfet/592/base -> origin/gh/malfet/592/base 2025-12-04T09:33:41.5481345Z * [new branch] gh/malfet/592/head -> origin/gh/malfet/592/head 2025-12-04T09:33:41.5482842Z * [new branch] gh/malfet/592/orig -> origin/gh/malfet/592/orig 2025-12-04T09:33:41.5484833Z * [new branch] gh/malfet/593/base -> origin/gh/malfet/593/base 2025-12-04T09:33:41.5486266Z * [new branch] gh/malfet/593/head -> origin/gh/malfet/593/head 2025-12-04T09:33:41.5487814Z * [new branch] gh/malfet/593/orig -> origin/gh/malfet/593/orig 2025-12-04T09:33:41.5489793Z * [new 
branch] gh/malfet/594/base -> origin/gh/malfet/594/base 2025-12-04T09:33:41.5491249Z * [new branch] gh/malfet/594/head -> origin/gh/malfet/594/head 2025-12-04T09:33:41.5492758Z * [new branch] gh/malfet/594/orig -> origin/gh/malfet/594/orig 2025-12-04T09:33:41.5494700Z * [new branch] gh/malfet/595/base -> origin/gh/malfet/595/base 2025-12-04T09:33:41.5496219Z * [new branch] gh/malfet/595/head -> origin/gh/malfet/595/head 2025-12-04T09:33:41.5497850Z * [new branch] gh/malfet/595/orig -> origin/gh/malfet/595/orig 2025-12-04T09:33:41.5499762Z * [new branch] gh/malfet/596/base -> origin/gh/malfet/596/base 2025-12-04T09:33:41.5501277Z * [new branch] gh/malfet/596/head -> origin/gh/malfet/596/head 2025-12-04T09:33:41.5503254Z * [new branch] gh/malfet/596/orig -> origin/gh/malfet/596/orig 2025-12-04T09:33:41.5505286Z * [new branch] gh/malfet/597/base -> origin/gh/malfet/597/base 2025-12-04T09:33:41.5506734Z * [new branch] gh/malfet/597/head -> origin/gh/malfet/597/head 2025-12-04T09:33:41.5508209Z * [new branch] gh/malfet/597/orig -> origin/gh/malfet/597/orig 2025-12-04T09:33:41.5510159Z * [new branch] gh/malfet/598/base -> origin/gh/malfet/598/base 2025-12-04T09:33:41.5511738Z * [new branch] gh/malfet/598/head -> origin/gh/malfet/598/head 2025-12-04T09:33:41.5513140Z * [new branch] gh/malfet/598/orig -> origin/gh/malfet/598/orig 2025-12-04T09:33:41.5515243Z * [new branch] gh/malfet/599/base -> origin/gh/malfet/599/base 2025-12-04T09:33:41.5516698Z * [new branch] gh/malfet/599/head -> origin/gh/malfet/599/head 2025-12-04T09:33:41.5518197Z * [new branch] gh/malfet/599/orig -> origin/gh/malfet/599/orig 2025-12-04T09:33:41.5520134Z * [new branch] gh/malfet/600/base -> origin/gh/malfet/600/base 2025-12-04T09:33:41.5521576Z * [new branch] gh/malfet/600/head -> origin/gh/malfet/600/head 2025-12-04T09:33:41.5523021Z * [new branch] gh/malfet/600/orig -> origin/gh/malfet/600/orig 2025-12-04T09:33:41.5525744Z * [new branch] gh/malfet/601/base -> origin/gh/malfet/601/base 
2025-12-04T09:33:41.5527257Z * [new branch] gh/malfet/601/head -> origin/gh/malfet/601/head 2025-12-04T09:33:41.5528760Z * [new branch] gh/malfet/601/orig -> origin/gh/malfet/601/orig 2025-12-04T09:33:41.5530865Z * [new branch] gh/malfet/602/base -> origin/gh/malfet/602/base 2025-12-04T09:33:41.5532351Z * [new branch] gh/malfet/602/head -> origin/gh/malfet/602/head 2025-12-04T09:33:41.5533816Z * [new branch] gh/malfet/602/orig -> origin/gh/malfet/602/orig 2025-12-04T09:33:41.5535833Z * [new branch] gh/malfet/603/base -> origin/gh/malfet/603/base 2025-12-04T09:33:41.5537346Z * [new branch] gh/malfet/603/head -> origin/gh/malfet/603/head 2025-12-04T09:33:41.5538899Z * [new branch] gh/malfet/603/orig -> origin/gh/malfet/603/orig 2025-12-04T09:33:41.5540889Z * [new branch] gh/malfet/604/base -> origin/gh/malfet/604/base 2025-12-04T09:33:41.5542313Z * [new branch] gh/malfet/604/head -> origin/gh/malfet/604/head 2025-12-04T09:33:41.5544260Z * [new branch] gh/malfet/604/orig -> origin/gh/malfet/604/orig 2025-12-04T09:33:41.5546316Z * [new branch] gh/malfet/605/base -> origin/gh/malfet/605/base 2025-12-04T09:33:41.5547790Z * [new branch] gh/malfet/605/head -> origin/gh/malfet/605/head 2025-12-04T09:33:41.5549417Z * [new branch] gh/malfet/605/orig -> origin/gh/malfet/605/orig 2025-12-04T09:33:41.5551441Z * [new branch] gh/malfet/606/base -> origin/gh/malfet/606/base 2025-12-04T09:33:41.5553157Z * [new branch] gh/malfet/606/head -> origin/gh/malfet/606/head 2025-12-04T09:33:41.5554630Z * [new branch] gh/malfet/606/orig -> origin/gh/malfet/606/orig 2025-12-04T09:33:41.5556709Z * [new branch] gh/malfet/607/base -> origin/gh/malfet/607/base 2025-12-04T09:33:41.5558181Z * [new branch] gh/malfet/607/head -> origin/gh/malfet/607/head 2025-12-04T09:33:41.5559774Z * [new branch] gh/malfet/607/orig -> origin/gh/malfet/607/orig 2025-12-04T09:33:41.5561806Z * [new branch] gh/malfet/608/base -> origin/gh/malfet/608/base 2025-12-04T09:33:41.5563264Z * [new branch] gh/malfet/608/head -> 
origin/gh/malfet/608/head 2025-12-04T09:33:41.5564790Z * [new branch] gh/malfet/608/orig -> origin/gh/malfet/608/orig 2025-12-04T09:33:41.5566792Z * [new branch] gh/malfet/609/base -> origin/gh/malfet/609/base 2025-12-04T09:33:41.5568240Z * [new branch] gh/malfet/609/head -> origin/gh/malfet/609/head 2025-12-04T09:33:41.5569765Z * [new branch] gh/malfet/609/orig -> origin/gh/malfet/609/orig 2025-12-04T09:33:41.5572127Z * [new branch] gh/malfet/610/base -> origin/gh/malfet/610/base 2025-12-04T09:33:41.5573619Z * [new branch] gh/malfet/610/head -> origin/gh/malfet/610/head 2025-12-04T09:33:41.5575200Z * [new branch] gh/malfet/610/orig -> origin/gh/malfet/610/orig 2025-12-04T09:33:41.5577314Z * [new branch] gh/malfet/611/base -> origin/gh/malfet/611/base 2025-12-04T09:33:41.5578726Z * [new branch] gh/malfet/611/head -> origin/gh/malfet/611/head 2025-12-04T09:33:41.5580244Z * [new branch] gh/malfet/611/orig -> origin/gh/malfet/611/orig 2025-12-04T09:33:41.5582065Z * [new branch] gh/malfet/612/base -> origin/gh/malfet/612/base 2025-12-04T09:33:41.5583540Z * [new branch] gh/malfet/612/head -> origin/gh/malfet/612/head 2025-12-04T09:33:41.5585142Z * [new branch] gh/malfet/612/orig -> origin/gh/malfet/612/orig 2025-12-04T09:33:41.5587202Z * [new branch] gh/malfet/64/base -> origin/gh/malfet/64/base 2025-12-04T09:33:41.5588647Z * [new branch] gh/malfet/64/head -> origin/gh/malfet/64/head 2025-12-04T09:33:41.5591035Z * [new branch] gh/manuelcandales/11/base -> origin/gh/manuelcandales/11/base 2025-12-04T09:33:41.5592492Z * [new branch] gh/manuelcandales/11/head -> origin/gh/manuelcandales/11/head 2025-12-04T09:33:41.5593968Z * [new branch] gh/manuelcandales/11/orig -> origin/gh/manuelcandales/11/orig 2025-12-04T09:33:41.5596739Z * [new branch] gh/markkm/1/base -> origin/gh/markkm/1/base 2025-12-04T09:33:41.5599040Z * [new branch] gh/masnesral/1/base -> origin/gh/masnesral/1/base 2025-12-04T09:33:41.5600567Z * [new branch] gh/masnesral/1/head -> origin/gh/masnesral/1/head 
2025-12-04T09:33:41.5602054Z * [new branch] gh/masnesral/1/orig -> origin/gh/masnesral/1/orig 2025-12-04T09:33:41.5604394Z * [new branch] gh/mhorowitz/0/base -> origin/gh/mhorowitz/0/base 2025-12-04T09:33:41.5605946Z * [new branch] gh/mhorowitz/0/head -> origin/gh/mhorowitz/0/head 2025-12-04T09:33:41.5607696Z * [new branch] gh/mhorowitz/1/base -> origin/gh/mhorowitz/1/base 2025-12-04T09:33:41.5609194Z * [new branch] gh/mhorowitz/1/head -> origin/gh/mhorowitz/1/head 2025-12-04T09:33:41.5610961Z * [new branch] gh/mhorowitz/2/base -> origin/gh/mhorowitz/2/base 2025-12-04T09:33:41.5612676Z * [new branch] gh/mhorowitz/2/head -> origin/gh/mhorowitz/2/head 2025-12-04T09:33:41.5614416Z * [new branch] gh/mhorowitz/3/base -> origin/gh/mhorowitz/3/base 2025-12-04T09:33:41.5615925Z * [new branch] gh/mhorowitz/3/head -> origin/gh/mhorowitz/3/head 2025-12-04T09:33:41.5617748Z * [new branch] gh/mhorowitz/4/base -> origin/gh/mhorowitz/4/base 2025-12-04T09:33:41.5619180Z * [new branch] gh/mhorowitz/4/head -> origin/gh/mhorowitz/4/head 2025-12-04T09:33:41.5620878Z * [new branch] gh/mhorowitz/5/base -> origin/gh/mhorowitz/5/base 2025-12-04T09:33:41.5622284Z * [new branch] gh/mhorowitz/5/head -> origin/gh/mhorowitz/5/head 2025-12-04T09:33:41.5624035Z * [new branch] gh/mhorowitz/6/base -> origin/gh/mhorowitz/6/base 2025-12-04T09:33:41.5625416Z * [new branch] gh/mhorowitz/6/head -> origin/gh/mhorowitz/6/head 2025-12-04T09:33:41.5627912Z * [new branch] gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base 2025-12-04T09:33:41.5629561Z * [new branch] gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head 2025-12-04T09:33:41.5631384Z * [new branch] gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base 2025-12-04T09:33:41.5632898Z * [new branch] gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head 2025-12-04T09:33:41.5634657Z * [new branch] gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base 2025-12-04T09:33:41.5636070Z 
* [new branch] gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head 2025-12-04T09:33:41.5638257Z * [new branch] gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base 2025-12-04T09:33:41.5639612Z * [new branch] gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head 2025-12-04T09:33:41.5641604Z * [new branch] gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base 2025-12-04T09:33:41.5643145Z * [new branch] gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head 2025-12-04T09:33:41.5645117Z * [new branch] gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base 2025-12-04T09:33:41.5646607Z * [new branch] gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head 2025-12-04T09:33:41.5648185Z * [new branch] gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig 2025-12-04T09:33:41.5650490Z * [new branch] gh/mikaylagawarecki/341/base -> origin/gh/mikaylagawarecki/341/base 2025-12-04T09:33:41.5651912Z * [new branch] gh/mikaylagawarecki/341/head -> origin/gh/mikaylagawarecki/341/head 2025-12-04T09:33:41.5653410Z * [new branch] gh/mikaylagawarecki/341/orig -> origin/gh/mikaylagawarecki/341/orig 2025-12-04T09:33:41.5655648Z * [new branch] gh/mikaylagawarecki/342/base -> origin/gh/mikaylagawarecki/342/base 2025-12-04T09:33:41.5657200Z * [new branch] gh/mikaylagawarecki/342/head -> origin/gh/mikaylagawarecki/342/head 2025-12-04T09:33:41.5658738Z * [new branch] gh/mikaylagawarecki/342/orig -> origin/gh/mikaylagawarecki/342/orig 2025-12-04T09:33:41.5661293Z * [new branch] gh/mikaylagawarecki/345/base -> origin/gh/mikaylagawarecki/345/base 2025-12-04T09:33:41.5662738Z * [new branch] gh/mikaylagawarecki/345/head -> origin/gh/mikaylagawarecki/345/head 2025-12-04T09:33:41.5664221Z * [new branch] gh/mikaylagawarecki/345/orig -> origin/gh/mikaylagawarecki/345/orig 2025-12-04T09:33:41.5666405Z * [new branch] gh/mikaylagawarecki/346/base -> origin/gh/mikaylagawarecki/346/base 
2025-12-04T09:33:41.5667886Z * [new branch] gh/mikaylagawarecki/346/head -> origin/gh/mikaylagawarecki/346/head 2025-12-04T09:33:41.5669457Z * [new branch] gh/mikaylagawarecki/346/orig -> origin/gh/mikaylagawarecki/346/orig 2025-12-04T09:33:41.5671783Z * [new branch] gh/mikaylagawarecki/347/base -> origin/gh/mikaylagawarecki/347/base 2025-12-04T09:33:41.5675900Z * [new branch] gh/mikaylagawarecki/347/head -> origin/gh/mikaylagawarecki/347/head 2025-12-04T09:33:41.5677335Z * [new branch] gh/mikaylagawarecki/347/orig -> origin/gh/mikaylagawarecki/347/orig 2025-12-04T09:33:41.5680001Z * [new branch] gh/mikaylagawarecki/350/base -> origin/gh/mikaylagawarecki/350/base 2025-12-04T09:33:41.5681459Z * [new branch] gh/mikaylagawarecki/350/head -> origin/gh/mikaylagawarecki/350/head 2025-12-04T09:33:41.5683075Z * [new branch] gh/mikaylagawarecki/350/orig -> origin/gh/mikaylagawarecki/350/orig 2025-12-04T09:33:41.5686049Z * [new branch] gh/mikaylagawarecki/351/base -> origin/gh/mikaylagawarecki/351/base 2025-12-04T09:33:41.5687603Z * [new branch] gh/mikaylagawarecki/351/head -> origin/gh/mikaylagawarecki/351/head 2025-12-04T09:33:41.5689142Z * [new branch] gh/mikaylagawarecki/351/orig -> origin/gh/mikaylagawarecki/351/orig 2025-12-04T09:33:41.5691338Z * [new branch] gh/mikaylagawarecki/352/base -> origin/gh/mikaylagawarecki/352/base 2025-12-04T09:33:41.5692988Z * [new branch] gh/mikaylagawarecki/352/head -> origin/gh/mikaylagawarecki/352/head 2025-12-04T09:33:41.5694819Z * [new branch] gh/mikaylagawarecki/352/orig -> origin/gh/mikaylagawarecki/352/orig 2025-12-04T09:33:41.5696920Z * [new branch] gh/mikaylagawarecki/353/base -> origin/gh/mikaylagawarecki/353/base 2025-12-04T09:33:41.5698840Z * [new branch] gh/mikaylagawarecki/353/head -> origin/gh/mikaylagawarecki/353/head 2025-12-04T09:33:41.5700303Z * [new branch] gh/mikaylagawarecki/353/orig -> origin/gh/mikaylagawarecki/353/orig 2025-12-04T09:33:41.5702132Z * [new branch] gh/mikaylagawarecki/354/base -> 
origin/gh/mikaylagawarecki/354/base 2025-12-04T09:33:41.5703605Z * [new branch] gh/mikaylagawarecki/354/head -> origin/gh/mikaylagawarecki/354/head 2025-12-04T09:33:41.5705620Z * [new branch] gh/mikaylagawarecki/354/orig -> origin/gh/mikaylagawarecki/354/orig 2025-12-04T09:33:41.5708556Z * [new branch] gh/mikaylagawarecki/356/base -> origin/gh/mikaylagawarecki/356/base 2025-12-04T09:33:41.5710207Z * [new branch] gh/mikaylagawarecki/356/head -> origin/gh/mikaylagawarecki/356/head 2025-12-04T09:33:41.5711635Z * [new branch] gh/mikaylagawarecki/356/orig -> origin/gh/mikaylagawarecki/356/orig 2025-12-04T09:33:41.5713497Z * [new branch] gh/mikaylagawarecki/357/base -> origin/gh/mikaylagawarecki/357/base 2025-12-04T09:33:41.5715064Z * [new branch] gh/mikaylagawarecki/357/head -> origin/gh/mikaylagawarecki/357/head 2025-12-04T09:33:41.5716704Z * [new branch] gh/mikaylagawarecki/357/orig -> origin/gh/mikaylagawarecki/357/orig 2025-12-04T09:33:41.5718905Z * [new branch] gh/mikaylagawarecki/359/base -> origin/gh/mikaylagawarecki/359/base 2025-12-04T09:33:41.5720550Z * [new branch] gh/mikaylagawarecki/359/head -> origin/gh/mikaylagawarecki/359/head 2025-12-04T09:33:41.5722087Z * [new branch] gh/mikaylagawarecki/359/orig -> origin/gh/mikaylagawarecki/359/orig 2025-12-04T09:33:41.5724073Z * [new branch] gh/mikaylagawarecki/360/base -> origin/gh/mikaylagawarecki/360/base 2025-12-04T09:33:41.5725719Z * [new branch] gh/mikaylagawarecki/360/head -> origin/gh/mikaylagawarecki/360/head 2025-12-04T09:33:41.5727699Z * [new branch] gh/mikaylagawarecki/360/orig -> origin/gh/mikaylagawarecki/360/orig 2025-12-04T09:33:41.5729818Z * [new branch] gh/mikaylagawarecki/361/base -> origin/gh/mikaylagawarecki/361/base 2025-12-04T09:33:41.5731358Z * [new branch] gh/mikaylagawarecki/361/head -> origin/gh/mikaylagawarecki/361/head 2025-12-04T09:33:41.5732809Z * [new branch] gh/mikaylagawarecki/361/orig -> origin/gh/mikaylagawarecki/361/orig 2025-12-04T09:33:41.5734920Z * [new branch] 
gh/mikaylagawarecki/362/base -> origin/gh/mikaylagawarecki/362/base 2025-12-04T09:33:41.5736718Z * [new branch] gh/mikaylagawarecki/362/head -> origin/gh/mikaylagawarecki/362/head 2025-12-04T09:33:41.5738382Z * [new branch] gh/mikaylagawarecki/362/orig -> origin/gh/mikaylagawarecki/362/orig 2025-12-04T09:33:41.5740778Z * [new branch] gh/mikaylagawarecki/363/base -> origin/gh/mikaylagawarecki/363/base 2025-12-04T09:33:41.5742615Z * [new branch] gh/mikaylagawarecki/363/head -> origin/gh/mikaylagawarecki/363/head 2025-12-04T09:33:41.5744059Z * [new branch] gh/mikaylagawarecki/363/orig -> origin/gh/mikaylagawarecki/363/orig 2025-12-04T09:33:41.5746680Z * [new branch] gh/mikaylagawarecki/364/base -> origin/gh/mikaylagawarecki/364/base 2025-12-04T09:33:41.5748157Z * [new branch] gh/mikaylagawarecki/364/head -> origin/gh/mikaylagawarecki/364/head 2025-12-04T09:33:41.5749688Z * [new branch] gh/mikaylagawarecki/364/orig -> origin/gh/mikaylagawarecki/364/orig 2025-12-04T09:33:41.5752010Z * [new branch] gh/mikaylagawarecki/365/base -> origin/gh/mikaylagawarecki/365/base 2025-12-04T09:33:41.5753533Z * [new branch] gh/mikaylagawarecki/365/head -> origin/gh/mikaylagawarecki/365/head 2025-12-04T09:33:41.5755180Z * [new branch] gh/mikaylagawarecki/365/orig -> origin/gh/mikaylagawarecki/365/orig 2025-12-04T09:33:41.5757347Z * [new branch] gh/mikaylagawarecki/366/base -> origin/gh/mikaylagawarecki/366/base 2025-12-04T09:33:41.5758817Z * [new branch] gh/mikaylagawarecki/366/head -> origin/gh/mikaylagawarecki/366/head 2025-12-04T09:33:41.5760463Z * [new branch] gh/mikaylagawarecki/366/orig -> origin/gh/mikaylagawarecki/366/orig 2025-12-04T09:33:41.5762450Z * [new branch] gh/mikaylagawarecki/367/base -> origin/gh/mikaylagawarecki/367/base 2025-12-04T09:33:41.5763960Z * [new branch] gh/mikaylagawarecki/367/head -> origin/gh/mikaylagawarecki/367/head 2025-12-04T09:33:41.5765425Z * [new branch] gh/mikaylagawarecki/367/orig -> origin/gh/mikaylagawarecki/367/orig 
2025-12-04T09:33:41.5768121Z * [new branch] gh/mikaylagawarecki/368/base -> origin/gh/mikaylagawarecki/368/base 2025-12-04T09:33:41.5769619Z * [new branch] gh/mikaylagawarecki/368/head -> origin/gh/mikaylagawarecki/368/head 2025-12-04T09:33:41.5771337Z * [new branch] gh/mikaylagawarecki/368/orig -> origin/gh/mikaylagawarecki/368/orig 2025-12-04T09:33:41.5773397Z * [new branch] gh/mikaylagawarecki/369/base -> origin/gh/mikaylagawarecki/369/base 2025-12-04T09:33:41.5774884Z * [new branch] gh/mikaylagawarecki/369/head -> origin/gh/mikaylagawarecki/369/head 2025-12-04T09:33:41.5776373Z * [new branch] gh/mikaylagawarecki/369/orig -> origin/gh/mikaylagawarecki/369/orig 2025-12-04T09:33:41.5778648Z * [new branch] gh/mikaylagawarecki/370/base -> origin/gh/mikaylagawarecki/370/base 2025-12-04T09:33:41.5780122Z * [new branch] gh/mikaylagawarecki/370/head -> origin/gh/mikaylagawarecki/370/head 2025-12-04T09:33:41.5781750Z * [new branch] gh/mikaylagawarecki/370/orig -> origin/gh/mikaylagawarecki/370/orig 2025-12-04T09:33:41.5783895Z * [new branch] gh/mikaylagawarecki/371/base -> origin/gh/mikaylagawarecki/371/base 2025-12-04T09:33:41.5785340Z * [new branch] gh/mikaylagawarecki/371/head -> origin/gh/mikaylagawarecki/371/head 2025-12-04T09:33:41.5786793Z * [new branch] gh/mikaylagawarecki/371/orig -> origin/gh/mikaylagawarecki/371/orig 2025-12-04T09:33:41.5788849Z * [new branch] gh/mikaylagawarecki/372/base -> origin/gh/mikaylagawarecki/372/base 2025-12-04T09:33:41.5790289Z * [new branch] gh/mikaylagawarecki/372/head -> origin/gh/mikaylagawarecki/372/head 2025-12-04T09:33:41.5791794Z * [new branch] gh/mikaylagawarecki/372/orig -> origin/gh/mikaylagawarecki/372/orig 2025-12-04T09:33:41.5793763Z * [new branch] gh/mikaylagawarecki/373/base -> origin/gh/mikaylagawarecki/373/base 2025-12-04T09:33:41.5795222Z * [new branch] gh/mikaylagawarecki/373/head -> origin/gh/mikaylagawarecki/373/head 2025-12-04T09:33:41.5796702Z * [new branch] gh/mikaylagawarecki/373/orig -> 
origin/gh/mikaylagawarecki/373/orig 2025-12-04T09:33:41.5798762Z * [new branch] gh/mikaylagawarecki/374/base -> origin/gh/mikaylagawarecki/374/base 2025-12-04T09:33:41.5800275Z * [new branch] gh/mikaylagawarecki/374/head -> origin/gh/mikaylagawarecki/374/head 2025-12-04T09:33:41.5801838Z * [new branch] gh/mikaylagawarecki/374/orig -> origin/gh/mikaylagawarecki/374/orig 2025-12-04T09:33:41.5803840Z * [new branch] gh/mikaylagawarecki/375/base -> origin/gh/mikaylagawarecki/375/base 2025-12-04T09:33:41.5805417Z * [new branch] gh/mikaylagawarecki/375/head -> origin/gh/mikaylagawarecki/375/head 2025-12-04T09:33:41.5806923Z * [new branch] gh/mikaylagawarecki/375/orig -> origin/gh/mikaylagawarecki/375/orig 2025-12-04T09:33:41.5809011Z * [new branch] gh/mikaylagawarecki/376/base -> origin/gh/mikaylagawarecki/376/base 2025-12-04T09:33:41.5810631Z * [new branch] gh/mikaylagawarecki/376/head -> origin/gh/mikaylagawarecki/376/head 2025-12-04T09:33:41.5812088Z * [new branch] gh/mikaylagawarecki/376/orig -> origin/gh/mikaylagawarecki/376/orig 2025-12-04T09:33:41.5814102Z * [new branch] gh/mikaylagawarecki/377/base -> origin/gh/mikaylagawarecki/377/base 2025-12-04T09:33:41.5815688Z * [new branch] gh/mikaylagawarecki/377/head -> origin/gh/mikaylagawarecki/377/head 2025-12-04T09:33:41.5817283Z * [new branch] gh/mikaylagawarecki/377/orig -> origin/gh/mikaylagawarecki/377/orig 2025-12-04T09:33:41.5819749Z * [new branch] gh/mikaylagawarecki/378/base -> origin/gh/mikaylagawarecki/378/base 2025-12-04T09:33:41.5821289Z * [new branch] gh/mikaylagawarecki/378/head -> origin/gh/mikaylagawarecki/378/head 2025-12-04T09:33:41.5822882Z * [new branch] gh/mikaylagawarecki/378/orig -> origin/gh/mikaylagawarecki/378/orig 2025-12-04T09:33:41.5824921Z * [new branch] gh/mikaylagawarecki/379/base -> origin/gh/mikaylagawarecki/379/base 2025-12-04T09:33:41.5826384Z * [new branch] gh/mikaylagawarecki/379/head -> origin/gh/mikaylagawarecki/379/head 2025-12-04T09:33:41.5827873Z * [new branch] 
gh/mikaylagawarecki/379/orig -> origin/gh/mikaylagawarecki/379/orig 2025-12-04T09:33:41.5829701Z * [new branch] gh/mikaylagawarecki/380/base -> origin/gh/mikaylagawarecki/380/base 2025-12-04T09:33:41.5831163Z * [new branch] gh/mikaylagawarecki/380/head -> origin/gh/mikaylagawarecki/380/head 2025-12-04T09:33:41.5832633Z * [new branch] gh/mikaylagawarecki/380/orig -> origin/gh/mikaylagawarecki/380/orig 2025-12-04T09:33:41.5834467Z * [new branch] gh/mikaylagawarecki/381/base -> origin/gh/mikaylagawarecki/381/base 2025-12-04T09:33:41.5835927Z * [new branch] gh/mikaylagawarecki/381/head -> origin/gh/mikaylagawarecki/381/head 2025-12-04T09:33:41.5837371Z * [new branch] gh/mikaylagawarecki/381/orig -> origin/gh/mikaylagawarecki/381/orig 2025-12-04T09:33:41.5839191Z * [new branch] gh/mikaylagawarecki/382/base -> origin/gh/mikaylagawarecki/382/base 2025-12-04T09:33:41.5840812Z * [new branch] gh/mikaylagawarecki/382/head -> origin/gh/mikaylagawarecki/382/head 2025-12-04T09:33:41.5842379Z * [new branch] gh/mikaylagawarecki/382/orig -> origin/gh/mikaylagawarecki/382/orig 2025-12-04T09:33:41.5844473Z * [new branch] gh/mikaylagawarecki/383/base -> origin/gh/mikaylagawarecki/383/base 2025-12-04T09:33:41.5845988Z * [new branch] gh/mikaylagawarecki/383/head -> origin/gh/mikaylagawarecki/383/head 2025-12-04T09:33:41.5847486Z * [new branch] gh/mikaylagawarecki/383/orig -> origin/gh/mikaylagawarecki/383/orig 2025-12-04T09:33:41.5849554Z * [new branch] gh/mikaylagawarecki/384/base -> origin/gh/mikaylagawarecki/384/base 2025-12-04T09:33:41.5851040Z * [new branch] gh/mikaylagawarecki/384/head -> origin/gh/mikaylagawarecki/384/head 2025-12-04T09:33:41.5852470Z * [new branch] gh/mikaylagawarecki/384/orig -> origin/gh/mikaylagawarecki/384/orig 2025-12-04T09:33:41.5854469Z * [new branch] gh/mikaylagawarecki/385/base -> origin/gh/mikaylagawarecki/385/base 2025-12-04T09:33:41.5856079Z * [new branch] gh/mikaylagawarecki/385/head -> origin/gh/mikaylagawarecki/385/head 
2025-12-04T09:33:41.5857681Z * [new branch] gh/mikaylagawarecki/385/orig -> origin/gh/mikaylagawarecki/385/orig 2025-12-04T09:33:41.5859910Z * [new branch] gh/mikaylagawarecki/386/base -> origin/gh/mikaylagawarecki/386/base 2025-12-04T09:33:41.5861356Z * [new branch] gh/mikaylagawarecki/386/head -> origin/gh/mikaylagawarecki/386/head 2025-12-04T09:33:41.5862995Z * [new branch] gh/mikaylagawarecki/386/orig -> origin/gh/mikaylagawarecki/386/orig 2025-12-04T09:33:41.5865097Z * [new branch] gh/mikaylagawarecki/387/base -> origin/gh/mikaylagawarecki/387/base 2025-12-04T09:33:41.5866447Z * [new branch] gh/mikaylagawarecki/387/head -> origin/gh/mikaylagawarecki/387/head 2025-12-04T09:33:41.5867920Z * [new branch] gh/mikaylagawarecki/387/orig -> origin/gh/mikaylagawarecki/387/orig 2025-12-04T09:33:41.5869699Z * [new branch] gh/mikaylagawarecki/388/base -> origin/gh/mikaylagawarecki/388/base 2025-12-04T09:33:41.5871338Z * [new branch] gh/mikaylagawarecki/388/head -> origin/gh/mikaylagawarecki/388/head 2025-12-04T09:33:41.5872873Z * [new branch] gh/mikaylagawarecki/388/orig -> origin/gh/mikaylagawarecki/388/orig 2025-12-04T09:33:41.5874921Z * [new branch] gh/mikaylagawarecki/389/base -> origin/gh/mikaylagawarecki/389/base 2025-12-04T09:33:41.5876370Z * [new branch] gh/mikaylagawarecki/389/head -> origin/gh/mikaylagawarecki/389/head 2025-12-04T09:33:41.5877816Z * [new branch] gh/mikaylagawarecki/389/orig -> origin/gh/mikaylagawarecki/389/orig 2025-12-04T09:33:41.5879950Z * [new branch] gh/mikaylagawarecki/390/base -> origin/gh/mikaylagawarecki/390/base 2025-12-04T09:33:41.5881429Z * [new branch] gh/mikaylagawarecki/390/head -> origin/gh/mikaylagawarecki/390/head 2025-12-04T09:33:41.5882973Z * [new branch] gh/mikaylagawarecki/390/orig -> origin/gh/mikaylagawarecki/390/orig 2025-12-04T09:33:41.5885196Z * [new branch] gh/mikaylagawarecki/391/base -> origin/gh/mikaylagawarecki/391/base 2025-12-04T09:33:41.5886742Z * [new branch] gh/mikaylagawarecki/391/head -> 
origin/gh/mikaylagawarecki/391/head 2025-12-04T09:33:41.5888292Z * [new branch] gh/mikaylagawarecki/391/orig -> origin/gh/mikaylagawarecki/391/orig 2025-12-04T09:33:41.5890376Z * [new branch] gh/mikaylagawarecki/392/base -> origin/gh/mikaylagawarecki/392/base 2025-12-04T09:33:41.5891836Z * [new branch] gh/mikaylagawarecki/392/head -> origin/gh/mikaylagawarecki/392/head 2025-12-04T09:33:41.5893310Z * [new branch] gh/mikaylagawarecki/392/orig -> origin/gh/mikaylagawarecki/392/orig 2025-12-04T09:33:41.5895655Z * [new branch] gh/mlazos/41/base -> origin/gh/mlazos/41/base 2025-12-04T09:33:41.5897194Z * [new branch] gh/mlazos/41/head -> origin/gh/mlazos/41/head 2025-12-04T09:33:41.5898686Z * [new branch] gh/mlazos/41/orig -> origin/gh/mlazos/41/orig 2025-12-04T09:33:41.5900691Z * [new branch] gh/mlazos/42/base -> origin/gh/mlazos/42/base 2025-12-04T09:33:41.5902190Z * [new branch] gh/mlazos/42/head -> origin/gh/mlazos/42/head 2025-12-04T09:33:41.5903689Z * [new branch] gh/mlazos/42/orig -> origin/gh/mlazos/42/orig 2025-12-04T09:33:41.5905399Z * [new branch] gh/mlazos/43/base -> origin/gh/mlazos/43/base 2025-12-04T09:33:41.5907281Z * [new branch] gh/mlazos/43/head -> origin/gh/mlazos/43/head 2025-12-04T09:33:41.5908731Z * [new branch] gh/mlazos/43/orig -> origin/gh/mlazos/43/orig 2025-12-04T09:33:41.5910506Z * [new branch] gh/mlazos/44/base -> origin/gh/mlazos/44/base 2025-12-04T09:33:41.5911915Z * [new branch] gh/mlazos/44/head -> origin/gh/mlazos/44/head 2025-12-04T09:33:41.5913359Z * [new branch] gh/mlazos/44/orig -> origin/gh/mlazos/44/orig 2025-12-04T09:33:41.5915248Z * [new branch] gh/mlazos/47/base -> origin/gh/mlazos/47/base 2025-12-04T09:33:41.5916708Z * [new branch] gh/mlazos/47/head -> origin/gh/mlazos/47/head 2025-12-04T09:33:41.5918239Z * [new branch] gh/mlazos/47/orig -> origin/gh/mlazos/47/orig 2025-12-04T09:33:41.5920018Z * [new branch] gh/mlazos/48/base -> origin/gh/mlazos/48/base 2025-12-04T09:33:41.5921675Z * [new branch] gh/mlazos/48/head -> 
origin/gh/mlazos/48/head 2025-12-04T09:33:41.5923031Z * [new branch] gh/mlazos/48/orig -> origin/gh/mlazos/48/orig 2025-12-04T09:33:41.5925152Z * [new branch] gh/mlazos/49/base -> origin/gh/mlazos/49/base 2025-12-04T09:33:41.5926285Z * [new branch] gh/mlazos/49/head -> origin/gh/mlazos/49/head 2025-12-04T09:33:41.5927703Z * [new branch] gh/mlazos/49/orig -> origin/gh/mlazos/49/orig 2025-12-04T09:33:41.5929657Z * [new branch] gh/mlazos/50/base -> origin/gh/mlazos/50/base 2025-12-04T09:33:41.5931287Z * [new branch] gh/mlazos/50/head -> origin/gh/mlazos/50/head 2025-12-04T09:33:41.5932511Z * [new branch] gh/mlazos/50/orig -> origin/gh/mlazos/50/orig 2025-12-04T09:33:41.5934286Z * [new branch] gh/mlazos/51/base -> origin/gh/mlazos/51/base 2025-12-04T09:33:41.5935796Z * [new branch] gh/mlazos/51/head -> origin/gh/mlazos/51/head 2025-12-04T09:33:41.5937484Z * [new branch] gh/mlazos/51/orig -> origin/gh/mlazos/51/orig 2025-12-04T09:33:41.5939347Z * [new branch] gh/mlazos/52/base -> origin/gh/mlazos/52/base 2025-12-04T09:33:41.5940917Z * [new branch] gh/mlazos/52/head -> origin/gh/mlazos/52/head 2025-12-04T09:33:41.5942395Z * [new branch] gh/mlazos/52/orig -> origin/gh/mlazos/52/orig 2025-12-04T09:33:41.5944341Z * [new branch] gh/mlazos/53/base -> origin/gh/mlazos/53/base 2025-12-04T09:33:41.5945835Z * [new branch] gh/mlazos/53/head -> origin/gh/mlazos/53/head 2025-12-04T09:33:41.5947310Z * [new branch] gh/mlazos/53/orig -> origin/gh/mlazos/53/orig 2025-12-04T09:33:41.5949206Z * [new branch] gh/mlazos/54/base -> origin/gh/mlazos/54/base 2025-12-04T09:33:41.5950669Z * [new branch] gh/mlazos/54/head -> origin/gh/mlazos/54/head 2025-12-04T09:33:41.5952143Z * [new branch] gh/mlazos/54/orig -> origin/gh/mlazos/54/orig 2025-12-04T09:33:41.5953983Z * [new branch] gh/mlazos/55/base -> origin/gh/mlazos/55/base 2025-12-04T09:33:41.5955469Z * [new branch] gh/mlazos/55/head -> origin/gh/mlazos/55/head 2025-12-04T09:33:41.5956891Z * [new branch] gh/mlazos/55/orig -> 
origin/gh/mlazos/55/orig 2025-12-04T09:33:41.5958751Z * [new branch] gh/mlazos/56/base -> origin/gh/mlazos/56/base 2025-12-04T09:33:41.5998851Z * [new branch] gh/mlazos/56/head -> origin/gh/mlazos/56/head 2025-12-04T09:33:41.5999435Z * [new branch] gh/mlazos/56/orig -> origin/gh/mlazos/56/orig 2025-12-04T09:33:41.6000181Z * [new branch] gh/mlazos/57/base -> origin/gh/mlazos/57/base 2025-12-04T09:33:41.6000899Z * [new branch] gh/mlazos/57/head -> origin/gh/mlazos/57/head 2025-12-04T09:33:41.6001518Z * [new branch] gh/mlazos/57/orig -> origin/gh/mlazos/57/orig 2025-12-04T09:33:41.6002143Z * [new branch] gh/mlazos/58/base -> origin/gh/mlazos/58/base 2025-12-04T09:33:41.6002759Z * [new branch] gh/mlazos/58/head -> origin/gh/mlazos/58/head 2025-12-04T09:33:41.6003353Z * [new branch] gh/mlazos/58/orig -> origin/gh/mlazos/58/orig 2025-12-04T09:33:41.6003939Z * [new branch] gh/mlazos/59/base -> origin/gh/mlazos/59/base 2025-12-04T09:33:41.6004536Z * [new branch] gh/mlazos/59/head -> origin/gh/mlazos/59/head 2025-12-04T09:33:41.6005127Z * [new branch] gh/mlazos/59/orig -> origin/gh/mlazos/59/orig 2025-12-04T09:33:41.6005752Z * [new branch] gh/mlazos/60/base -> origin/gh/mlazos/60/base 2025-12-04T09:33:41.6006582Z * [new branch] gh/mlazos/60/head -> origin/gh/mlazos/60/head 2025-12-04T09:33:41.6007215Z * [new branch] gh/mlazos/60/orig -> origin/gh/mlazos/60/orig 2025-12-04T09:33:41.6007846Z * [new branch] gh/mlazos/61/base -> origin/gh/mlazos/61/base 2025-12-04T09:33:41.6008474Z * [new branch] gh/mlazos/61/head -> origin/gh/mlazos/61/head 2025-12-04T09:33:41.6009091Z * [new branch] gh/mlazos/61/orig -> origin/gh/mlazos/61/orig 2025-12-04T09:33:41.6009716Z * [new branch] gh/mlazos/62/base -> origin/gh/mlazos/62/base 2025-12-04T09:33:41.6010329Z * [new branch] gh/mlazos/62/head -> origin/gh/mlazos/62/head 2025-12-04T09:33:41.6010987Z * [new branch] gh/mlazos/62/orig -> origin/gh/mlazos/62/orig 2025-12-04T09:33:41.6011601Z * [new branch] gh/mlazos/63/base -> 
origin/gh/mlazos/63/base 2025-12-04T09:33:41.6012221Z * [new branch] gh/mlazos/63/head -> origin/gh/mlazos/63/head 2025-12-04T09:33:41.6012835Z * [new branch] gh/mlazos/63/orig -> origin/gh/mlazos/63/orig 2025-12-04T09:33:41.6013435Z * [new branch] gh/mlazos/64/base -> origin/gh/mlazos/64/base 2025-12-04T09:33:41.6014175Z * [new branch] gh/mlazos/64/head -> origin/gh/mlazos/64/head 2025-12-04T09:33:41.6014794Z * [new branch] gh/mlazos/64/orig -> origin/gh/mlazos/64/orig 2025-12-04T09:33:41.6015411Z * [new branch] gh/mlazos/65/base -> origin/gh/mlazos/65/base 2025-12-04T09:33:41.6016023Z * [new branch] gh/mlazos/65/head -> origin/gh/mlazos/65/head 2025-12-04T09:33:41.6016733Z * [new branch] gh/mlazos/65/orig -> origin/gh/mlazos/65/orig 2025-12-04T09:33:41.6017349Z * [new branch] gh/mlazos/66/base -> origin/gh/mlazos/66/base 2025-12-04T09:33:41.6017966Z * [new branch] gh/mlazos/66/head -> origin/gh/mlazos/66/head 2025-12-04T09:33:41.6018570Z * [new branch] gh/mlazos/66/orig -> origin/gh/mlazos/66/orig 2025-12-04T09:33:41.6019187Z * [new branch] gh/mlazos/67/base -> origin/gh/mlazos/67/base 2025-12-04T09:33:41.6019804Z * [new branch] gh/mlazos/67/head -> origin/gh/mlazos/67/head 2025-12-04T09:33:41.6020419Z * [new branch] gh/mlazos/67/orig -> origin/gh/mlazos/67/orig 2025-12-04T09:33:41.6021022Z * [new branch] gh/mlazos/68/base -> origin/gh/mlazos/68/base 2025-12-04T09:33:41.6021639Z * [new branch] gh/mlazos/68/head -> origin/gh/mlazos/68/head 2025-12-04T09:33:41.6022255Z * [new branch] gh/mlazos/68/orig -> origin/gh/mlazos/68/orig 2025-12-04T09:33:41.6022859Z * [new branch] gh/mlazos/69/base -> origin/gh/mlazos/69/base 2025-12-04T09:33:41.6023484Z * [new branch] gh/mlazos/69/head -> origin/gh/mlazos/69/head 2025-12-04T09:33:41.6025224Z * [new branch] gh/mlazos/69/orig -> origin/gh/mlazos/69/orig 2025-12-04T09:33:41.6027405Z * [new branch] gh/mlazos/70/base -> origin/gh/mlazos/70/base 2025-12-04T09:33:41.6029027Z * [new branch] gh/mlazos/70/head -> 
origin/gh/mlazos/70/head 2025-12-04T09:33:41.6030620Z * [new branch] gh/mlazos/70/orig -> origin/gh/mlazos/70/orig 2025-12-04T09:33:41.6032555Z * [new branch] gh/mlazos/71/base -> origin/gh/mlazos/71/base 2025-12-04T09:33:41.6034005Z * [new branch] gh/mlazos/71/head -> origin/gh/mlazos/71/head 2025-12-04T09:33:41.6035454Z * [new branch] gh/mlazos/71/orig -> origin/gh/mlazos/71/orig 2025-12-04T09:33:41.6037357Z * [new branch] gh/mlazos/72/base -> origin/gh/mlazos/72/base 2025-12-04T09:33:41.6039089Z * [new branch] gh/mlazos/72/head -> origin/gh/mlazos/72/head 2025-12-04T09:33:41.6040351Z * [new branch] gh/mlazos/72/orig -> origin/gh/mlazos/72/orig 2025-12-04T09:33:41.6042471Z * [new branch] gh/mlazos/73/base -> origin/gh/mlazos/73/base 2025-12-04T09:33:41.6043925Z * [new branch] gh/mlazos/73/head -> origin/gh/mlazos/73/head 2025-12-04T09:33:41.6045383Z * [new branch] gh/mlazos/73/orig -> origin/gh/mlazos/73/orig 2025-12-04T09:33:41.6047675Z * [new branch] gh/mrmiywj/1/base -> origin/gh/mrmiywj/1/base 2025-12-04T09:33:41.6049232Z * [new branch] gh/mrmiywj/1/head -> origin/gh/mrmiywj/1/head 2025-12-04T09:33:41.6051634Z * [new branch] gh/muchulee8/73/base -> origin/gh/muchulee8/73/base 2025-12-04T09:33:41.6053270Z * [new branch] gh/muchulee8/73/head -> origin/gh/muchulee8/73/head 2025-12-04T09:33:41.6054854Z * [new branch] gh/muchulee8/73/orig -> origin/gh/muchulee8/73/orig 2025-12-04T09:33:41.6057514Z * [new branch] gh/naveenthangudu/1/base -> origin/gh/naveenthangudu/1/base 2025-12-04T09:33:41.6058910Z * [new branch] gh/naveenthangudu/1/head -> origin/gh/naveenthangudu/1/head 2025-12-04T09:33:41.6060576Z * [new branch] gh/naveenthangudu/1/orig -> origin/gh/naveenthangudu/1/orig 2025-12-04T09:33:41.6062494Z * [new branch] gh/naveenthangudu/2/base -> origin/gh/naveenthangudu/2/base 2025-12-04T09:33:41.6063991Z * [new branch] gh/naveenthangudu/2/head -> origin/gh/naveenthangudu/2/head 2025-12-04T09:33:41.6065498Z * [new branch] gh/naveenthangudu/2/orig -> 
origin/gh/naveenthangudu/2/orig 2025-12-04T09:33:41.6067287Z * [new branch] gh/naveenthangudu/3/base -> origin/gh/naveenthangudu/3/base 2025-12-04T09:33:41.6068638Z * [new branch] gh/naveenthangudu/3/head -> origin/gh/naveenthangudu/3/head 2025-12-04T09:33:41.6070346Z * [new branch] gh/naveenthangudu/3/orig -> origin/gh/naveenthangudu/3/orig 2025-12-04T09:33:41.6072500Z * [new branch] gh/naveenthangudu/4/base -> origin/gh/naveenthangudu/4/base 2025-12-04T09:33:41.6073823Z * [new branch] gh/naveenthangudu/4/head -> origin/gh/naveenthangudu/4/head 2025-12-04T09:33:41.6075480Z * [new branch] gh/naveenthangudu/4/orig -> origin/gh/naveenthangudu/4/orig 2025-12-04T09:33:41.6077466Z * [new branch] gh/naveenthangudu/5/base -> origin/gh/naveenthangudu/5/base 2025-12-04T09:33:41.6078858Z * [new branch] gh/naveenthangudu/5/head -> origin/gh/naveenthangudu/5/head 2025-12-04T09:33:41.6080685Z * [new branch] gh/naveenthangudu/5/orig -> origin/gh/naveenthangudu/5/orig 2025-12-04T09:33:41.6082500Z * [new branch] gh/naveenthangudu/6/base -> origin/gh/naveenthangudu/6/base 2025-12-04T09:33:41.6083825Z * [new branch] gh/naveenthangudu/6/head -> origin/gh/naveenthangudu/6/head 2025-12-04T09:33:41.6085144Z * [new branch] gh/naveenthangudu/6/orig -> origin/gh/naveenthangudu/6/orig 2025-12-04T09:33:41.6087623Z * [new branch] gh/naveenthangudu/7/base -> origin/gh/naveenthangudu/7/base 2025-12-04T09:33:41.6089010Z * [new branch] gh/naveenthangudu/7/head -> origin/gh/naveenthangudu/7/head 2025-12-04T09:33:41.6090386Z * [new branch] gh/naveenthangudu/7/orig -> origin/gh/naveenthangudu/7/orig 2025-12-04T09:33:41.6092178Z * [new branch] gh/naveenthangudu/8/base -> origin/gh/naveenthangudu/8/base 2025-12-04T09:33:41.6093743Z * [new branch] gh/naveenthangudu/8/head -> origin/gh/naveenthangudu/8/head 2025-12-04T09:33:41.6095243Z * [new branch] gh/naveenthangudu/8/orig -> origin/gh/naveenthangudu/8/orig 2025-12-04T09:33:41.6097466Z * [new branch] gh/naveenthangudu/9/base -> 
origin/gh/naveenthangudu/9/base 2025-12-04T09:33:41.6098643Z * [new branch] gh/naveenthangudu/9/head -> origin/gh/naveenthangudu/9/head 2025-12-04T09:33:41.6100276Z * [new branch] gh/naveenthangudu/9/orig -> origin/gh/naveenthangudu/9/orig 2025-12-04T09:33:41.6102572Z * [new branch] gh/nikitaved/1/base -> origin/gh/nikitaved/1/base 2025-12-04T09:33:41.6104080Z * [new branch] gh/nikitaved/1/head -> origin/gh/nikitaved/1/head 2025-12-04T09:33:41.6105403Z * [new branch] gh/nikitaved/1/orig -> origin/gh/nikitaved/1/orig 2025-12-04T09:33:41.6107487Z * [new branch] gh/nikitaved/10/base -> origin/gh/nikitaved/10/base 2025-12-04T09:33:41.6108968Z * [new branch] gh/nikitaved/10/head -> origin/gh/nikitaved/10/head 2025-12-04T09:33:41.6110455Z * [new branch] gh/nikitaved/10/orig -> origin/gh/nikitaved/10/orig 2025-12-04T09:33:41.6112339Z * [new branch] gh/nikitaved/11/base -> origin/gh/nikitaved/11/base 2025-12-04T09:33:41.6113862Z * [new branch] gh/nikitaved/11/head -> origin/gh/nikitaved/11/head 2025-12-04T09:33:41.6115515Z * [new branch] gh/nikitaved/11/orig -> origin/gh/nikitaved/11/orig 2025-12-04T09:33:41.6117345Z * [new branch] gh/nikitaved/12/base -> origin/gh/nikitaved/12/base 2025-12-04T09:33:41.6118645Z * [new branch] gh/nikitaved/12/head -> origin/gh/nikitaved/12/head 2025-12-04T09:33:41.6120147Z * [new branch] gh/nikitaved/12/orig -> origin/gh/nikitaved/12/orig 2025-12-04T09:33:41.6122091Z * [new branch] gh/nikitaved/13/base -> origin/gh/nikitaved/13/base 2025-12-04T09:33:41.6123532Z * [new branch] gh/nikitaved/13/head -> origin/gh/nikitaved/13/head 2025-12-04T09:33:41.6124841Z * [new branch] gh/nikitaved/13/orig -> origin/gh/nikitaved/13/orig 2025-12-04T09:33:41.6126943Z * [new branch] gh/nikitaved/14/base -> origin/gh/nikitaved/14/base 2025-12-04T09:33:41.6128425Z * [new branch] gh/nikitaved/14/head -> origin/gh/nikitaved/14/head 2025-12-04T09:33:41.6129874Z * [new branch] gh/nikitaved/14/orig -> origin/gh/nikitaved/14/orig 2025-12-04T09:33:41.6131633Z * [new 
branch] gh/nikitaved/15/base -> origin/gh/nikitaved/15/base 2025-12-04T09:33:41.6133072Z * [new branch] gh/nikitaved/15/head -> origin/gh/nikitaved/15/head 2025-12-04T09:33:41.6134645Z * [new branch] gh/nikitaved/15/orig -> origin/gh/nikitaved/15/orig 2025-12-04T09:33:41.6136555Z * [new branch] gh/nikitaved/16/base -> origin/gh/nikitaved/16/base 2025-12-04T09:33:41.6138068Z * [new branch] gh/nikitaved/16/head -> origin/gh/nikitaved/16/head 2025-12-04T09:33:41.6139502Z * [new branch] gh/nikitaved/16/orig -> origin/gh/nikitaved/16/orig 2025-12-04T09:33:41.6141487Z * [new branch] gh/nikitaved/2/base -> origin/gh/nikitaved/2/base 2025-12-04T09:33:41.6142927Z * [new branch] gh/nikitaved/2/head -> origin/gh/nikitaved/2/head 2025-12-04T09:33:41.6144424Z * [new branch] gh/nikitaved/2/orig -> origin/gh/nikitaved/2/orig 2025-12-04T09:33:41.6146345Z * [new branch] gh/nikitaved/4/base -> origin/gh/nikitaved/4/base 2025-12-04T09:33:41.6147802Z * [new branch] gh/nikitaved/4/head -> origin/gh/nikitaved/4/head 2025-12-04T09:33:41.6149286Z * [new branch] gh/nikitaved/4/orig -> origin/gh/nikitaved/4/orig 2025-12-04T09:33:41.6151272Z * [new branch] gh/nikitaved/5/base -> origin/gh/nikitaved/5/base 2025-12-04T09:33:41.6152710Z * [new branch] gh/nikitaved/5/head -> origin/gh/nikitaved/5/head 2025-12-04T09:33:41.6154421Z * [new branch] gh/nikitaved/5/orig -> origin/gh/nikitaved/5/orig 2025-12-04T09:33:41.6156127Z * [new branch] gh/nikitaved/6/base -> origin/gh/nikitaved/6/base 2025-12-04T09:33:41.6157641Z * [new branch] gh/nikitaved/6/head -> origin/gh/nikitaved/6/head 2025-12-04T09:33:41.6159615Z * [new branch] gh/nikitaved/6/orig -> origin/gh/nikitaved/6/orig 2025-12-04T09:33:41.6161537Z * [new branch] gh/nikitaved/8/base -> origin/gh/nikitaved/8/base 2025-12-04T09:33:41.6162958Z * [new branch] gh/nikitaved/8/head -> origin/gh/nikitaved/8/head 2025-12-04T09:33:41.6164414Z * [new branch] gh/nikitaved/8/orig -> origin/gh/nikitaved/8/orig 2025-12-04T09:33:41.6166326Z * [new branch] 
gh/nikitaved/9/base -> origin/gh/nikitaved/9/base 2025-12-04T09:33:41.6167765Z * [new branch] gh/nikitaved/9/head -> origin/gh/nikitaved/9/head 2025-12-04T09:33:41.6169207Z * [new branch] gh/nikitaved/9/orig -> origin/gh/nikitaved/9/orig 2025-12-04T09:33:41.6171564Z * [new branch] gh/oulgen/10/base -> origin/gh/oulgen/10/base 2025-12-04T09:33:41.6175213Z * [new branch] gh/oulgen/10/head -> origin/gh/oulgen/10/head 2025-12-04T09:33:41.6176785Z * [new branch] gh/oulgen/10/orig -> origin/gh/oulgen/10/orig 2025-12-04T09:33:41.6178711Z * [new branch] gh/oulgen/11/base -> origin/gh/oulgen/11/base 2025-12-04T09:33:41.6180095Z * [new branch] gh/oulgen/11/head -> origin/gh/oulgen/11/head 2025-12-04T09:33:41.6181681Z * [new branch] gh/oulgen/11/orig -> origin/gh/oulgen/11/orig 2025-12-04T09:33:41.6183518Z * [new branch] gh/oulgen/12/base -> origin/gh/oulgen/12/base 2025-12-04T09:33:41.6184944Z * [new branch] gh/oulgen/12/head -> origin/gh/oulgen/12/head 2025-12-04T09:33:41.6186397Z * [new branch] gh/oulgen/12/orig -> origin/gh/oulgen/12/orig 2025-12-04T09:33:41.6188758Z * [new branch] gh/oulgen/13/base -> origin/gh/oulgen/13/base 2025-12-04T09:33:41.6190199Z * [new branch] gh/oulgen/13/head -> origin/gh/oulgen/13/head 2025-12-04T09:33:41.6191619Z * [new branch] gh/oulgen/13/orig -> origin/gh/oulgen/13/orig 2025-12-04T09:33:41.6193504Z * [new branch] gh/oulgen/14/base -> origin/gh/oulgen/14/base 2025-12-04T09:33:41.6195526Z * [new branch] gh/oulgen/14/head -> origin/gh/oulgen/14/head 2025-12-04T09:33:41.6197086Z * [new branch] gh/oulgen/14/orig -> origin/gh/oulgen/14/orig 2025-12-04T09:33:41.6199281Z * [new branch] gh/oulgen/15/base -> origin/gh/oulgen/15/base 2025-12-04T09:33:41.6200321Z * [new branch] gh/oulgen/15/head -> origin/gh/oulgen/15/head 2025-12-04T09:33:41.6201860Z * [new branch] gh/oulgen/15/orig -> origin/gh/oulgen/15/orig 2025-12-04T09:33:41.6203775Z * [new branch] gh/oulgen/16/base -> origin/gh/oulgen/16/base 2025-12-04T09:33:41.6205197Z * [new branch] 
gh/oulgen/16/head -> origin/gh/oulgen/16/head 2025-12-04T09:33:41.6206656Z * [new branch] gh/oulgen/16/orig -> origin/gh/oulgen/16/orig 2025-12-04T09:33:41.6208542Z * [new branch] gh/oulgen/17/base -> origin/gh/oulgen/17/base 2025-12-04T09:33:41.6210083Z * [new branch] gh/oulgen/17/head -> origin/gh/oulgen/17/head 2025-12-04T09:33:41.6211478Z * [new branch] gh/oulgen/17/orig -> origin/gh/oulgen/17/orig 2025-12-04T09:33:41.6213407Z * [new branch] gh/oulgen/18/base -> origin/gh/oulgen/18/base 2025-12-04T09:33:41.6214901Z * [new branch] gh/oulgen/18/head -> origin/gh/oulgen/18/head 2025-12-04T09:33:41.6216740Z * [new branch] gh/oulgen/18/orig -> origin/gh/oulgen/18/orig 2025-12-04T09:33:41.6218357Z * [new branch] gh/oulgen/19/base -> origin/gh/oulgen/19/base 2025-12-04T09:33:41.6219824Z * [new branch] gh/oulgen/19/head -> origin/gh/oulgen/19/head 2025-12-04T09:33:41.6221782Z * [new branch] gh/oulgen/19/orig -> origin/gh/oulgen/19/orig 2025-12-04T09:33:41.6223269Z * [new branch] gh/oulgen/20/base -> origin/gh/oulgen/20/base 2025-12-04T09:33:41.6224712Z * [new branch] gh/oulgen/20/head -> origin/gh/oulgen/20/head 2025-12-04T09:33:41.6226236Z * [new branch] gh/oulgen/20/orig -> origin/gh/oulgen/20/orig 2025-12-04T09:33:41.6228084Z * [new branch] gh/oulgen/21/base -> origin/gh/oulgen/21/base 2025-12-04T09:33:41.6229566Z * [new branch] gh/oulgen/21/head -> origin/gh/oulgen/21/head 2025-12-04T09:33:41.6231007Z * [new branch] gh/oulgen/21/orig -> origin/gh/oulgen/21/orig 2025-12-04T09:33:41.6232955Z * [new branch] gh/oulgen/22/base -> origin/gh/oulgen/22/base 2025-12-04T09:33:41.6234459Z * [new branch] gh/oulgen/22/head -> origin/gh/oulgen/22/head 2025-12-04T09:33:41.6235942Z * [new branch] gh/oulgen/22/orig -> origin/gh/oulgen/22/orig 2025-12-04T09:33:41.6237849Z * [new branch] gh/oulgen/23/base -> origin/gh/oulgen/23/base 2025-12-04T09:33:41.6239300Z * [new branch] gh/oulgen/23/head -> origin/gh/oulgen/23/head 2025-12-04T09:33:41.6240753Z * [new branch] gh/oulgen/23/orig 
-> origin/gh/oulgen/23/orig 2025-12-04T09:33:41.6242546Z * [new branch] gh/oulgen/24/base -> origin/gh/oulgen/24/base 2025-12-04T09:33:41.6244011Z * [new branch] gh/oulgen/24/head -> origin/gh/oulgen/24/head 2025-12-04T09:33:41.6245417Z * [new branch] gh/oulgen/24/orig -> origin/gh/oulgen/24/orig 2025-12-04T09:33:41.6247283Z * [new branch] gh/oulgen/25/base -> origin/gh/oulgen/25/base 2025-12-04T09:33:41.6248746Z * [new branch] gh/oulgen/25/head -> origin/gh/oulgen/25/head 2025-12-04T09:33:41.6250247Z * [new branch] gh/oulgen/25/orig -> origin/gh/oulgen/25/orig 2025-12-04T09:33:41.6252196Z * [new branch] gh/oulgen/26/base -> origin/gh/oulgen/26/base 2025-12-04T09:33:41.6253747Z * [new branch] gh/oulgen/26/head -> origin/gh/oulgen/26/head 2025-12-04T09:33:41.6255302Z * [new branch] gh/oulgen/26/orig -> origin/gh/oulgen/26/orig 2025-12-04T09:33:41.6257332Z * [new branch] gh/oulgen/4/base -> origin/gh/oulgen/4/base 2025-12-04T09:33:41.6258812Z * [new branch] gh/oulgen/4/head -> origin/gh/oulgen/4/head 2025-12-04T09:33:41.6260252Z * [new branch] gh/oulgen/4/orig -> origin/gh/oulgen/4/orig 2025-12-04T09:33:41.6262988Z * [new branch] gh/oulgen/7/base -> origin/gh/oulgen/7/base 2025-12-04T09:33:41.6264463Z * [new branch] gh/oulgen/7/head -> origin/gh/oulgen/7/head 2025-12-04T09:33:41.6265951Z * [new branch] gh/oulgen/7/orig -> origin/gh/oulgen/7/orig 2025-12-04T09:33:41.6267903Z * [new branch] gh/oulgen/8/base -> origin/gh/oulgen/8/base 2025-12-04T09:33:41.6269438Z * [new branch] gh/oulgen/8/head -> origin/gh/oulgen/8/head 2025-12-04T09:33:41.6270881Z * [new branch] gh/oulgen/8/orig -> origin/gh/oulgen/8/orig 2025-12-04T09:33:41.6273144Z * [new branch] gh/oulgen/9/base -> origin/gh/oulgen/9/base 2025-12-04T09:33:41.6274739Z * [new branch] gh/oulgen/9/head -> origin/gh/oulgen/9/head 2025-12-04T09:33:41.6276787Z * [new branch] gh/oulgen/9/orig -> origin/gh/oulgen/9/orig 2025-12-04T09:33:41.6278719Z * [new branch] gh/patvig/mtia-serialization -> 
origin/gh/patvig/mtia-serialization 2025-12-04T09:33:41.6281220Z * [new branch] gh/pearu/108/base -> origin/gh/pearu/108/base 2025-12-04T09:33:41.6282734Z * [new branch] gh/pearu/108/head -> origin/gh/pearu/108/head 2025-12-04T09:33:41.6284370Z * [new branch] gh/pearu/108/orig -> origin/gh/pearu/108/orig 2025-12-04T09:33:41.6286275Z * [new branch] gh/pearu/109/base -> origin/gh/pearu/109/base 2025-12-04T09:33:41.6287724Z * [new branch] gh/pearu/109/head -> origin/gh/pearu/109/head 2025-12-04T09:33:41.6289138Z * [new branch] gh/pearu/109/orig -> origin/gh/pearu/109/orig 2025-12-04T09:33:41.6291152Z * [new branch] gh/pearu/110/base -> origin/gh/pearu/110/base 2025-12-04T09:33:41.6292589Z * [new branch] gh/pearu/110/head -> origin/gh/pearu/110/head 2025-12-04T09:33:41.6294220Z * [new branch] gh/pearu/110/orig -> origin/gh/pearu/110/orig 2025-12-04T09:33:41.6296134Z * [new branch] gh/pearu/111/base -> origin/gh/pearu/111/base 2025-12-04T09:33:41.6297731Z * [new branch] gh/pearu/111/head -> origin/gh/pearu/111/head 2025-12-04T09:33:41.6299287Z * [new branch] gh/pearu/111/orig -> origin/gh/pearu/111/orig 2025-12-04T09:33:41.6301220Z * [new branch] gh/pearu/112/base -> origin/gh/pearu/112/base 2025-12-04T09:33:41.6302680Z * [new branch] gh/pearu/112/head -> origin/gh/pearu/112/head 2025-12-04T09:33:41.6304127Z * [new branch] gh/pearu/112/orig -> origin/gh/pearu/112/orig 2025-12-04T09:33:41.6305902Z * [new branch] gh/pearu/115/base -> origin/gh/pearu/115/base 2025-12-04T09:33:41.6307335Z * [new branch] gh/pearu/115/head -> origin/gh/pearu/115/head 2025-12-04T09:33:41.6308816Z * [new branch] gh/pearu/115/orig -> origin/gh/pearu/115/orig 2025-12-04T09:33:41.6310601Z * [new branch] gh/pearu/116/base -> origin/gh/pearu/116/base 2025-12-04T09:33:41.6312104Z * [new branch] gh/pearu/116/head -> origin/gh/pearu/116/head 2025-12-04T09:33:41.6313647Z * [new branch] gh/pearu/116/orig -> origin/gh/pearu/116/orig 2025-12-04T09:33:41.6315499Z * [new branch] gh/pearu/117/base -> 
origin/gh/pearu/117/base 2025-12-04T09:33:41.6316994Z * [new branch] gh/pearu/117/head -> origin/gh/pearu/117/head 2025-12-04T09:33:41.6318492Z * [new branch] gh/pearu/117/orig -> origin/gh/pearu/117/orig 2025-12-04T09:33:41.6320365Z * [new branch] gh/pearu/118/base -> origin/gh/pearu/118/base 2025-12-04T09:33:41.6321825Z * [new branch] gh/pearu/118/head -> origin/gh/pearu/118/head 2025-12-04T09:33:41.6323395Z * [new branch] gh/pearu/118/orig -> origin/gh/pearu/118/orig 2025-12-04T09:33:41.6325273Z * [new branch] gh/pearu/119/base -> origin/gh/pearu/119/base 2025-12-04T09:33:41.6326706Z * [new branch] gh/pearu/119/head -> origin/gh/pearu/119/head 2025-12-04T09:33:41.6328127Z * [new branch] gh/pearu/119/orig -> origin/gh/pearu/119/orig 2025-12-04T09:33:41.6330511Z * [new branch] gh/pearu/139/base -> origin/gh/pearu/139/base 2025-12-04T09:33:41.6331997Z * [new branch] gh/pearu/139/head -> origin/gh/pearu/139/head 2025-12-04T09:33:41.6333513Z * [new branch] gh/pearu/139/orig -> origin/gh/pearu/139/orig 2025-12-04T09:33:41.6335489Z * [new branch] gh/pearu/140/base -> origin/gh/pearu/140/base 2025-12-04T09:33:41.6337104Z * [new branch] gh/pearu/140/head -> origin/gh/pearu/140/head 2025-12-04T09:33:41.6338531Z * [new branch] gh/pearu/140/orig -> origin/gh/pearu/140/orig 2025-12-04T09:33:41.6340473Z * [new branch] gh/pearu/142/base -> origin/gh/pearu/142/base 2025-12-04T09:33:41.6341927Z * [new branch] gh/pearu/142/head -> origin/gh/pearu/142/head 2025-12-04T09:33:41.6343372Z * [new branch] gh/pearu/142/orig -> origin/gh/pearu/142/orig 2025-12-04T09:33:41.6345341Z * [new branch] gh/pearu/143/base -> origin/gh/pearu/143/base 2025-12-04T09:33:41.6346738Z * [new branch] gh/pearu/143/head -> origin/gh/pearu/143/head 2025-12-04T09:33:41.6348257Z * [new branch] gh/pearu/143/orig -> origin/gh/pearu/143/orig 2025-12-04T09:33:41.6350221Z * [new branch] gh/pearu/147/base -> origin/gh/pearu/147/base 2025-12-04T09:33:41.6351763Z * [new branch] gh/pearu/147/head -> 
origin/gh/pearu/147/head 2025-12-04T09:33:41.6353815Z * [new branch] gh/pearu/147/orig -> origin/gh/pearu/147/orig 2025-12-04T09:33:41.6355758Z * [new branch] gh/pearu/149/base -> origin/gh/pearu/149/base 2025-12-04T09:33:41.6357083Z * [new branch] gh/pearu/149/head -> origin/gh/pearu/149/head 2025-12-04T09:33:41.6358630Z * [new branch] gh/pearu/149/orig -> origin/gh/pearu/149/orig 2025-12-04T09:33:41.6361054Z * [new branch] gh/pearu/150/base -> origin/gh/pearu/150/base 2025-12-04T09:33:41.6362550Z * [new branch] gh/pearu/150/head -> origin/gh/pearu/150/head 2025-12-04T09:33:41.6363994Z * [new branch] gh/pearu/150/orig -> origin/gh/pearu/150/orig 2025-12-04T09:33:41.6366004Z * [new branch] gh/pearu/151/base -> origin/gh/pearu/151/base 2025-12-04T09:33:41.6367506Z * [new branch] gh/pearu/151/head -> origin/gh/pearu/151/head 2025-12-04T09:33:41.6368995Z * [new branch] gh/pearu/151/orig -> origin/gh/pearu/151/orig 2025-12-04T09:33:41.6371418Z * [new branch] gh/pearu/152/base -> origin/gh/pearu/152/base 2025-12-04T09:33:41.6372728Z * [new branch] gh/pearu/152/head -> origin/gh/pearu/152/head 2025-12-04T09:33:41.6374284Z * [new branch] gh/pearu/152/orig -> origin/gh/pearu/152/orig 2025-12-04T09:33:41.6376340Z * [new branch] gh/pearu/153/base -> origin/gh/pearu/153/base 2025-12-04T09:33:41.6377915Z * [new branch] gh/pearu/153/head -> origin/gh/pearu/153/head 2025-12-04T09:33:41.6379329Z * [new branch] gh/pearu/153/orig -> origin/gh/pearu/153/orig 2025-12-04T09:33:41.6381338Z * [new branch] gh/pearu/154/base -> origin/gh/pearu/154/base 2025-12-04T09:33:41.6382798Z * [new branch] gh/pearu/154/head -> origin/gh/pearu/154/head 2025-12-04T09:33:41.6384242Z * [new branch] gh/pearu/154/orig -> origin/gh/pearu/154/orig 2025-12-04T09:33:41.6386309Z * [new branch] gh/pearu/155/base -> origin/gh/pearu/155/base 2025-12-04T09:33:41.6387791Z * [new branch] gh/pearu/155/head -> origin/gh/pearu/155/head 2025-12-04T09:33:41.6389337Z * [new branch] gh/pearu/155/orig -> 
origin/gh/pearu/155/orig 2025-12-04T09:33:41.6391306Z * [new branch] gh/pearu/156/base -> origin/gh/pearu/156/base 2025-12-04T09:33:41.6392709Z * [new branch] gh/pearu/156/head -> origin/gh/pearu/156/head 2025-12-04T09:33:41.6394292Z * [new branch] gh/pearu/156/orig -> origin/gh/pearu/156/orig 2025-12-04T09:33:41.6396763Z * [new branch] gh/pearu/56/base -> origin/gh/pearu/56/base 2025-12-04T09:33:41.6398666Z * [new branch] gh/pearu/56/head -> origin/gh/pearu/56/head 2025-12-04T09:33:41.6399925Z * [new branch] gh/pearu/56/orig -> origin/gh/pearu/56/orig 2025-12-04T09:33:41.6402763Z * [new branch] gh/pearu/97/base -> origin/gh/pearu/97/base 2025-12-04T09:33:41.6404238Z * [new branch] gh/pearu/97/head -> origin/gh/pearu/97/head 2025-12-04T09:33:41.6405806Z * [new branch] gh/pearu/97/orig -> origin/gh/pearu/97/orig 2025-12-04T09:33:41.6408207Z * [new branch] gh/pianpwk/21/base -> origin/gh/pianpwk/21/base 2025-12-04T09:33:41.6409710Z * [new branch] gh/pianpwk/21/head -> origin/gh/pianpwk/21/head 2025-12-04T09:33:41.6411771Z * [new branch] gh/pianpwk/28/base -> origin/gh/pianpwk/28/base 2025-12-04T09:33:41.6413275Z * [new branch] gh/pianpwk/28/head -> origin/gh/pianpwk/28/head 2025-12-04T09:33:41.6415482Z * [new branch] gh/pianpwk/28/orig -> origin/gh/pianpwk/28/orig 2025-12-04T09:33:41.6417632Z * [new branch] gh/pianpwk/29/base -> origin/gh/pianpwk/29/base 2025-12-04T09:33:41.6419343Z * [new branch] gh/pianpwk/29/head -> origin/gh/pianpwk/29/head 2025-12-04T09:33:41.6421334Z * [new branch] gh/pianpwk/29/orig -> origin/gh/pianpwk/29/orig 2025-12-04T09:33:41.6423551Z * [new branch] gh/pianpwk/30/base -> origin/gh/pianpwk/30/base 2025-12-04T09:33:41.6425040Z * [new branch] gh/pianpwk/30/head -> origin/gh/pianpwk/30/head 2025-12-04T09:33:41.6426609Z * [new branch] gh/pianpwk/30/orig -> origin/gh/pianpwk/30/orig 2025-12-04T09:33:41.6428718Z * [new branch] gh/pianpwk/31/base -> origin/gh/pianpwk/31/base 2025-12-04T09:33:41.6430214Z * [new branch] gh/pianpwk/31/head -> 
origin/gh/pianpwk/31/head 2025-12-04T09:33:41.6431743Z * [new branch] gh/pianpwk/31/orig -> origin/gh/pianpwk/31/orig 2025-12-04T09:33:41.6433565Z * [new branch] gh/pianpwk/32/base -> origin/gh/pianpwk/32/base 2025-12-04T09:33:41.6435101Z * [new branch] gh/pianpwk/32/head -> origin/gh/pianpwk/32/head 2025-12-04T09:33:41.6436669Z * [new branch] gh/pianpwk/32/orig -> origin/gh/pianpwk/32/orig 2025-12-04T09:33:41.6438444Z * [new branch] gh/pianpwk/33/base -> origin/gh/pianpwk/33/base 2025-12-04T09:33:41.6439859Z * [new branch] gh/pianpwk/33/head -> origin/gh/pianpwk/33/head 2025-12-04T09:33:41.6441328Z * [new branch] gh/pianpwk/33/orig -> origin/gh/pianpwk/33/orig 2025-12-04T09:33:41.6443607Z * [new branch] gh/pianpwk/34/base -> origin/gh/pianpwk/34/base 2025-12-04T09:33:41.6445451Z * [new branch] gh/pianpwk/34/head -> origin/gh/pianpwk/34/head 2025-12-04T09:33:41.6447410Z * [new branch] gh/pianpwk/34/orig -> origin/gh/pianpwk/34/orig 2025-12-04T09:33:41.6449387Z * [new branch] gh/pianpwk/35/base -> origin/gh/pianpwk/35/base 2025-12-04T09:33:41.6450879Z * [new branch] gh/pianpwk/35/head -> origin/gh/pianpwk/35/head 2025-12-04T09:33:41.6452456Z * [new branch] gh/pianpwk/35/orig -> origin/gh/pianpwk/35/orig 2025-12-04T09:33:41.6454873Z * [new branch] gh/rec/141/base -> origin/gh/rec/141/base 2025-12-04T09:33:41.6456545Z * [new branch] gh/rec/141/head -> origin/gh/rec/141/head 2025-12-04T09:33:41.6458551Z * [new branch] gh/rec/153/base -> origin/gh/rec/153/base 2025-12-04T09:33:41.6459964Z * [new branch] gh/rec/153/head -> origin/gh/rec/153/head 2025-12-04T09:33:41.6461449Z * [new branch] gh/rec/153/orig -> origin/gh/rec/153/orig 2025-12-04T09:33:41.6463543Z * [new branch] gh/rec/154/base -> origin/gh/rec/154/base 2025-12-04T09:33:41.6465428Z * [new branch] gh/rec/154/head -> origin/gh/rec/154/head 2025-12-04T09:33:41.6466997Z * [new branch] gh/rec/154/orig -> origin/gh/rec/154/orig 2025-12-04T09:33:41.6469005Z * [new branch] gh/rec/164/base -> origin/gh/rec/164/base 
2025-12-04T09:33:41.6470469Z * [new branch] gh/rec/164/head -> origin/gh/rec/164/head 2025-12-04T09:33:41.6472123Z * [new branch] gh/rec/164/orig -> origin/gh/rec/164/orig 2025-12-04T09:33:41.6474710Z * [new branch] gh/rec/166/base -> origin/gh/rec/166/base 2025-12-04T09:33:41.6476212Z * [new branch] gh/rec/166/head -> origin/gh/rec/166/head 2025-12-04T09:33:41.6477784Z * [new branch] gh/rec/166/orig -> origin/gh/rec/166/orig 2025-12-04T09:33:41.6479882Z * [new branch] gh/rec/167/base -> origin/gh/rec/167/base 2025-12-04T09:33:41.6481259Z * [new branch] gh/rec/167/head -> origin/gh/rec/167/head 2025-12-04T09:33:41.6482789Z * [new branch] gh/rec/167/orig -> origin/gh/rec/167/orig 2025-12-04T09:33:41.6484691Z * [new branch] gh/rec/168/base -> origin/gh/rec/168/base 2025-12-04T09:33:41.6486117Z * [new branch] gh/rec/168/head -> origin/gh/rec/168/head 2025-12-04T09:33:41.6487548Z * [new branch] gh/rec/168/orig -> origin/gh/rec/168/orig 2025-12-04T09:33:41.6489513Z * [new branch] gh/rec/169/base -> origin/gh/rec/169/base 2025-12-04T09:33:41.6490944Z * [new branch] gh/rec/169/head -> origin/gh/rec/169/head 2025-12-04T09:33:41.6492404Z * [new branch] gh/rec/169/orig -> origin/gh/rec/169/orig 2025-12-04T09:33:41.6494391Z * [new branch] gh/rec/170/base -> origin/gh/rec/170/base 2025-12-04T09:33:41.6496420Z * [new branch] gh/rec/170/head -> origin/gh/rec/170/head 2025-12-04T09:33:41.6498095Z * [new branch] gh/rec/170/orig -> origin/gh/rec/170/orig 2025-12-04T09:33:41.6500048Z * [new branch] gh/rec/171/base -> origin/gh/rec/171/base 2025-12-04T09:33:41.6501565Z * [new branch] gh/rec/171/head -> origin/gh/rec/171/head 2025-12-04T09:33:41.6503089Z * [new branch] gh/rec/171/orig -> origin/gh/rec/171/orig 2025-12-04T09:33:41.6504992Z * [new branch] gh/rec/172/base -> origin/gh/rec/172/base 2025-12-04T09:33:41.6506459Z * [new branch] gh/rec/172/head -> origin/gh/rec/172/head 2025-12-04T09:33:41.6507858Z * [new branch] gh/rec/172/orig -> origin/gh/rec/172/orig 
2025-12-04T09:33:41.6509861Z * [new branch] gh/rec/173/base -> origin/gh/rec/173/base 2025-12-04T09:33:41.6511296Z * [new branch] gh/rec/173/head -> origin/gh/rec/173/head 2025-12-04T09:33:41.6512800Z * [new branch] gh/rec/173/orig -> origin/gh/rec/173/orig 2025-12-04T09:33:41.6514724Z * [new branch] gh/rec/174/base -> origin/gh/rec/174/base 2025-12-04T09:33:41.6516172Z * [new branch] gh/rec/174/head -> origin/gh/rec/174/head 2025-12-04T09:33:41.6517840Z * [new branch] gh/rec/174/orig -> origin/gh/rec/174/orig 2025-12-04T09:33:41.6519752Z * [new branch] gh/rec/175/base -> origin/gh/rec/175/base 2025-12-04T09:33:41.6521156Z * [new branch] gh/rec/175/head -> origin/gh/rec/175/head 2025-12-04T09:33:41.6522659Z * [new branch] gh/rec/175/orig -> origin/gh/rec/175/orig 2025-12-04T09:33:41.6524764Z * [new branch] gh/rec/176/base -> origin/gh/rec/176/base 2025-12-04T09:33:41.6525964Z * [new branch] gh/rec/176/head -> origin/gh/rec/176/head 2025-12-04T09:33:41.6527524Z * [new branch] gh/rec/176/orig -> origin/gh/rec/176/orig 2025-12-04T09:33:41.6530035Z * [new branch] gh/rec/177/base -> origin/gh/rec/177/base 2025-12-04T09:33:41.6531654Z * [new branch] gh/rec/177/head -> origin/gh/rec/177/head 2025-12-04T09:33:41.6533089Z * [new branch] gh/rec/177/orig -> origin/gh/rec/177/orig 2025-12-04T09:33:41.6535715Z * [new branch] gh/robert-hardwick/3/base -> origin/gh/robert-hardwick/3/base 2025-12-04T09:33:41.6537479Z * [new branch] gh/robert-hardwick/3/head -> origin/gh/robert-hardwick/3/head 2025-12-04T09:33:41.6538859Z * [new branch] gh/robert-hardwick/3/orig -> origin/gh/robert-hardwick/3/orig 2025-12-04T09:33:41.6540936Z * [new branch] gh/robert-hardwick/4/base -> origin/gh/robert-hardwick/4/base 2025-12-04T09:33:41.6542447Z * [new branch] gh/robert-hardwick/4/head -> origin/gh/robert-hardwick/4/head 2025-12-04T09:33:41.6543979Z * [new branch] gh/robert-hardwick/4/orig -> origin/gh/robert-hardwick/4/orig 2025-12-04T09:33:41.6545905Z * [new branch] gh/robert-hardwick/5/base -> 
origin/gh/robert-hardwick/5/base 2025-12-04T09:33:41.6547282Z * [new branch] gh/robert-hardwick/5/head -> origin/gh/robert-hardwick/5/head 2025-12-04T09:33:41.6548882Z * [new branch] gh/robert-hardwick/5/orig -> origin/gh/robert-hardwick/5/orig 2025-12-04T09:33:41.6550817Z * [new branch] gh/robert-hardwick/6/base -> origin/gh/robert-hardwick/6/base 2025-12-04T09:33:41.6552168Z * [new branch] gh/robert-hardwick/6/head -> origin/gh/robert-hardwick/6/head 2025-12-04T09:33:41.6553720Z * [new branch] gh/robert-hardwick/6/orig -> origin/gh/robert-hardwick/6/orig 2025-12-04T09:33:41.6555715Z * [new branch] gh/robert-hardwick/7/base -> origin/gh/robert-hardwick/7/base 2025-12-04T09:33:41.6557279Z * [new branch] gh/robert-hardwick/7/head -> origin/gh/robert-hardwick/7/head 2025-12-04T09:33:41.6558613Z * [new branch] gh/robert-hardwick/7/orig -> origin/gh/robert-hardwick/7/orig 2025-12-04T09:33:41.6561167Z * [new branch] gh/robert-hardwick/8/base -> origin/gh/robert-hardwick/8/base 2025-12-04T09:33:41.6562665Z * [new branch] gh/robert-hardwick/8/head -> origin/gh/robert-hardwick/8/head 2025-12-04T09:33:41.6564216Z * [new branch] gh/robert-hardwick/8/orig -> origin/gh/robert-hardwick/8/orig 2025-12-04T09:33:41.6566265Z * [new branch] gh/robert-hardwick/9/base -> origin/gh/robert-hardwick/9/base 2025-12-04T09:33:41.6567838Z * [new branch] gh/robert-hardwick/9/head -> origin/gh/robert-hardwick/9/head 2025-12-04T09:33:41.6569335Z * [new branch] gh/robert-hardwick/9/orig -> origin/gh/robert-hardwick/9/orig 2025-12-04T09:33:41.6571940Z * [new branch] gh/rtimpe/1/base -> origin/gh/rtimpe/1/base 2025-12-04T09:33:41.6573407Z * [new branch] gh/rtimpe/1/head -> origin/gh/rtimpe/1/head 2025-12-04T09:33:41.6575357Z * [new branch] gh/rtimpe/2/base -> origin/gh/rtimpe/2/base 2025-12-04T09:33:41.6576917Z * [new branch] gh/rtimpe/2/head -> origin/gh/rtimpe/2/head 2025-12-04T09:33:41.6579023Z * [new branch] gh/rtimpe/22/base -> origin/gh/rtimpe/22/base 2025-12-04T09:33:41.6580552Z * [new 
branch] gh/rtimpe/22/head -> origin/gh/rtimpe/22/head 2025-12-04T09:33:41.6582080Z * [new branch] gh/rtimpe/22/orig -> origin/gh/rtimpe/22/orig 2025-12-04T09:33:41.6583919Z * [new branch] gh/rtimpe/23/base -> origin/gh/rtimpe/23/base 2025-12-04T09:33:41.6585548Z * [new branch] gh/rtimpe/23/head -> origin/gh/rtimpe/23/head 2025-12-04T09:33:41.6586728Z * [new branch] gh/rtimpe/23/orig -> origin/gh/rtimpe/23/orig 2025-12-04T09:33:41.6588720Z * [new branch] gh/rtimpe/24/base -> origin/gh/rtimpe/24/base 2025-12-04T09:33:41.6590196Z * [new branch] gh/rtimpe/24/head -> origin/gh/rtimpe/24/head 2025-12-04T09:33:41.6591621Z * [new branch] gh/rtimpe/24/orig -> origin/gh/rtimpe/24/orig 2025-12-04T09:33:41.6593522Z * [new branch] gh/rtimpe/25/base -> origin/gh/rtimpe/25/base 2025-12-04T09:33:41.6594983Z * [new branch] gh/rtimpe/25/head -> origin/gh/rtimpe/25/head 2025-12-04T09:33:41.6596666Z * [new branch] gh/rtimpe/25/orig -> origin/gh/rtimpe/25/orig 2025-12-04T09:33:41.6599075Z * [new branch] gh/rtimpe/26/base -> origin/gh/rtimpe/26/base 2025-12-04T09:33:41.6600579Z * [new branch] gh/rtimpe/26/head -> origin/gh/rtimpe/26/head 2025-12-04T09:33:41.6602110Z * [new branch] gh/rtimpe/26/orig -> origin/gh/rtimpe/26/orig 2025-12-04T09:33:41.6604028Z * [new branch] gh/rtimpe/27/base -> origin/gh/rtimpe/27/base 2025-12-04T09:33:41.6605433Z * [new branch] gh/rtimpe/27/head -> origin/gh/rtimpe/27/head 2025-12-04T09:33:41.6607013Z * [new branch] gh/rtimpe/27/orig -> origin/gh/rtimpe/27/orig 2025-12-04T09:33:41.6609511Z * [new branch] gh/rtimpe/28/base -> origin/gh/rtimpe/28/base 2025-12-04T09:33:41.6610955Z * [new branch] gh/rtimpe/28/head -> origin/gh/rtimpe/28/head 2025-12-04T09:33:41.6612553Z * [new branch] gh/rtimpe/28/orig -> origin/gh/rtimpe/28/orig 2025-12-04T09:33:41.6614587Z * [new branch] gh/rtimpe/29/base -> origin/gh/rtimpe/29/base 2025-12-04T09:33:41.6616041Z * [new branch] gh/rtimpe/29/head -> origin/gh/rtimpe/29/head 2025-12-04T09:33:41.6617901Z * [new branch] 
gh/rtimpe/29/orig -> origin/gh/rtimpe/29/orig 2025-12-04T09:33:41.6619803Z * [new branch] gh/rtimpe/3/base -> origin/gh/rtimpe/3/base 2025-12-04T09:33:41.6621112Z * [new branch] gh/rtimpe/3/head -> origin/gh/rtimpe/3/head 2025-12-04T09:33:41.6623156Z * [new branch] gh/rtimpe/30/base -> origin/gh/rtimpe/30/base 2025-12-04T09:33:41.6624646Z * [new branch] gh/rtimpe/30/head -> origin/gh/rtimpe/30/head 2025-12-04T09:33:41.6626130Z * [new branch] gh/rtimpe/30/orig -> origin/gh/rtimpe/30/orig 2025-12-04T09:33:41.6628131Z * [new branch] gh/rtimpe/31/base -> origin/gh/rtimpe/31/base 2025-12-04T09:33:41.6629602Z * [new branch] gh/rtimpe/31/head -> origin/gh/rtimpe/31/head 2025-12-04T09:33:41.6631290Z * [new branch] gh/rtimpe/31/orig -> origin/gh/rtimpe/31/orig 2025-12-04T09:33:41.6633756Z * [new branch] gh/rtimpe/32/base -> origin/gh/rtimpe/32/base 2025-12-04T09:33:41.6635220Z * [new branch] gh/rtimpe/32/head -> origin/gh/rtimpe/32/head 2025-12-04T09:33:41.6636629Z * [new branch] gh/rtimpe/32/orig -> origin/gh/rtimpe/32/orig 2025-12-04T09:33:41.6638740Z * [new branch] gh/rtimpe/33/base -> origin/gh/rtimpe/33/base 2025-12-04T09:33:41.6640163Z * [new branch] gh/rtimpe/33/head -> origin/gh/rtimpe/33/head 2025-12-04T09:33:41.6641700Z * [new branch] gh/rtimpe/33/orig -> origin/gh/rtimpe/33/orig 2025-12-04T09:33:41.6643565Z * [new branch] gh/rtimpe/34/base -> origin/gh/rtimpe/34/base 2025-12-04T09:33:41.6645052Z * [new branch] gh/rtimpe/34/head -> origin/gh/rtimpe/34/head 2025-12-04T09:33:41.6646566Z * [new branch] gh/rtimpe/34/orig -> origin/gh/rtimpe/34/orig 2025-12-04T09:33:41.6648516Z * [new branch] gh/rtimpe/35/base -> origin/gh/rtimpe/35/base 2025-12-04T09:33:41.6650062Z * [new branch] gh/rtimpe/35/head -> origin/gh/rtimpe/35/head 2025-12-04T09:33:41.6651589Z * [new branch] gh/rtimpe/35/orig -> origin/gh/rtimpe/35/orig 2025-12-04T09:33:41.6653554Z * [new branch] gh/rtimpe/4/base -> origin/gh/rtimpe/4/base 2025-12-04T09:33:41.6655071Z * [new branch] gh/rtimpe/4/head -> 
origin/gh/rtimpe/4/head 2025-12-04T09:33:41.6657968Z * [new branch] gh/ruisizhang123/1/base -> origin/gh/ruisizhang123/1/base 2025-12-04T09:33:41.6659529Z * [new branch] gh/ruisizhang123/1/head -> origin/gh/ruisizhang123/1/head 2025-12-04T09:33:41.6660857Z * [new branch] gh/ruisizhang123/1/orig -> origin/gh/ruisizhang123/1/orig 2025-12-04T09:33:41.6662896Z * [new branch] gh/ruisizhang123/4/base -> origin/gh/ruisizhang123/4/base 2025-12-04T09:33:41.6664434Z * [new branch] gh/ruisizhang123/4/head -> origin/gh/ruisizhang123/4/head 2025-12-04T09:33:41.6665876Z * [new branch] gh/ruisizhang123/4/orig -> origin/gh/ruisizhang123/4/orig 2025-12-04T09:33:41.6667852Z * [new branch] gh/ruisizhang123/5/base -> origin/gh/ruisizhang123/5/base 2025-12-04T09:33:41.6669583Z * [new branch] gh/ruisizhang123/5/head -> origin/gh/ruisizhang123/5/head 2025-12-04T09:33:41.6671207Z * [new branch] gh/ruisizhang123/5/orig -> origin/gh/ruisizhang123/5/orig 2025-12-04T09:33:41.6675409Z * [new branch] gh/ruisizhang123/6/base -> origin/gh/ruisizhang123/6/base 2025-12-04T09:33:41.6676718Z * [new branch] gh/ruisizhang123/6/head -> origin/gh/ruisizhang123/6/head 2025-12-04T09:33:41.6678252Z * [new branch] gh/ruisizhang123/6/orig -> origin/gh/ruisizhang123/6/orig 2025-12-04T09:33:41.6680404Z * [new branch] gh/ruisizhang123/7/base -> origin/gh/ruisizhang123/7/base 2025-12-04T09:33:41.6681898Z * [new branch] gh/ruisizhang123/7/head -> origin/gh/ruisizhang123/7/head 2025-12-04T09:33:41.6683434Z * [new branch] gh/ruisizhang123/7/orig -> origin/gh/ruisizhang123/7/orig 2025-12-04T09:33:41.6685314Z * [new branch] gh/ruisizhang123/8/base -> origin/gh/ruisizhang123/8/base 2025-12-04T09:33:41.6687170Z * [new branch] gh/ruisizhang123/8/head -> origin/gh/ruisizhang123/8/head 2025-12-04T09:33:41.6688690Z * [new branch] gh/ruisizhang123/8/orig -> origin/gh/ruisizhang123/8/orig 2025-12-04T09:33:41.6690628Z * [new branch] gh/ruisizhang123/9/base -> origin/gh/ruisizhang123/9/base 2025-12-04T09:33:41.6692131Z * [new 
branch] gh/ruisizhang123/9/head -> origin/gh/ruisizhang123/9/head 2025-12-04T09:33:41.6693590Z * [new branch] gh/ruisizhang123/9/orig -> origin/gh/ruisizhang123/9/orig 2025-12-04T09:33:41.6696108Z * [new branch] gh/seemethere/52/base -> origin/gh/seemethere/52/base 2025-12-04T09:33:41.6697770Z * [new branch] gh/seemethere/52/head -> origin/gh/seemethere/52/head 2025-12-04T09:33:41.6699359Z * [new branch] gh/seemethere/52/orig -> origin/gh/seemethere/52/orig 2025-12-04T09:33:41.6701297Z * [new branch] gh/seemethere/53/base -> origin/gh/seemethere/53/base 2025-12-04T09:33:41.6702748Z * [new branch] gh/seemethere/53/head -> origin/gh/seemethere/53/head 2025-12-04T09:33:41.6704259Z * [new branch] gh/seemethere/53/orig -> origin/gh/seemethere/53/orig 2025-12-04T09:33:41.6706286Z * [new branch] gh/seemethere/54/base -> origin/gh/seemethere/54/base 2025-12-04T09:33:41.6707747Z * [new branch] gh/seemethere/54/head -> origin/gh/seemethere/54/head 2025-12-04T09:33:41.6709433Z * [new branch] gh/seemethere/54/orig -> origin/gh/seemethere/54/orig 2025-12-04T09:33:41.6711162Z * [new branch] gh/seemethere/55/base -> origin/gh/seemethere/55/base 2025-12-04T09:33:41.6712398Z * [new branch] gh/seemethere/55/head -> origin/gh/seemethere/55/head 2025-12-04T09:33:41.6713963Z * [new branch] gh/seemethere/55/orig -> origin/gh/seemethere/55/orig 2025-12-04T09:33:41.6715816Z * [new branch] gh/seemethere/59/base -> origin/gh/seemethere/59/base 2025-12-04T09:33:41.6717309Z * [new branch] gh/seemethere/59/head -> origin/gh/seemethere/59/head 2025-12-04T09:33:41.6718945Z * [new branch] gh/seemethere/59/orig -> origin/gh/seemethere/59/orig 2025-12-04T09:33:41.6720843Z * [new branch] gh/seemethere/62/base -> origin/gh/seemethere/62/base 2025-12-04T09:33:41.6722375Z * [new branch] gh/seemethere/62/head -> origin/gh/seemethere/62/head 2025-12-04T09:33:41.6723860Z * [new branch] gh/seemethere/62/orig -> origin/gh/seemethere/62/orig 2025-12-04T09:33:41.6725805Z * [new branch] gh/seemethere/63/base 
-> origin/gh/seemethere/63/base 2025-12-04T09:33:41.6727178Z * [new branch] gh/seemethere/63/head -> origin/gh/seemethere/63/head 2025-12-04T09:33:41.6728705Z * [new branch] gh/seemethere/63/orig -> origin/gh/seemethere/63/orig 2025-12-04T09:33:41.6730649Z * [new branch] gh/seemethere/71/base -> origin/gh/seemethere/71/base 2025-12-04T09:33:41.6732101Z * [new branch] gh/seemethere/71/head -> origin/gh/seemethere/71/head 2025-12-04T09:33:41.6733589Z * [new branch] gh/seemethere/71/orig -> origin/gh/seemethere/71/orig 2025-12-04T09:33:41.6735554Z * [new branch] gh/seemethere/72/base -> origin/gh/seemethere/72/base 2025-12-04T09:33:41.6737122Z * [new branch] gh/seemethere/72/head -> origin/gh/seemethere/72/head 2025-12-04T09:33:41.6738903Z * [new branch] gh/seemethere/72/orig -> origin/gh/seemethere/72/orig 2025-12-04T09:33:41.6740879Z * [new branch] gh/seemethere/73/base -> origin/gh/seemethere/73/base 2025-12-04T09:33:41.6742333Z * [new branch] gh/seemethere/73/head -> origin/gh/seemethere/73/head 2025-12-04T09:33:41.6743861Z * [new branch] gh/seemethere/73/orig -> origin/gh/seemethere/73/orig 2025-12-04T09:33:41.6745785Z * [new branch] gh/seemethere/74/base -> origin/gh/seemethere/74/base 2025-12-04T09:33:41.6747226Z * [new branch] gh/seemethere/74/head -> origin/gh/seemethere/74/head 2025-12-04T09:33:41.6748751Z * [new branch] gh/seemethere/74/orig -> origin/gh/seemethere/74/orig 2025-12-04T09:33:41.6750744Z * [new branch] gh/seemethere/75/base -> origin/gh/seemethere/75/base 2025-12-04T09:33:41.6752090Z * [new branch] gh/seemethere/75/head -> origin/gh/seemethere/75/head 2025-12-04T09:33:41.6753663Z * [new branch] gh/seemethere/75/orig -> origin/gh/seemethere/75/orig 2025-12-04T09:33:41.6755590Z * [new branch] gh/seemethere/76/base -> origin/gh/seemethere/76/base 2025-12-04T09:33:41.6757033Z * [new branch] gh/seemethere/76/head -> origin/gh/seemethere/76/head 2025-12-04T09:33:41.6758605Z * [new branch] gh/seemethere/76/orig -> origin/gh/seemethere/76/orig 
2025-12-04T09:33:41.6761370Z * [new branch] gh/shunting314/145/base -> origin/gh/shunting314/145/base 2025-12-04T09:33:41.6763035Z * [new branch] gh/shunting314/145/head -> origin/gh/shunting314/145/head 2025-12-04T09:33:41.6764587Z * [new branch] gh/shunting314/145/orig -> origin/gh/shunting314/145/orig 2025-12-04T09:33:41.6767075Z * [new branch] gh/shunting314/176/base -> origin/gh/shunting314/176/base 2025-12-04T09:33:41.6768760Z * [new branch] gh/shunting314/176/head -> origin/gh/shunting314/176/head 2025-12-04T09:33:41.6770270Z * [new branch] gh/shunting314/176/orig -> origin/gh/shunting314/176/orig 2025-12-04T09:33:41.6773095Z * [new branch] gh/shunting314/249/base -> origin/gh/shunting314/249/base 2025-12-04T09:33:41.6774720Z * [new branch] gh/shunting314/249/head -> origin/gh/shunting314/249/head 2025-12-04T09:33:41.6776336Z * [new branch] gh/shunting314/249/orig -> origin/gh/shunting314/249/orig 2025-12-04T09:33:41.6778498Z * [new branch] gh/shunting314/253/base -> origin/gh/shunting314/253/base 2025-12-04T09:33:41.6780015Z * [new branch] gh/shunting314/253/head -> origin/gh/shunting314/253/head 2025-12-04T09:33:41.6781355Z * [new branch] gh/shunting314/253/orig -> origin/gh/shunting314/253/orig 2025-12-04T09:33:41.6783459Z * [new branch] gh/shunting314/256/base -> origin/gh/shunting314/256/base 2025-12-04T09:33:41.6784935Z * [new branch] gh/shunting314/256/head -> origin/gh/shunting314/256/head 2025-12-04T09:33:41.6786254Z * [new branch] gh/shunting314/256/orig -> origin/gh/shunting314/256/orig 2025-12-04T09:33:41.6788701Z * [new branch] gh/shunting314/257/base -> origin/gh/shunting314/257/base 2025-12-04T09:33:41.6790279Z * [new branch] gh/shunting314/257/head -> origin/gh/shunting314/257/head 2025-12-04T09:33:41.6791760Z * [new branch] gh/shunting314/257/orig -> origin/gh/shunting314/257/orig 2025-12-04T09:33:41.6794001Z * [new branch] gh/shunting314/258/base -> origin/gh/shunting314/258/base 2025-12-04T09:33:41.6795282Z * [new branch] 
gh/shunting314/258/head -> origin/gh/shunting314/258/head 2025-12-04T09:33:41.6796882Z * [new branch] gh/shunting314/258/orig -> origin/gh/shunting314/258/orig 2025-12-04T09:33:41.6798727Z * [new branch] gh/shunting314/259/base -> origin/gh/shunting314/259/base 2025-12-04T09:33:41.6800403Z * [new branch] gh/shunting314/259/head -> origin/gh/shunting314/259/head 2025-12-04T09:33:41.6801927Z * [new branch] gh/shunting314/259/orig -> origin/gh/shunting314/259/orig 2025-12-04T09:33:41.6804038Z * [new branch] gh/shunting314/260/base -> origin/gh/shunting314/260/base 2025-12-04T09:33:41.6805689Z * [new branch] gh/shunting314/260/head -> origin/gh/shunting314/260/head 2025-12-04T09:33:41.6807242Z * [new branch] gh/shunting314/260/orig -> origin/gh/shunting314/260/orig 2025-12-04T09:33:41.6809329Z * [new branch] gh/shunting314/261/base -> origin/gh/shunting314/261/base 2025-12-04T09:33:41.6811474Z * [new branch] gh/shunting314/261/head -> origin/gh/shunting314/261/head 2025-12-04T09:33:41.6813084Z * [new branch] gh/shunting314/261/orig -> origin/gh/shunting314/261/orig 2025-12-04T09:33:41.6815151Z * [new branch] gh/shunting314/262/base -> origin/gh/shunting314/262/base 2025-12-04T09:33:41.6816760Z * [new branch] gh/shunting314/262/head -> origin/gh/shunting314/262/head 2025-12-04T09:33:41.6818243Z * [new branch] gh/shunting314/262/orig -> origin/gh/shunting314/262/orig 2025-12-04T09:33:41.6820365Z * [new branch] gh/shunting314/263/base -> origin/gh/shunting314/263/base 2025-12-04T09:33:41.6822102Z * [new branch] gh/shunting314/263/head -> origin/gh/shunting314/263/head 2025-12-04T09:33:41.6823691Z * [new branch] gh/shunting314/263/orig -> origin/gh/shunting314/263/orig 2025-12-04T09:33:41.6825683Z * [new branch] gh/shunting314/264/base -> origin/gh/shunting314/264/base 2025-12-04T09:33:41.6827375Z * [new branch] gh/shunting314/264/head -> origin/gh/shunting314/264/head 2025-12-04T09:33:41.6828750Z * [new branch] gh/shunting314/264/orig -> origin/gh/shunting314/264/orig 
2025-12-04T09:33:41.6830709Z * [new branch] gh/shunting314/265/base -> origin/gh/shunting314/265/base 2025-12-04T09:33:41.6832094Z * [new branch] gh/shunting314/265/head -> origin/gh/shunting314/265/head 2025-12-04T09:33:41.6833564Z * [new branch] gh/shunting314/265/orig -> origin/gh/shunting314/265/orig 2025-12-04T09:33:41.6835599Z * [new branch] gh/shunting314/266/base -> origin/gh/shunting314/266/base 2025-12-04T09:33:41.6837858Z * [new branch] gh/shunting314/266/head -> origin/gh/shunting314/266/head 2025-12-04T09:33:41.6839347Z * [new branch] gh/shunting314/266/orig -> origin/gh/shunting314/266/orig 2025-12-04T09:33:41.6842608Z * [new branch] gh/shunting314/267/base -> origin/gh/shunting314/267/base 2025-12-04T09:33:41.6844435Z * [new branch] gh/shunting314/267/head -> origin/gh/shunting314/267/head 2025-12-04T09:33:41.6845871Z * [new branch] gh/shunting314/267/orig -> origin/gh/shunting314/267/orig 2025-12-04T09:33:41.6848489Z * [new branch] gh/shunting314/268/base -> origin/gh/shunting314/268/base 2025-12-04T09:33:41.6850088Z * [new branch] gh/shunting314/268/head -> origin/gh/shunting314/268/head 2025-12-04T09:33:41.6851582Z * [new branch] gh/shunting314/268/orig -> origin/gh/shunting314/268/orig 2025-12-04T09:33:41.6853588Z * [new branch] gh/shunting314/269/base -> origin/gh/shunting314/269/base 2025-12-04T09:33:41.6855071Z * [new branch] gh/shunting314/269/head -> origin/gh/shunting314/269/head 2025-12-04T09:33:41.6857188Z * [new branch] gh/shunting314/269/orig -> origin/gh/shunting314/269/orig 2025-12-04T09:33:41.6859718Z * [new branch] gh/silverguo/1/base -> origin/gh/silverguo/1/base 2025-12-04T09:33:41.6861167Z * [new branch] gh/silverguo/1/head -> origin/gh/silverguo/1/head 2025-12-04T09:33:41.6863063Z * [new branch] gh/silverguo/2/base -> origin/gh/silverguo/2/base 2025-12-04T09:33:41.6864603Z * [new branch] gh/silverguo/2/head -> origin/gh/silverguo/2/head 2025-12-04T09:33:41.6866426Z * [new branch] gh/silverguo/3/base -> origin/gh/silverguo/3/base 
2025-12-04T09:33:41.6867864Z * [new branch] gh/silverguo/3/head -> origin/gh/silverguo/3/head 2025-12-04T09:33:41.6869633Z * [new branch] gh/silverguo/4/base -> origin/gh/silverguo/4/base 2025-12-04T09:33:41.6871334Z * [new branch] gh/silverguo/4/head -> origin/gh/silverguo/4/head 2025-12-04T09:33:41.6873842Z * [new branch] gh/slayton58/39/base -> origin/gh/slayton58/39/base 2025-12-04T09:33:41.6875295Z * [new branch] gh/slayton58/39/head -> origin/gh/slayton58/39/head 2025-12-04T09:33:41.6876807Z * [new branch] gh/slayton58/39/orig -> origin/gh/slayton58/39/orig 2025-12-04T09:33:41.6878800Z * [new branch] gh/slayton58/42/base -> origin/gh/slayton58/42/base 2025-12-04T09:33:41.6880227Z * [new branch] gh/slayton58/42/head -> origin/gh/slayton58/42/head 2025-12-04T09:33:41.6881914Z * [new branch] gh/slayton58/42/orig -> origin/gh/slayton58/42/orig 2025-12-04T09:33:41.6884027Z * [new branch] gh/slayton58/43/base -> origin/gh/slayton58/43/base 2025-12-04T09:33:41.6885567Z * [new branch] gh/slayton58/43/head -> origin/gh/slayton58/43/head 2025-12-04T09:33:41.6887070Z * [new branch] gh/slayton58/43/orig -> origin/gh/slayton58/43/orig 2025-12-04T09:33:41.6889180Z * [new branch] gh/slayton58/44/base -> origin/gh/slayton58/44/base 2025-12-04T09:33:41.6890815Z * [new branch] gh/slayton58/44/head -> origin/gh/slayton58/44/head 2025-12-04T09:33:41.6892216Z * [new branch] gh/slayton58/44/orig -> origin/gh/slayton58/44/orig 2025-12-04T09:33:41.6894175Z * [new branch] gh/slayton58/45/base -> origin/gh/slayton58/45/base 2025-12-04T09:33:41.6895620Z * [new branch] gh/slayton58/45/head -> origin/gh/slayton58/45/head 2025-12-04T09:33:41.6897301Z * [new branch] gh/slayton58/45/orig -> origin/gh/slayton58/45/orig 2025-12-04T09:33:41.6899781Z * [new branch] gh/slayton58/46/base -> origin/gh/slayton58/46/base 2025-12-04T09:33:41.6901420Z * [new branch] gh/slayton58/46/head -> origin/gh/slayton58/46/head 2025-12-04T09:33:41.6902884Z * [new branch] gh/slayton58/46/orig -> 
origin/gh/slayton58/46/orig 2025-12-04T09:33:41.6905074Z * [new branch] gh/slayton58/6/base -> origin/gh/slayton58/6/base 2025-12-04T09:33:41.6906668Z * [new branch] gh/slayton58/6/head -> origin/gh/slayton58/6/head 2025-12-04T09:33:41.6908500Z * [new branch] gh/slayton58/7/base -> origin/gh/slayton58/7/base 2025-12-04T09:33:41.6909920Z * [new branch] gh/slayton58/7/head -> origin/gh/slayton58/7/head 2025-12-04T09:33:41.6912766Z * [new branch] gh/soulitzer/269/base -> origin/gh/soulitzer/269/base 2025-12-04T09:33:41.6914282Z * [new branch] gh/soulitzer/269/head -> origin/gh/soulitzer/269/head 2025-12-04T09:33:41.6916281Z * [new branch] gh/soulitzer/269/orig -> origin/gh/soulitzer/269/orig 2025-12-04T09:33:41.6918407Z * [new branch] gh/soulitzer/276/base -> origin/gh/soulitzer/276/base 2025-12-04T09:33:41.6919917Z * [new branch] gh/soulitzer/276/head -> origin/gh/soulitzer/276/head 2025-12-04T09:33:41.6921415Z * [new branch] gh/soulitzer/276/orig -> origin/gh/soulitzer/276/orig 2025-12-04T09:33:41.6923808Z * [new branch] gh/soulitzer/287/base -> origin/gh/soulitzer/287/base 2025-12-04T09:33:41.6925254Z * [new branch] gh/soulitzer/287/head -> origin/gh/soulitzer/287/head 2025-12-04T09:33:41.6926893Z * [new branch] gh/soulitzer/287/orig -> origin/gh/soulitzer/287/orig 2025-12-04T09:33:41.6929019Z * [new branch] gh/soulitzer/296/base -> origin/gh/soulitzer/296/base 2025-12-04T09:33:41.6930539Z * [new branch] gh/soulitzer/296/head -> origin/gh/soulitzer/296/head 2025-12-04T09:33:41.6931988Z * [new branch] gh/soulitzer/296/orig -> origin/gh/soulitzer/296/orig 2025-12-04T09:33:41.6933984Z * [new branch] gh/soulitzer/299/base -> origin/gh/soulitzer/299/base 2025-12-04T09:33:41.6935624Z * [new branch] gh/soulitzer/299/head -> origin/gh/soulitzer/299/head 2025-12-04T09:33:41.6937221Z * [new branch] gh/soulitzer/299/orig -> origin/gh/soulitzer/299/orig 2025-12-04T09:33:41.6939272Z * [new branch] gh/soulitzer/300/base -> origin/gh/soulitzer/300/base 
2025-12-04T09:33:41.6941369Z * [new branch] gh/soulitzer/300/head -> origin/gh/soulitzer/300/head 2025-12-04T09:33:41.6942856Z * [new branch] gh/soulitzer/300/orig -> origin/gh/soulitzer/300/orig 2025-12-04T09:33:41.6945096Z * [new branch] gh/soulitzer/301/base -> origin/gh/soulitzer/301/base 2025-12-04T09:33:41.6946653Z * [new branch] gh/soulitzer/301/head -> origin/gh/soulitzer/301/head 2025-12-04T09:33:41.6948192Z * [new branch] gh/soulitzer/301/orig -> origin/gh/soulitzer/301/orig 2025-12-04T09:33:41.6950146Z * [new branch] gh/soulitzer/313/base -> origin/gh/soulitzer/313/base 2025-12-04T09:33:41.6951629Z * [new branch] gh/soulitzer/313/head -> origin/gh/soulitzer/313/head 2025-12-04T09:33:41.6953243Z * [new branch] gh/soulitzer/313/orig -> origin/gh/soulitzer/313/orig 2025-12-04T09:33:41.6955169Z * [new branch] gh/soulitzer/319/base -> origin/gh/soulitzer/319/base 2025-12-04T09:33:41.6957038Z * [new branch] gh/soulitzer/319/head -> origin/gh/soulitzer/319/head 2025-12-04T09:33:41.6958481Z * [new branch] gh/soulitzer/319/orig -> origin/gh/soulitzer/319/orig 2025-12-04T09:33:41.6960568Z * [new branch] gh/soulitzer/320/base -> origin/gh/soulitzer/320/base 2025-12-04T09:33:41.6962053Z * [new branch] gh/soulitzer/320/head -> origin/gh/soulitzer/320/head 2025-12-04T09:33:41.6964042Z * [new branch] gh/soulitzer/320/orig -> origin/gh/soulitzer/320/orig 2025-12-04T09:33:41.6966246Z * [new branch] gh/soulitzer/336/base -> origin/gh/soulitzer/336/base 2025-12-04T09:33:41.6967677Z * [new branch] gh/soulitzer/336/head -> origin/gh/soulitzer/336/head 2025-12-04T09:33:41.6969095Z * [new branch] gh/soulitzer/336/orig -> origin/gh/soulitzer/336/orig 2025-12-04T09:33:41.6971772Z * [new branch] gh/soulitzer/347/base -> origin/gh/soulitzer/347/base 2025-12-04T09:33:41.6973252Z * [new branch] gh/soulitzer/347/head -> origin/gh/soulitzer/347/head 2025-12-04T09:33:41.6974689Z * [new branch] gh/soulitzer/347/orig -> origin/gh/soulitzer/347/orig 2025-12-04T09:33:41.6977045Z * [new 
branch] gh/soulitzer/349/base -> origin/gh/soulitzer/349/base 2025-12-04T09:33:41.6978532Z * [new branch] gh/soulitzer/349/head -> origin/gh/soulitzer/349/head 2025-12-04T09:33:41.6980106Z * [new branch] gh/soulitzer/349/orig -> origin/gh/soulitzer/349/orig 2025-12-04T09:33:41.6982011Z * [new branch] gh/soulitzer/350/base -> origin/gh/soulitzer/350/base 2025-12-04T09:33:41.6983399Z * [new branch] gh/soulitzer/350/head -> origin/gh/soulitzer/350/head 2025-12-04T09:33:41.6984832Z * [new branch] gh/soulitzer/350/orig -> origin/gh/soulitzer/350/orig 2025-12-04T09:33:41.6987012Z * [new branch] gh/soulitzer/351/base -> origin/gh/soulitzer/351/base 2025-12-04T09:33:41.6988505Z * [new branch] gh/soulitzer/351/head -> origin/gh/soulitzer/351/head 2025-12-04T09:33:41.6989974Z * [new branch] gh/soulitzer/351/orig -> origin/gh/soulitzer/351/orig 2025-12-04T09:33:41.6991945Z * [new branch] gh/soulitzer/353/base -> origin/gh/soulitzer/353/base 2025-12-04T09:33:41.6993627Z * [new branch] gh/soulitzer/353/head -> origin/gh/soulitzer/353/head 2025-12-04T09:33:41.6995086Z * [new branch] gh/soulitzer/353/orig -> origin/gh/soulitzer/353/orig 2025-12-04T09:33:41.6997903Z * [new branch] gh/soulitzer/358/base -> origin/gh/soulitzer/358/base 2025-12-04T09:33:41.6999531Z * [new branch] gh/soulitzer/358/head -> origin/gh/soulitzer/358/head 2025-12-04T09:33:41.7001104Z * [new branch] gh/soulitzer/358/orig -> origin/gh/soulitzer/358/orig 2025-12-04T09:33:41.7003758Z * [new branch] gh/soulitzer/359/base -> origin/gh/soulitzer/359/base 2025-12-04T09:33:41.7005265Z * [new branch] gh/soulitzer/359/head -> origin/gh/soulitzer/359/head 2025-12-04T09:33:41.7006838Z * [new branch] gh/soulitzer/359/orig -> origin/gh/soulitzer/359/orig 2025-12-04T09:33:41.7008953Z * [new branch] gh/soulitzer/374/base -> origin/gh/soulitzer/374/base 2025-12-04T09:33:41.7010386Z * [new branch] gh/soulitzer/374/head -> origin/gh/soulitzer/374/head 2025-12-04T09:33:41.7011897Z * [new branch] gh/soulitzer/374/orig -> 
origin/gh/soulitzer/374/orig 2025-12-04T09:33:41.7013897Z * [new branch] gh/soulitzer/375/base -> origin/gh/soulitzer/375/base 2025-12-04T09:33:41.7015378Z * [new branch] gh/soulitzer/375/head -> origin/gh/soulitzer/375/head 2025-12-04T09:33:41.7016857Z * [new branch] gh/soulitzer/375/orig -> origin/gh/soulitzer/375/orig 2025-12-04T09:33:41.7018825Z * [new branch] gh/soulitzer/380/base -> origin/gh/soulitzer/380/base 2025-12-04T09:33:41.7020251Z * [new branch] gh/soulitzer/380/head -> origin/gh/soulitzer/380/head 2025-12-04T09:33:41.7021707Z * [new branch] gh/soulitzer/380/orig -> origin/gh/soulitzer/380/orig 2025-12-04T09:33:41.7023654Z * [new branch] gh/soulitzer/385/base -> origin/gh/soulitzer/385/base 2025-12-04T09:33:41.7025143Z * [new branch] gh/soulitzer/385/head -> origin/gh/soulitzer/385/head 2025-12-04T09:33:41.7026540Z * [new branch] gh/soulitzer/385/orig -> origin/gh/soulitzer/385/orig 2025-12-04T09:33:41.7028653Z * [new branch] gh/soulitzer/386/base -> origin/gh/soulitzer/386/base 2025-12-04T09:33:41.7030142Z * [new branch] gh/soulitzer/386/head -> origin/gh/soulitzer/386/head 2025-12-04T09:33:41.7031658Z * [new branch] gh/soulitzer/386/orig -> origin/gh/soulitzer/386/orig 2025-12-04T09:33:41.7034040Z * [new branch] gh/soulitzer/387/base -> origin/gh/soulitzer/387/base 2025-12-04T09:33:41.7035474Z * [new branch] gh/soulitzer/387/head -> origin/gh/soulitzer/387/head 2025-12-04T09:33:41.7036945Z * [new branch] gh/soulitzer/387/orig -> origin/gh/soulitzer/387/orig 2025-12-04T09:33:41.7038982Z * [new branch] gh/soulitzer/388/base -> origin/gh/soulitzer/388/base 2025-12-04T09:33:41.7040416Z * [new branch] gh/soulitzer/388/head -> origin/gh/soulitzer/388/head 2025-12-04T09:33:41.7041904Z * [new branch] gh/soulitzer/388/orig -> origin/gh/soulitzer/388/orig 2025-12-04T09:33:41.7043975Z * [new branch] gh/soulitzer/389/base -> origin/gh/soulitzer/389/base 2025-12-04T09:33:41.7045389Z * [new branch] gh/soulitzer/389/head -> origin/gh/soulitzer/389/head 
2025-12-04T09:33:41.7046821Z * [new branch] gh/soulitzer/389/orig -> origin/gh/soulitzer/389/orig 2025-12-04T09:33:41.7049354Z * [new branch] gh/soulitzer/390/base -> origin/gh/soulitzer/390/base 2025-12-04T09:33:41.7050821Z * [new branch] gh/soulitzer/390/head -> origin/gh/soulitzer/390/head 2025-12-04T09:33:41.7052349Z * [new branch] gh/soulitzer/390/orig -> origin/gh/soulitzer/390/orig 2025-12-04T09:33:41.7054295Z * [new branch] gh/soulitzer/391/base -> origin/gh/soulitzer/391/base 2025-12-04T09:33:41.7055747Z * [new branch] gh/soulitzer/391/head -> origin/gh/soulitzer/391/head 2025-12-04T09:33:41.7057419Z * [new branch] gh/soulitzer/391/orig -> origin/gh/soulitzer/391/orig 2025-12-04T09:33:41.7059384Z * [new branch] gh/soulitzer/392/base -> origin/gh/soulitzer/392/base 2025-12-04T09:33:41.7060871Z * [new branch] gh/soulitzer/392/head -> origin/gh/soulitzer/392/head 2025-12-04T09:33:41.7062346Z * [new branch] gh/soulitzer/392/orig -> origin/gh/soulitzer/392/orig 2025-12-04T09:33:41.7064847Z * [new branch] gh/swolchok/728/next -> origin/gh/swolchok/728/next 2025-12-04T09:33:41.7067112Z * [new branch] gh/swolchok/819/base -> origin/gh/swolchok/819/base 2025-12-04T09:33:41.7068602Z * [new branch] gh/swolchok/819/head -> origin/gh/swolchok/819/head 2025-12-04T09:33:41.7070097Z * [new branch] gh/swolchok/819/orig -> origin/gh/swolchok/819/orig 2025-12-04T09:33:41.7072200Z * [new branch] gh/swolchok/824/base -> origin/gh/swolchok/824/base 2025-12-04T09:33:41.7073806Z * [new branch] gh/swolchok/824/head -> origin/gh/swolchok/824/head 2025-12-04T09:33:41.7075135Z * [new branch] gh/swolchok/824/orig -> origin/gh/swolchok/824/orig 2025-12-04T09:33:41.7077158Z * [new branch] gh/swolchok/829/base -> origin/gh/swolchok/829/base 2025-12-04T09:33:41.7078504Z * [new branch] gh/swolchok/829/head -> origin/gh/swolchok/829/head 2025-12-04T09:33:41.7080030Z * [new branch] gh/swolchok/829/orig -> origin/gh/swolchok/829/orig 2025-12-04T09:33:41.7082066Z * [new branch] 
gh/swolchok/839/base -> origin/gh/swolchok/839/base 2025-12-04T09:33:41.7083414Z * [new branch] gh/swolchok/839/head -> origin/gh/swolchok/839/head 2025-12-04T09:33:41.7084858Z * [new branch] gh/swolchok/839/orig -> origin/gh/swolchok/839/orig 2025-12-04T09:33:41.7086747Z * [new branch] gh/swolchok/841/base -> origin/gh/swolchok/841/base 2025-12-04T09:33:41.7088310Z * [new branch] gh/swolchok/841/head -> origin/gh/swolchok/841/head 2025-12-04T09:33:41.7089855Z * [new branch] gh/swolchok/841/orig -> origin/gh/swolchok/841/orig 2025-12-04T09:33:41.7091800Z * [new branch] gh/swolchok/842/base -> origin/gh/swolchok/842/base 2025-12-04T09:33:41.7093333Z * [new branch] gh/swolchok/842/head -> origin/gh/swolchok/842/head 2025-12-04T09:33:41.7095131Z * [new branch] gh/swolchok/842/orig -> origin/gh/swolchok/842/orig 2025-12-04T09:33:41.7096794Z * [new branch] gh/swolchok/845/base -> origin/gh/swolchok/845/base 2025-12-04T09:33:41.7098345Z * [new branch] gh/swolchok/845/head -> origin/gh/swolchok/845/head 2025-12-04T09:33:41.7099862Z * [new branch] gh/swolchok/845/orig -> origin/gh/swolchok/845/orig 2025-12-04T09:33:41.7102091Z * [new branch] gh/swolchok/848/base -> origin/gh/swolchok/848/base 2025-12-04T09:33:41.7103603Z * [new branch] gh/swolchok/848/head -> origin/gh/swolchok/848/head 2025-12-04T09:33:41.7105212Z * [new branch] gh/swolchok/848/orig -> origin/gh/swolchok/848/orig 2025-12-04T09:33:41.7107070Z * [new branch] gh/swolchok/856/base -> origin/gh/swolchok/856/base 2025-12-04T09:33:41.7108658Z * [new branch] gh/swolchok/856/head -> origin/gh/swolchok/856/head 2025-12-04T09:33:41.7110125Z * [new branch] gh/swolchok/856/orig -> origin/gh/swolchok/856/orig 2025-12-04T09:33:41.7112269Z * [new branch] gh/swolchok/860/base -> origin/gh/swolchok/860/base 2025-12-04T09:33:41.7113728Z * [new branch] gh/swolchok/860/head -> origin/gh/swolchok/860/head 2025-12-04T09:33:41.7115604Z * [new branch] gh/swolchok/860/orig -> origin/gh/swolchok/860/orig 
2025-12-04T09:33:41.7117871Z * [new branch] gh/swolchok/861/base -> origin/gh/swolchok/861/base 2025-12-04T09:33:41.7119413Z * [new branch] gh/swolchok/861/head -> origin/gh/swolchok/861/head 2025-12-04T09:33:41.7120890Z * [new branch] gh/swolchok/861/orig -> origin/gh/swolchok/861/orig 2025-12-04T09:33:41.7122969Z * [new branch] gh/swolchok/862/base -> origin/gh/swolchok/862/base 2025-12-04T09:33:41.7124334Z * [new branch] gh/swolchok/862/head -> origin/gh/swolchok/862/head 2025-12-04T09:33:41.7125867Z * [new branch] gh/swolchok/862/orig -> origin/gh/swolchok/862/orig 2025-12-04T09:33:41.7127974Z * [new branch] gh/swolchok/863/base -> origin/gh/swolchok/863/base 2025-12-04T09:33:41.7129462Z * [new branch] gh/swolchok/863/head -> origin/gh/swolchok/863/head 2025-12-04T09:33:41.7131614Z * [new branch] gh/swolchok/863/orig -> origin/gh/swolchok/863/orig 2025-12-04T09:33:41.7133758Z * [new branch] gh/swolchok/864/base -> origin/gh/swolchok/864/base 2025-12-04T09:33:41.7135582Z * [new branch] gh/swolchok/864/head -> origin/gh/swolchok/864/head 2025-12-04T09:33:41.7137251Z * [new branch] gh/swolchok/864/orig -> origin/gh/swolchok/864/orig 2025-12-04T09:33:41.7139181Z * [new branch] gh/swolchok/865/base -> origin/gh/swolchok/865/base 2025-12-04T09:33:41.7140927Z * [new branch] gh/swolchok/865/head -> origin/gh/swolchok/865/head 2025-12-04T09:33:41.7142366Z * [new branch] gh/swolchok/865/orig -> origin/gh/swolchok/865/orig 2025-12-04T09:33:41.7145047Z * [new branch] gh/swolchok/866/base -> origin/gh/swolchok/866/base 2025-12-04T09:33:41.7146544Z * [new branch] gh/swolchok/866/head -> origin/gh/swolchok/866/head 2025-12-04T09:33:41.7148069Z * [new branch] gh/swolchok/866/orig -> origin/gh/swolchok/866/orig 2025-12-04T09:33:41.7150004Z * [new branch] gh/swolchok/867/base -> origin/gh/swolchok/867/base 2025-12-04T09:33:41.7151638Z * [new branch] gh/swolchok/867/head -> origin/gh/swolchok/867/head 2025-12-04T09:33:41.7153717Z * [new branch] gh/swolchok/867/orig -> 
origin/gh/swolchok/867/orig 2025-12-04T09:33:41.7155688Z * [new branch] gh/swolchok/868/base -> origin/gh/swolchok/868/base 2025-12-04T09:33:41.7157238Z * [new branch] gh/swolchok/868/head -> origin/gh/swolchok/868/head 2025-12-04T09:33:41.7158725Z * [new branch] gh/swolchok/868/orig -> origin/gh/swolchok/868/orig 2025-12-04T09:33:41.7160809Z * [new branch] gh/swolchok/869/base -> origin/gh/swolchok/869/base 2025-12-04T09:33:41.7162344Z * [new branch] gh/swolchok/869/head -> origin/gh/swolchok/869/head 2025-12-04T09:33:41.7163923Z * [new branch] gh/swolchok/869/orig -> origin/gh/swolchok/869/orig 2025-12-04T09:33:41.7166053Z * [new branch] gh/swolchok/870/base -> origin/gh/swolchok/870/base 2025-12-04T09:33:41.7167439Z * [new branch] gh/swolchok/870/head -> origin/gh/swolchok/870/head 2025-12-04T09:33:41.7168971Z * [new branch] gh/swolchok/870/orig -> origin/gh/swolchok/870/orig 2025-12-04T09:33:41.7171210Z * [new branch] gh/swolchok/871/base -> origin/gh/swolchok/871/base 2025-12-04T09:33:41.7176645Z * [new branch] gh/swolchok/871/head -> origin/gh/swolchok/871/head 2025-12-04T09:33:41.7178267Z * [new branch] gh/swolchok/871/orig -> origin/gh/swolchok/871/orig 2025-12-04T09:33:41.7181011Z * [new branch] gh/teja-rao/4/base -> origin/gh/teja-rao/4/base 2025-12-04T09:33:41.7182587Z * [new branch] gh/teja-rao/4/head -> origin/gh/teja-rao/4/head 2025-12-04T09:33:41.7184081Z * [new branch] gh/teja-rao/4/orig -> origin/gh/teja-rao/4/orig 2025-12-04T09:33:41.7186594Z * [new branch] gh/tianyu-l/2/base -> origin/gh/tianyu-l/2/base 2025-12-04T09:33:41.7188046Z * [new branch] gh/tianyu-l/2/head -> origin/gh/tianyu-l/2/head 2025-12-04T09:33:41.7189551Z * [new branch] gh/tianyu-l/2/orig -> origin/gh/tianyu-l/2/orig 2025-12-04T09:33:41.7191491Z * [new branch] gh/tianyu-l/3/base -> origin/gh/tianyu-l/3/base 2025-12-04T09:33:41.7193028Z * [new branch] gh/tianyu-l/3/orig -> origin/gh/tianyu-l/3/orig 2025-12-04T09:33:41.7195213Z * [new branch] gh/tianyu-l/4/base -> 
origin/gh/tianyu-l/4/base 2025-12-04T09:33:41.7196678Z * [new branch] gh/tianyu-l/4/head -> origin/gh/tianyu-l/4/head 2025-12-04T09:33:41.7198231Z * [new branch] gh/tianyu-l/4/orig -> origin/gh/tianyu-l/4/orig 2025-12-04T09:33:41.7201312Z * [new branch] gh/tugsbayasgalan/10/base -> origin/gh/tugsbayasgalan/10/base 2025-12-04T09:33:41.7202790Z * [new branch] gh/tugsbayasgalan/10/head -> origin/gh/tugsbayasgalan/10/head 2025-12-04T09:33:41.7204233Z * [new branch] gh/tugsbayasgalan/10/orig -> origin/gh/tugsbayasgalan/10/orig 2025-12-04T09:33:41.7206208Z * [new branch] gh/tugsbayasgalan/13/base -> origin/gh/tugsbayasgalan/13/base 2025-12-04T09:33:41.7207899Z * [new branch] gh/tugsbayasgalan/13/head -> origin/gh/tugsbayasgalan/13/head 2025-12-04T09:33:41.7209354Z * [new branch] gh/tugsbayasgalan/13/orig -> origin/gh/tugsbayasgalan/13/orig 2025-12-04T09:33:41.7211592Z * [new branch] gh/tugsbayasgalan/17/base -> origin/gh/tugsbayasgalan/17/base 2025-12-04T09:33:41.7212980Z * [new branch] gh/tugsbayasgalan/17/head -> origin/gh/tugsbayasgalan/17/head 2025-12-04T09:33:41.7214563Z * [new branch] gh/tugsbayasgalan/17/orig -> origin/gh/tugsbayasgalan/17/orig 2025-12-04T09:33:41.7216841Z * [new branch] gh/tugsbayasgalan/2/base -> origin/gh/tugsbayasgalan/2/base 2025-12-04T09:33:41.7218343Z * [new branch] gh/tugsbayasgalan/2/head -> origin/gh/tugsbayasgalan/2/head 2025-12-04T09:33:41.7219831Z * [new branch] gh/tugsbayasgalan/2/orig -> origin/gh/tugsbayasgalan/2/orig 2025-12-04T09:33:41.7222242Z * [new branch] gh/tugsbayasgalan/28/base -> origin/gh/tugsbayasgalan/28/base 2025-12-04T09:33:41.7223789Z * [new branch] gh/tugsbayasgalan/28/head -> origin/gh/tugsbayasgalan/28/head 2025-12-04T09:33:41.7225247Z * [new branch] gh/tugsbayasgalan/28/orig -> origin/gh/tugsbayasgalan/28/orig 2025-12-04T09:33:41.7227350Z * [new branch] gh/tugsbayasgalan/32/base -> origin/gh/tugsbayasgalan/32/base 2025-12-04T09:33:41.7229380Z * [new branch] gh/tugsbayasgalan/32/head -> 
origin/gh/tugsbayasgalan/32/head 2025-12-04T09:33:41.7230889Z * [new branch] gh/tugsbayasgalan/32/orig -> origin/gh/tugsbayasgalan/32/orig 2025-12-04T09:33:41.7233045Z * [new branch] gh/tugsbayasgalan/35/base -> origin/gh/tugsbayasgalan/35/base 2025-12-04T09:33:41.7234696Z * [new branch] gh/tugsbayasgalan/35/head -> origin/gh/tugsbayasgalan/35/head 2025-12-04T09:33:41.7236190Z * [new branch] gh/tugsbayasgalan/35/orig -> origin/gh/tugsbayasgalan/35/orig 2025-12-04T09:33:41.7238315Z * [new branch] gh/tugsbayasgalan/36/base -> origin/gh/tugsbayasgalan/36/base 2025-12-04T09:33:41.7239824Z * [new branch] gh/tugsbayasgalan/36/head -> origin/gh/tugsbayasgalan/36/head 2025-12-04T09:33:41.7241369Z * [new branch] gh/tugsbayasgalan/36/orig -> origin/gh/tugsbayasgalan/36/orig 2025-12-04T09:33:41.7243381Z * [new branch] gh/tugsbayasgalan/37/base -> origin/gh/tugsbayasgalan/37/base 2025-12-04T09:33:41.7244865Z * [new branch] gh/tugsbayasgalan/37/head -> origin/gh/tugsbayasgalan/37/head 2025-12-04T09:33:41.7246327Z * [new branch] gh/tugsbayasgalan/37/orig -> origin/gh/tugsbayasgalan/37/orig 2025-12-04T09:33:41.7248365Z * [new branch] gh/tugsbayasgalan/43/base -> origin/gh/tugsbayasgalan/43/base 2025-12-04T09:33:41.7249888Z * [new branch] gh/tugsbayasgalan/43/head -> origin/gh/tugsbayasgalan/43/head 2025-12-04T09:33:41.7251420Z * [new branch] gh/tugsbayasgalan/43/orig -> origin/gh/tugsbayasgalan/43/orig 2025-12-04T09:33:41.7253848Z * [new branch] gh/tugsbayasgalan/48/base -> origin/gh/tugsbayasgalan/48/base 2025-12-04T09:33:41.7255351Z * [new branch] gh/tugsbayasgalan/48/head -> origin/gh/tugsbayasgalan/48/head 2025-12-04T09:33:41.7257060Z * [new branch] gh/tugsbayasgalan/48/orig -> origin/gh/tugsbayasgalan/48/orig 2025-12-04T09:33:41.7259216Z * [new branch] gh/tugsbayasgalan/51/base -> origin/gh/tugsbayasgalan/51/base 2025-12-04T09:33:41.7260844Z * [new branch] gh/tugsbayasgalan/51/head -> origin/gh/tugsbayasgalan/51/head 2025-12-04T09:33:41.7262306Z * [new branch] 
gh/tugsbayasgalan/51/orig -> origin/gh/tugsbayasgalan/51/orig 2025-12-04T09:33:41.7264094Z * [new branch] gh/tugsbayasgalan/52/base -> origin/gh/tugsbayasgalan/52/base 2025-12-04T09:33:41.7265655Z * [new branch] gh/tugsbayasgalan/52/head -> origin/gh/tugsbayasgalan/52/head 2025-12-04T09:33:41.7267200Z * [new branch] gh/tugsbayasgalan/52/orig -> origin/gh/tugsbayasgalan/52/orig 2025-12-04T09:33:41.7269245Z * [new branch] gh/tugsbayasgalan/53/base -> origin/gh/tugsbayasgalan/53/base 2025-12-04T09:33:41.7270705Z * [new branch] gh/tugsbayasgalan/53/head -> origin/gh/tugsbayasgalan/53/head 2025-12-04T09:33:41.7272433Z * [new branch] gh/tugsbayasgalan/53/orig -> origin/gh/tugsbayasgalan/53/orig 2025-12-04T09:33:41.7274592Z * [new branch] gh/tugsbayasgalan/55/base -> origin/gh/tugsbayasgalan/55/base 2025-12-04T09:33:41.7276287Z * [new branch] gh/tugsbayasgalan/55/head -> origin/gh/tugsbayasgalan/55/head 2025-12-04T09:33:41.7277852Z * [new branch] gh/tugsbayasgalan/55/orig -> origin/gh/tugsbayasgalan/55/orig 2025-12-04T09:33:41.7280108Z * [new branch] gh/tugsbayasgalan/59/base -> origin/gh/tugsbayasgalan/59/base 2025-12-04T09:33:41.7281758Z * [new branch] gh/tugsbayasgalan/59/head -> origin/gh/tugsbayasgalan/59/head 2025-12-04T09:33:41.7283238Z * [new branch] gh/tugsbayasgalan/59/orig -> origin/gh/tugsbayasgalan/59/orig 2025-12-04T09:33:41.7285178Z * [new branch] gh/tugsbayasgalan/6/base -> origin/gh/tugsbayasgalan/6/base 2025-12-04T09:33:41.7286625Z * [new branch] gh/tugsbayasgalan/6/head -> origin/gh/tugsbayasgalan/6/head 2025-12-04T09:33:41.7288238Z * [new branch] gh/tugsbayasgalan/6/orig -> origin/gh/tugsbayasgalan/6/orig 2025-12-04T09:33:41.7290124Z * [new branch] gh/tugsbayasgalan/60/base -> origin/gh/tugsbayasgalan/60/base 2025-12-04T09:33:41.7291680Z * [new branch] gh/tugsbayasgalan/60/head -> origin/gh/tugsbayasgalan/60/head 2025-12-04T09:33:41.7293177Z * [new branch] gh/tugsbayasgalan/60/orig -> origin/gh/tugsbayasgalan/60/orig 2025-12-04T09:33:41.7295764Z * [new 
branch] gh/tugsbayasgalan/61/base -> origin/gh/tugsbayasgalan/61/base 2025-12-04T09:33:41.7297305Z * [new branch] gh/tugsbayasgalan/61/head -> origin/gh/tugsbayasgalan/61/head 2025-12-04T09:33:41.7298792Z * [new branch] gh/tugsbayasgalan/61/orig -> origin/gh/tugsbayasgalan/61/orig 2025-12-04T09:33:41.7301043Z * [new branch] gh/tugsbayasgalan/63/base -> origin/gh/tugsbayasgalan/63/base 2025-12-04T09:33:41.7302561Z * [new branch] gh/tugsbayasgalan/63/head -> origin/gh/tugsbayasgalan/63/head 2025-12-04T09:33:41.7304052Z * [new branch] gh/tugsbayasgalan/63/orig -> origin/gh/tugsbayasgalan/63/orig 2025-12-04T09:33:41.7306140Z * [new branch] gh/tugsbayasgalan/67/base -> origin/gh/tugsbayasgalan/67/base 2025-12-04T09:33:41.7307618Z * [new branch] gh/tugsbayasgalan/67/head -> origin/gh/tugsbayasgalan/67/head 2025-12-04T09:33:41.7309096Z * [new branch] gh/tugsbayasgalan/67/orig -> origin/gh/tugsbayasgalan/67/orig 2025-12-04T09:33:41.7311351Z * [new branch] gh/tugsbayasgalan/68/base -> origin/gh/tugsbayasgalan/68/base 2025-12-04T09:33:41.7312848Z * [new branch] gh/tugsbayasgalan/68/head -> origin/gh/tugsbayasgalan/68/head 2025-12-04T09:33:41.7314351Z * [new branch] gh/tugsbayasgalan/68/orig -> origin/gh/tugsbayasgalan/68/orig 2025-12-04T09:33:41.7316381Z * [new branch] gh/tugsbayasgalan/7/base -> origin/gh/tugsbayasgalan/7/base 2025-12-04T09:33:41.7317940Z * [new branch] gh/tugsbayasgalan/7/head -> origin/gh/tugsbayasgalan/7/head 2025-12-04T09:33:41.7319526Z * [new branch] gh/tugsbayasgalan/7/orig -> origin/gh/tugsbayasgalan/7/orig 2025-12-04T09:33:41.7321975Z * [new branch] gh/tugsbayasgalan/70/base -> origin/gh/tugsbayasgalan/70/base 2025-12-04T09:33:41.7323703Z * [new branch] gh/tugsbayasgalan/70/head -> origin/gh/tugsbayasgalan/70/head 2025-12-04T09:33:41.7325262Z * [new branch] gh/tugsbayasgalan/70/orig -> origin/gh/tugsbayasgalan/70/orig 2025-12-04T09:33:41.7327610Z * [new branch] gh/tugsbayasgalan/71/base -> origin/gh/tugsbayasgalan/71/base 
2025-12-04T09:33:41.7329282Z * [new branch] gh/tugsbayasgalan/71/head -> origin/gh/tugsbayasgalan/71/head 2025-12-04T09:33:41.7330883Z * [new branch] gh/tugsbayasgalan/71/orig -> origin/gh/tugsbayasgalan/71/orig 2025-12-04T09:33:41.7333165Z * [new branch] gh/tugsbayasgalan/72/base -> origin/gh/tugsbayasgalan/72/base 2025-12-04T09:33:41.7334721Z * [new branch] gh/tugsbayasgalan/72/head -> origin/gh/tugsbayasgalan/72/head 2025-12-04T09:33:41.7336232Z * [new branch] gh/tugsbayasgalan/72/orig -> origin/gh/tugsbayasgalan/72/orig 2025-12-04T09:33:41.7338505Z * [new branch] gh/tugsbayasgalan/73/base -> origin/gh/tugsbayasgalan/73/base 2025-12-04T09:33:41.7340116Z * [new branch] gh/tugsbayasgalan/73/head -> origin/gh/tugsbayasgalan/73/head 2025-12-04T09:33:41.7341645Z * [new branch] gh/tugsbayasgalan/73/orig -> origin/gh/tugsbayasgalan/73/orig 2025-12-04T09:33:41.7344573Z * [new branch] gh/tugsbayasgalan/74/base -> origin/gh/tugsbayasgalan/74/base 2025-12-04T09:33:41.7346205Z * [new branch] gh/tugsbayasgalan/74/head -> origin/gh/tugsbayasgalan/74/head 2025-12-04T09:33:41.7347698Z * [new branch] gh/tugsbayasgalan/74/orig -> origin/gh/tugsbayasgalan/74/orig 2025-12-04T09:33:41.7349819Z * [new branch] gh/tugsbayasgalan/75/base -> origin/gh/tugsbayasgalan/75/base 2025-12-04T09:33:41.7351304Z * [new branch] gh/tugsbayasgalan/75/head -> origin/gh/tugsbayasgalan/75/head 2025-12-04T09:33:41.7352916Z * [new branch] gh/tugsbayasgalan/75/orig -> origin/gh/tugsbayasgalan/75/orig 2025-12-04T09:33:41.7354751Z * [new branch] gh/tugsbayasgalan/76/base -> origin/gh/tugsbayasgalan/76/base 2025-12-04T09:33:41.7356389Z * [new branch] gh/tugsbayasgalan/76/head -> origin/gh/tugsbayasgalan/76/head 2025-12-04T09:33:41.7357817Z * [new branch] gh/tugsbayasgalan/76/orig -> origin/gh/tugsbayasgalan/76/orig 2025-12-04T09:33:41.7360702Z * [new branch] gh/tugsbayasgalan/77/base -> origin/gh/tugsbayasgalan/77/base 2025-12-04T09:33:41.7362115Z * [new branch] gh/tugsbayasgalan/77/head -> 
origin/gh/tugsbayasgalan/77/head 2025-12-04T09:33:41.7363678Z * [new branch] gh/tugsbayasgalan/77/orig -> origin/gh/tugsbayasgalan/77/orig 2025-12-04T09:33:41.7365858Z * [new branch] gh/tugsbayasgalan/78/base -> origin/gh/tugsbayasgalan/78/base 2025-12-04T09:33:41.7367504Z * [new branch] gh/tugsbayasgalan/78/head -> origin/gh/tugsbayasgalan/78/head 2025-12-04T09:33:41.7369029Z * [new branch] gh/tugsbayasgalan/78/orig -> origin/gh/tugsbayasgalan/78/orig 2025-12-04T09:33:41.7371285Z * [new branch] gh/tugsbayasgalan/79/base -> origin/gh/tugsbayasgalan/79/base 2025-12-04T09:33:41.7372944Z * [new branch] gh/tugsbayasgalan/79/head -> origin/gh/tugsbayasgalan/79/head 2025-12-04T09:33:41.7374424Z * [new branch] gh/tugsbayasgalan/79/orig -> origin/gh/tugsbayasgalan/79/orig 2025-12-04T09:33:41.7376603Z * [new branch] gh/tugsbayasgalan/8/base -> origin/gh/tugsbayasgalan/8/base 2025-12-04T09:33:41.7378029Z * [new branch] gh/tugsbayasgalan/8/head -> origin/gh/tugsbayasgalan/8/head 2025-12-04T09:33:41.7379643Z * [new branch] gh/tugsbayasgalan/8/orig -> origin/gh/tugsbayasgalan/8/orig 2025-12-04T09:33:41.7381466Z * [new branch] gh/tugsbayasgalan/80/base -> origin/gh/tugsbayasgalan/80/base 2025-12-04T09:33:41.7382892Z * [new branch] gh/tugsbayasgalan/80/head -> origin/gh/tugsbayasgalan/80/head 2025-12-04T09:33:41.7384399Z * [new branch] gh/tugsbayasgalan/80/orig -> origin/gh/tugsbayasgalan/80/orig 2025-12-04T09:33:41.7386637Z * [new branch] gh/tugsbayasgalan/81/base -> origin/gh/tugsbayasgalan/81/base 2025-12-04T09:33:41.7388011Z * [new branch] gh/tugsbayasgalan/81/head -> origin/gh/tugsbayasgalan/81/head 2025-12-04T09:33:41.7389513Z * [new branch] gh/tugsbayasgalan/81/orig -> origin/gh/tugsbayasgalan/81/orig 2025-12-04T09:33:41.7392373Z * [new branch] gh/tugsbayasgalan/82/base -> origin/gh/tugsbayasgalan/82/base 2025-12-04T09:33:41.7394153Z * [new branch] gh/tugsbayasgalan/82/head -> origin/gh/tugsbayasgalan/82/head 2025-12-04T09:33:41.7395732Z * [new branch] 
gh/tugsbayasgalan/82/orig -> origin/gh/tugsbayasgalan/82/orig 2025-12-04T09:33:41.7397628Z * [new branch] gh/tugsbayasgalan/83/base -> origin/gh/tugsbayasgalan/83/base 2025-12-04T09:33:41.7399201Z * [new branch] gh/tugsbayasgalan/83/head -> origin/gh/tugsbayasgalan/83/head 2025-12-04T09:33:41.7400700Z * [new branch] gh/tugsbayasgalan/83/orig -> origin/gh/tugsbayasgalan/83/orig 2025-12-04T09:33:41.7402549Z * [new branch] gh/tugsbayasgalan/84/base -> origin/gh/tugsbayasgalan/84/base 2025-12-04T09:33:41.7404120Z * [new branch] gh/tugsbayasgalan/84/head -> origin/gh/tugsbayasgalan/84/head 2025-12-04T09:33:41.7405626Z * [new branch] gh/tugsbayasgalan/84/orig -> origin/gh/tugsbayasgalan/84/orig 2025-12-04T09:33:41.7407611Z * [new branch] gh/tugsbayasgalan/85/base -> origin/gh/tugsbayasgalan/85/base 2025-12-04T09:33:41.7409329Z * [new branch] gh/tugsbayasgalan/85/head -> origin/gh/tugsbayasgalan/85/head 2025-12-04T09:33:41.7410823Z * [new branch] gh/tugsbayasgalan/85/orig -> origin/gh/tugsbayasgalan/85/orig 2025-12-04T09:33:41.7412915Z * [new branch] gh/tugsbayasgalan/86/base -> origin/gh/tugsbayasgalan/86/base 2025-12-04T09:33:41.7414554Z * [new branch] gh/tugsbayasgalan/86/head -> origin/gh/tugsbayasgalan/86/head 2025-12-04T09:33:41.7415995Z * [new branch] gh/tugsbayasgalan/86/orig -> origin/gh/tugsbayasgalan/86/orig 2025-12-04T09:33:41.7418636Z * [new branch] gh/tugsbayasgalan/87/base -> origin/gh/tugsbayasgalan/87/base 2025-12-04T09:33:41.7420118Z * [new branch] gh/tugsbayasgalan/87/head -> origin/gh/tugsbayasgalan/87/head 2025-12-04T09:33:41.7421689Z * [new branch] gh/tugsbayasgalan/87/orig -> origin/gh/tugsbayasgalan/87/orig 2025-12-04T09:33:41.7423881Z * [new branch] gh/tugsbayasgalan/88/base -> origin/gh/tugsbayasgalan/88/base 2025-12-04T09:33:41.7425342Z * [new branch] gh/tugsbayasgalan/88/head -> origin/gh/tugsbayasgalan/88/head 2025-12-04T09:33:41.7426965Z * [new branch] gh/tugsbayasgalan/88/orig -> origin/gh/tugsbayasgalan/88/orig 2025-12-04T09:33:41.7429196Z 
* [new branch] gh/tugsbayasgalan/89/base -> origin/gh/tugsbayasgalan/89/base 2025-12-04T09:33:41.7430638Z * [new branch] gh/tugsbayasgalan/89/head -> origin/gh/tugsbayasgalan/89/head 2025-12-04T09:33:41.7432164Z * [new branch] gh/tugsbayasgalan/89/orig -> origin/gh/tugsbayasgalan/89/orig 2025-12-04T09:33:41.7434264Z * [new branch] gh/tugsbayasgalan/9/base -> origin/gh/tugsbayasgalan/9/base 2025-12-04T09:33:41.7435648Z * [new branch] gh/tugsbayasgalan/9/head -> origin/gh/tugsbayasgalan/9/head 2025-12-04T09:33:41.7437118Z * [new branch] gh/tugsbayasgalan/9/orig -> origin/gh/tugsbayasgalan/9/orig 2025-12-04T09:33:41.7440464Z * [new branch] gh/tugsbayasgalan/90/base -> origin/gh/tugsbayasgalan/90/base 2025-12-04T09:33:41.7441789Z * [new branch] gh/tugsbayasgalan/90/head -> origin/gh/tugsbayasgalan/90/head 2025-12-04T09:33:41.7443379Z * [new branch] gh/tugsbayasgalan/90/orig -> origin/gh/tugsbayasgalan/90/orig 2025-12-04T09:33:41.7445812Z * [new branch] gh/tugsbayasgalan/91/base -> origin/gh/tugsbayasgalan/91/base 2025-12-04T09:33:41.7447265Z * [new branch] gh/tugsbayasgalan/91/head -> origin/gh/tugsbayasgalan/91/head 2025-12-04T09:33:41.7448782Z * [new branch] gh/tugsbayasgalan/91/orig -> origin/gh/tugsbayasgalan/91/orig 2025-12-04T09:33:41.7451087Z * [new branch] gh/tugsbayasgalan/92/base -> origin/gh/tugsbayasgalan/92/base 2025-12-04T09:33:41.7452677Z * [new branch] gh/tugsbayasgalan/92/head -> origin/gh/tugsbayasgalan/92/head 2025-12-04T09:33:41.7454148Z * [new branch] gh/tugsbayasgalan/92/orig -> origin/gh/tugsbayasgalan/92/orig 2025-12-04T09:33:41.7456500Z * [new branch] gh/tugsbayasgalan/93/base -> origin/gh/tugsbayasgalan/93/base 2025-12-04T09:33:41.7458101Z * [new branch] gh/tugsbayasgalan/93/head -> origin/gh/tugsbayasgalan/93/head 2025-12-04T09:33:41.7459675Z * [new branch] gh/tugsbayasgalan/93/orig -> origin/gh/tugsbayasgalan/93/orig 2025-12-04T09:33:41.7462344Z * [new branch] gh/v0i0/14/base -> origin/gh/v0i0/14/base 2025-12-04T09:33:41.7463725Z * [new 
branch] gh/v0i0/14/head -> origin/gh/v0i0/14/head 2025-12-04T09:33:41.7465245Z * [new branch] gh/v0i0/14/orig -> origin/gh/v0i0/14/orig 2025-12-04T09:33:41.7467032Z * [new branch] gh/v0i0/15/base -> origin/gh/v0i0/15/base 2025-12-04T09:33:41.7468686Z * [new branch] gh/v0i0/15/head -> origin/gh/v0i0/15/head 2025-12-04T09:33:41.7470306Z * [new branch] gh/v0i0/15/orig -> origin/gh/v0i0/15/orig 2025-12-04T09:33:41.7472572Z * [new branch] gh/v0i0/16/base -> origin/gh/v0i0/16/base 2025-12-04T09:33:41.7474112Z * [new branch] gh/v0i0/16/head -> origin/gh/v0i0/16/head 2025-12-04T09:33:41.7475565Z * [new branch] gh/v0i0/16/orig -> origin/gh/v0i0/16/orig 2025-12-04T09:33:41.7477612Z * [new branch] gh/v0i0/17/base -> origin/gh/v0i0/17/base 2025-12-04T09:33:41.7479125Z * [new branch] gh/v0i0/17/head -> origin/gh/v0i0/17/head 2025-12-04T09:33:41.7480631Z * [new branch] gh/v0i0/17/orig -> origin/gh/v0i0/17/orig 2025-12-04T09:33:41.7482679Z * [new branch] gh/v0i0/18/base -> origin/gh/v0i0/18/base 2025-12-04T09:33:41.7484282Z * [new branch] gh/v0i0/18/head -> origin/gh/v0i0/18/head 2025-12-04T09:33:41.7485767Z * [new branch] gh/v0i0/18/orig -> origin/gh/v0i0/18/orig 2025-12-04T09:33:41.7487810Z * [new branch] gh/v0i0/19/base -> origin/gh/v0i0/19/base 2025-12-04T09:33:41.7489336Z * [new branch] gh/v0i0/19/head -> origin/gh/v0i0/19/head 2025-12-04T09:33:41.7490926Z * [new branch] gh/v0i0/19/orig -> origin/gh/v0i0/19/orig 2025-12-04T09:33:41.7494189Z * [new branch] gh/vishal9-team/1/base -> origin/gh/vishal9-team/1/base 2025-12-04T09:33:41.7495767Z * [new branch] gh/vishal9-team/1/head -> origin/gh/vishal9-team/1/head 2025-12-04T09:33:41.7497738Z * [new branch] gh/vishal9-team/2/base -> origin/gh/vishal9-team/2/base 2025-12-04T09:33:41.7499282Z * [new branch] gh/vishal9-team/2/head -> origin/gh/vishal9-team/2/head 2025-12-04T09:33:41.7501120Z * [new branch] gh/vishal9-team/2/orig -> origin/gh/vishal9-team/2/orig 2025-12-04T09:33:41.7503354Z * [new branch] gh/vishal9-team/3/base -> 
origin/gh/vishal9-team/3/base 2025-12-04T09:33:41.7504811Z * [new branch] gh/vishal9-team/3/head -> origin/gh/vishal9-team/3/head 2025-12-04T09:33:41.7506404Z * [new branch] gh/vishal9-team/3/orig -> origin/gh/vishal9-team/3/orig 2025-12-04T09:33:41.7508170Z * [new branch] gh/vishal9-team/4/base -> origin/gh/vishal9-team/4/base 2025-12-04T09:33:41.7509660Z * [new branch] gh/vishal9-team/4/head -> origin/gh/vishal9-team/4/head 2025-12-04T09:33:41.7511280Z * [new branch] gh/vishal9-team/4/orig -> origin/gh/vishal9-team/4/orig 2025-12-04T09:33:41.7513677Z * [new branch] gh/vkuzo/1/next -> origin/gh/vkuzo/1/next 2025-12-04T09:33:41.7515593Z * [new branch] gh/vkuzo/2/next -> origin/gh/vkuzo/2/next 2025-12-04T09:33:41.7517488Z * [new branch] gh/vkuzo/3/next -> origin/gh/vkuzo/3/next 2025-12-04T09:33:41.7519998Z * [new branch] gh/wconstab/424/base -> origin/gh/wconstab/424/base 2025-12-04T09:33:41.7521667Z * [new branch] gh/wconstab/424/head -> origin/gh/wconstab/424/head 2025-12-04T09:33:41.7523178Z * [new branch] gh/wconstab/424/orig -> origin/gh/wconstab/424/orig 2025-12-04T09:33:41.7525265Z * [new branch] gh/wconstab/435/base -> origin/gh/wconstab/435/base 2025-12-04T09:33:41.7526825Z * [new branch] gh/wconstab/435/head -> origin/gh/wconstab/435/head 2025-12-04T09:33:41.7528493Z * [new branch] gh/wconstab/435/orig -> origin/gh/wconstab/435/orig 2025-12-04T09:33:41.7530980Z * [new branch] gh/wconstab/444/base -> origin/gh/wconstab/444/base 2025-12-04T09:33:41.7532553Z * [new branch] gh/wconstab/444/head -> origin/gh/wconstab/444/head 2025-12-04T09:33:41.7534147Z * [new branch] gh/wconstab/444/orig -> origin/gh/wconstab/444/orig 2025-12-04T09:33:41.7536185Z * [new branch] gh/wconstab/447/base -> origin/gh/wconstab/447/base 2025-12-04T09:33:41.7537805Z * [new branch] gh/wconstab/447/head -> origin/gh/wconstab/447/head 2025-12-04T09:33:41.7539286Z * [new branch] gh/wconstab/447/orig -> origin/gh/wconstab/447/orig 2025-12-04T09:33:41.7541337Z * [new branch] 
gh/wconstab/448/base -> origin/gh/wconstab/448/base 2025-12-04T09:33:41.7542820Z * [new branch] gh/wconstab/448/head -> origin/gh/wconstab/448/head 2025-12-04T09:33:41.7544395Z * [new branch] gh/wconstab/448/orig -> origin/gh/wconstab/448/orig 2025-12-04T09:33:41.7546802Z * [new branch] gh/wconstab/449/base -> origin/gh/wconstab/449/base 2025-12-04T09:33:41.7548388Z * [new branch] gh/wconstab/449/head -> origin/gh/wconstab/449/head 2025-12-04T09:33:41.7550081Z * [new branch] gh/wconstab/449/orig -> origin/gh/wconstab/449/orig 2025-12-04T09:33:41.7551891Z * [new branch] gh/wconstab/450/base -> origin/gh/wconstab/450/base 2025-12-04T09:33:41.7553524Z * [new branch] gh/wconstab/450/head -> origin/gh/wconstab/450/head 2025-12-04T09:33:41.7555033Z * [new branch] gh/wconstab/450/orig -> origin/gh/wconstab/450/orig 2025-12-04T09:33:41.7556832Z * [new branch] gh/wconstab/451/base -> origin/gh/wconstab/451/base 2025-12-04T09:33:41.7558577Z * [new branch] gh/wconstab/451/head -> origin/gh/wconstab/451/head 2025-12-04T09:33:41.7559992Z * [new branch] gh/wconstab/451/orig -> origin/gh/wconstab/451/orig 2025-12-04T09:33:41.7562232Z * [new branch] gh/wconstab/452/base -> origin/gh/wconstab/452/base 2025-12-04T09:33:41.7563665Z * [new branch] gh/wconstab/452/head -> origin/gh/wconstab/452/head 2025-12-04T09:33:41.7565386Z * [new branch] gh/wconstab/452/orig -> origin/gh/wconstab/452/orig 2025-12-04T09:33:41.7567088Z * [new branch] gh/wconstab/453/base -> origin/gh/wconstab/453/base 2025-12-04T09:33:41.7568696Z * [new branch] gh/wconstab/453/head -> origin/gh/wconstab/453/head 2025-12-04T09:33:41.7570350Z * [new branch] gh/wconstab/453/orig -> origin/gh/wconstab/453/orig 2025-12-04T09:33:41.7572371Z * [new branch] gh/wconstab/454/base -> origin/gh/wconstab/454/base 2025-12-04T09:33:41.7573902Z * [new branch] gh/wconstab/454/head -> origin/gh/wconstab/454/head 2025-12-04T09:33:41.7575431Z * [new branch] gh/wconstab/454/orig -> origin/gh/wconstab/454/orig 
2025-12-04T09:33:41.7577620Z * [new branch] gh/wconstab/455/base -> origin/gh/wconstab/455/base 2025-12-04T09:33:41.7579079Z * [new branch] gh/wconstab/455/head -> origin/gh/wconstab/455/head 2025-12-04T09:33:41.7580552Z * [new branch] gh/wconstab/455/orig -> origin/gh/wconstab/455/orig 2025-12-04T09:33:41.7582928Z * [new branch] gh/wconstab/456/base -> origin/gh/wconstab/456/base 2025-12-04T09:33:41.7584753Z * [new branch] gh/wconstab/456/head -> origin/gh/wconstab/456/head 2025-12-04T09:33:41.7586407Z * [new branch] gh/wconstab/456/orig -> origin/gh/wconstab/456/orig 2025-12-04T09:33:41.7588457Z * [new branch] gh/wconstab/457/base -> origin/gh/wconstab/457/base 2025-12-04T09:33:41.7590071Z * [new branch] gh/wconstab/457/head -> origin/gh/wconstab/457/head 2025-12-04T09:33:41.7593622Z * [new branch] gh/wconstab/457/orig -> origin/gh/wconstab/457/orig 2025-12-04T09:33:41.7594437Z * [new branch] gh/wconstab/458/base -> origin/gh/wconstab/458/base 2025-12-04T09:33:41.7595342Z * [new branch] gh/wconstab/458/head -> origin/gh/wconstab/458/head 2025-12-04T09:33:41.7596713Z * [new branch] gh/wconstab/458/orig -> origin/gh/wconstab/458/orig 2025-12-04T09:33:41.7598524Z * [new branch] gh/wconstab/459/base -> origin/gh/wconstab/459/base 2025-12-04T09:33:41.7600133Z * [new branch] gh/wconstab/459/head -> origin/gh/wconstab/459/head 2025-12-04T09:33:41.7601512Z * [new branch] gh/wconstab/459/orig -> origin/gh/wconstab/459/orig 2025-12-04T09:33:41.7604338Z * [new branch] gh/wconstab/460/base -> origin/gh/wconstab/460/base 2025-12-04T09:33:41.7606156Z * [new branch] gh/wconstab/460/head -> origin/gh/wconstab/460/head 2025-12-04T09:33:41.7607771Z * [new branch] gh/wconstab/460/orig -> origin/gh/wconstab/460/orig 2025-12-04T09:33:41.7609924Z * [new branch] gh/wconstab/461/base -> origin/gh/wconstab/461/base 2025-12-04T09:33:41.7611640Z * [new branch] gh/wconstab/461/head -> origin/gh/wconstab/461/head 2025-12-04T09:33:41.7613253Z * [new branch] gh/wconstab/461/orig -> 
origin/gh/wconstab/461/orig 2025-12-04T09:33:41.7615128Z * [new branch] gh/wconstab/462/base -> origin/gh/wconstab/462/base 2025-12-04T09:33:41.7616806Z * [new branch] gh/wconstab/462/head -> origin/gh/wconstab/462/head 2025-12-04T09:33:41.7618511Z * [new branch] gh/wconstab/462/orig -> origin/gh/wconstab/462/orig 2025-12-04T09:33:41.7620575Z * [new branch] gh/wconstab/463/base -> origin/gh/wconstab/463/base 2025-12-04T09:33:41.7622189Z * [new branch] gh/wconstab/463/head -> origin/gh/wconstab/463/head 2025-12-04T09:33:41.7623757Z * [new branch] gh/wconstab/463/orig -> origin/gh/wconstab/463/orig 2025-12-04T09:33:41.7625831Z * [new branch] gh/wconstab/464/base -> origin/gh/wconstab/464/base 2025-12-04T09:33:41.7627505Z * [new branch] gh/wconstab/464/head -> origin/gh/wconstab/464/head 2025-12-04T09:33:41.7628810Z * [new branch] gh/wconstab/464/orig -> origin/gh/wconstab/464/orig 2025-12-04T09:33:41.7630733Z * [new branch] gh/wconstab/465/base -> origin/gh/wconstab/465/base 2025-12-04T09:33:41.7632318Z * [new branch] gh/wconstab/465/head -> origin/gh/wconstab/465/head 2025-12-04T09:33:41.7633811Z * [new branch] gh/wconstab/465/orig -> origin/gh/wconstab/465/orig 2025-12-04T09:33:41.7636049Z * [new branch] gh/wconstab/466/base -> origin/gh/wconstab/466/base 2025-12-04T09:33:41.7637438Z * [new branch] gh/wconstab/466/head -> origin/gh/wconstab/466/head 2025-12-04T09:33:41.7638808Z * [new branch] gh/wconstab/466/orig -> origin/gh/wconstab/466/orig 2025-12-04T09:33:41.7641309Z * [new branch] gh/wconstab/467/base -> origin/gh/wconstab/467/base 2025-12-04T09:33:41.7642983Z * [new branch] gh/wconstab/467/head -> origin/gh/wconstab/467/head 2025-12-04T09:33:41.7644540Z * [new branch] gh/wconstab/467/orig -> origin/gh/wconstab/467/orig 2025-12-04T09:33:41.7646462Z * [new branch] gh/wconstab/468/base -> origin/gh/wconstab/468/base 2025-12-04T09:33:41.7647875Z * [new branch] gh/wconstab/468/head -> origin/gh/wconstab/468/head 2025-12-04T09:33:41.7649410Z * [new branch] 
gh/wconstab/468/orig -> origin/gh/wconstab/468/orig 2025-12-04T09:33:41.7652030Z * [new branch] gh/weifengpy/39/base -> origin/gh/weifengpy/39/base 2025-12-04T09:33:41.7653570Z * [new branch] gh/weifengpy/39/head -> origin/gh/weifengpy/39/head 2025-12-04T09:33:41.7655107Z * [new branch] gh/weifengpy/39/orig -> origin/gh/weifengpy/39/orig 2025-12-04T09:33:41.7657586Z * [new branch] gh/weifengpy/40/base -> origin/gh/weifengpy/40/base 2025-12-04T09:33:41.7659039Z * [new branch] gh/weifengpy/40/head -> origin/gh/weifengpy/40/head 2025-12-04T09:33:41.7660533Z * [new branch] gh/weifengpy/40/orig -> origin/gh/weifengpy/40/orig 2025-12-04T09:33:41.7662767Z * [new branch] gh/weifengpy/41/base -> origin/gh/weifengpy/41/base 2025-12-04T09:33:41.7664325Z * [new branch] gh/weifengpy/41/head -> origin/gh/weifengpy/41/head 2025-12-04T09:33:41.7665960Z * [new branch] gh/weifengpy/41/orig -> origin/gh/weifengpy/41/orig 2025-12-04T09:33:41.7668524Z * [new branch] gh/williamwen42/250/base -> origin/gh/williamwen42/250/base 2025-12-04T09:33:41.7669982Z * [new branch] gh/williamwen42/250/head -> origin/gh/williamwen42/250/head 2025-12-04T09:33:41.7671648Z * [new branch] gh/williamwen42/250/orig -> origin/gh/williamwen42/250/orig 2025-12-04T09:33:41.7677228Z * [new branch] gh/williamwen42/279/base -> origin/gh/williamwen42/279/base 2025-12-04T09:33:41.7678869Z * [new branch] gh/williamwen42/279/head -> origin/gh/williamwen42/279/head 2025-12-04T09:33:41.7680454Z * [new branch] gh/williamwen42/279/orig -> origin/gh/williamwen42/279/orig 2025-12-04T09:33:41.7682466Z * [new branch] gh/williamwen42/282/base -> origin/gh/williamwen42/282/base 2025-12-04T09:33:41.7683993Z * [new branch] gh/williamwen42/282/head -> origin/gh/williamwen42/282/head 2025-12-04T09:33:41.7685450Z * [new branch] gh/williamwen42/282/orig -> origin/gh/williamwen42/282/orig 2025-12-04T09:33:41.7687588Z * [new branch] gh/williamwen42/287/base -> origin/gh/williamwen42/287/base 2025-12-04T09:33:41.7689024Z * [new branch] 
gh/williamwen42/287/head -> origin/gh/williamwen42/287/head 2025-12-04T09:33:41.7690640Z * [new branch] gh/williamwen42/287/orig -> origin/gh/williamwen42/287/orig 2025-12-04T09:33:41.7692824Z * [new branch] gh/williamwen42/288/base -> origin/gh/williamwen42/288/base 2025-12-04T09:33:41.7694164Z * [new branch] gh/williamwen42/288/head -> origin/gh/williamwen42/288/head 2025-12-04T09:33:41.7695668Z * [new branch] gh/williamwen42/288/orig -> origin/gh/williamwen42/288/orig 2025-12-04T09:33:41.7698138Z * [new branch] gh/williamwen42/296/base -> origin/gh/williamwen42/296/base 2025-12-04T09:33:41.7699760Z * [new branch] gh/williamwen42/296/head -> origin/gh/williamwen42/296/head 2025-12-04T09:33:41.7701249Z * [new branch] gh/williamwen42/296/orig -> origin/gh/williamwen42/296/orig 2025-12-04T09:33:41.7703114Z * [new branch] gh/williamwen42/297/base -> origin/gh/williamwen42/297/base 2025-12-04T09:33:41.7704681Z * [new branch] gh/williamwen42/297/head -> origin/gh/williamwen42/297/head 2025-12-04T09:33:41.7706213Z * [new branch] gh/williamwen42/297/orig -> origin/gh/williamwen42/297/orig 2025-12-04T09:33:41.7708259Z * [new branch] gh/williamwen42/306/base -> origin/gh/williamwen42/306/base 2025-12-04T09:33:41.7709807Z * [new branch] gh/williamwen42/306/head -> origin/gh/williamwen42/306/head 2025-12-04T09:33:41.7711369Z * [new branch] gh/williamwen42/306/orig -> origin/gh/williamwen42/306/orig 2025-12-04T09:33:41.7713427Z * [new branch] gh/williamwen42/309/base -> origin/gh/williamwen42/309/base 2025-12-04T09:33:41.7714992Z * [new branch] gh/williamwen42/309/head -> origin/gh/williamwen42/309/head 2025-12-04T09:33:41.7716532Z * [new branch] gh/williamwen42/309/orig -> origin/gh/williamwen42/309/orig 2025-12-04T09:33:41.7718616Z * [new branch] gh/williamwen42/310/base -> origin/gh/williamwen42/310/base 2025-12-04T09:33:41.7720180Z * [new branch] gh/williamwen42/310/head -> origin/gh/williamwen42/310/head 2025-12-04T09:33:41.7721746Z * [new branch] 
gh/williamwen42/310/orig -> origin/gh/williamwen42/310/orig 2025-12-04T09:33:41.7725357Z * [new branch] gh/williamwen42/311/base -> origin/gh/williamwen42/311/base 2025-12-04T09:33:41.7726854Z * [new branch] gh/williamwen42/311/head -> origin/gh/williamwen42/311/head 2025-12-04T09:33:41.7728375Z * [new branch] gh/williamwen42/311/orig -> origin/gh/williamwen42/311/orig 2025-12-04T09:33:41.7730191Z * [new branch] gh/williamwen42/319/base -> origin/gh/williamwen42/319/base 2025-12-04T09:33:41.7731649Z * [new branch] gh/williamwen42/319/head -> origin/gh/williamwen42/319/head 2025-12-04T09:33:41.7733100Z * [new branch] gh/williamwen42/319/orig -> origin/gh/williamwen42/319/orig 2025-12-04T09:33:41.7735127Z * [new branch] gh/williamwen42/325/base -> origin/gh/williamwen42/325/base 2025-12-04T09:33:41.7736893Z * [new branch] gh/williamwen42/325/head -> origin/gh/williamwen42/325/head 2025-12-04T09:33:41.7738401Z * [new branch] gh/williamwen42/325/orig -> origin/gh/williamwen42/325/orig 2025-12-04T09:33:41.7740417Z * [new branch] gh/williamwen42/326/base -> origin/gh/williamwen42/326/base 2025-12-04T09:33:41.7742032Z * [new branch] gh/williamwen42/326/head -> origin/gh/williamwen42/326/head 2025-12-04T09:33:41.7743512Z * [new branch] gh/williamwen42/326/orig -> origin/gh/williamwen42/326/orig 2025-12-04T09:33:41.7745567Z * [new branch] gh/williamwen42/327/base -> origin/gh/williamwen42/327/base 2025-12-04T09:33:41.7747015Z * [new branch] gh/williamwen42/327/head -> origin/gh/williamwen42/327/head 2025-12-04T09:33:41.7748526Z * [new branch] gh/williamwen42/327/orig -> origin/gh/williamwen42/327/orig 2025-12-04T09:33:41.7750552Z * [new branch] gh/williamwen42/328/base -> origin/gh/williamwen42/328/base 2025-12-04T09:33:41.7752225Z * [new branch] gh/williamwen42/328/head -> origin/gh/williamwen42/328/head 2025-12-04T09:33:41.7753703Z * [new branch] gh/williamwen42/328/orig -> origin/gh/williamwen42/328/orig 2025-12-04T09:33:41.7756300Z * [new branch] 
gh/williamwen42/329/base -> origin/gh/williamwen42/329/base 2025-12-04T09:33:41.7757940Z * [new branch] gh/williamwen42/329/head -> origin/gh/williamwen42/329/head 2025-12-04T09:33:41.7759514Z * [new branch] gh/williamwen42/329/orig -> origin/gh/williamwen42/329/orig 2025-12-04T09:33:41.7761687Z * [new branch] gh/williamwen42/330/base -> origin/gh/williamwen42/330/base 2025-12-04T09:33:41.7763291Z * [new branch] gh/williamwen42/330/head -> origin/gh/williamwen42/330/head 2025-12-04T09:33:41.7764763Z * [new branch] gh/williamwen42/330/orig -> origin/gh/williamwen42/330/orig 2025-12-04T09:33:41.7766736Z * [new branch] gh/williamwen42/331/base -> origin/gh/williamwen42/331/base 2025-12-04T09:33:41.7768199Z * [new branch] gh/williamwen42/331/head -> origin/gh/williamwen42/331/head 2025-12-04T09:33:41.7769732Z * [new branch] gh/williamwen42/331/orig -> origin/gh/williamwen42/331/orig 2025-12-04T09:33:41.7771783Z * [new branch] gh/williamwen42/332/base -> origin/gh/williamwen42/332/base 2025-12-04T09:33:41.7773360Z * [new branch] gh/williamwen42/332/head -> origin/gh/williamwen42/332/head 2025-12-04T09:33:41.7775264Z * [new branch] gh/williamwen42/332/orig -> origin/gh/williamwen42/332/orig 2025-12-04T09:33:41.7777755Z * [new branch] gh/williamwen42/333/base -> origin/gh/williamwen42/333/base 2025-12-04T09:33:41.7779199Z * [new branch] gh/williamwen42/333/head -> origin/gh/williamwen42/333/head 2025-12-04T09:33:41.7780734Z * [new branch] gh/williamwen42/333/orig -> origin/gh/williamwen42/333/orig 2025-12-04T09:33:41.7782907Z * [new branch] gh/williamwen42/334/base -> origin/gh/williamwen42/334/base 2025-12-04T09:33:41.7784379Z * [new branch] gh/williamwen42/334/head -> origin/gh/williamwen42/334/head 2025-12-04T09:33:41.7785991Z * [new branch] gh/williamwen42/334/orig -> origin/gh/williamwen42/334/orig 2025-12-04T09:33:41.7792313Z * [new branch] gh/williamwen42/335/base -> origin/gh/williamwen42/335/base 2025-12-04T09:33:41.7793826Z * [new branch] 
gh/williamwen42/335/head -> origin/gh/williamwen42/335/head 2025-12-04T09:33:41.7795368Z * [new branch] gh/williamwen42/335/orig -> origin/gh/williamwen42/335/orig 2025-12-04T09:33:41.7797530Z * [new branch] gh/williamwen42/336/base -> origin/gh/williamwen42/336/base 2025-12-04T09:33:41.7799039Z * [new branch] gh/williamwen42/336/head -> origin/gh/williamwen42/336/head 2025-12-04T09:33:41.7800374Z * [new branch] gh/williamwen42/336/orig -> origin/gh/williamwen42/336/orig 2025-12-04T09:33:41.7802465Z * [new branch] gh/williamwen42/337/base -> origin/gh/williamwen42/337/base 2025-12-04T09:33:41.7803935Z * [new branch] gh/williamwen42/337/head -> origin/gh/williamwen42/337/head 2025-12-04T09:33:41.7805464Z * [new branch] gh/williamwen42/337/orig -> origin/gh/williamwen42/337/orig 2025-12-04T09:33:41.7807652Z * [new branch] gh/williamwen42/338/base -> origin/gh/williamwen42/338/base 2025-12-04T09:33:41.7809164Z * [new branch] gh/williamwen42/338/head -> origin/gh/williamwen42/338/head 2025-12-04T09:33:41.7810662Z * [new branch] gh/williamwen42/338/orig -> origin/gh/williamwen42/338/orig 2025-12-04T09:33:41.7812682Z * [new branch] gh/williamwen42/339/base -> origin/gh/williamwen42/339/base 2025-12-04T09:33:41.7814293Z * [new branch] gh/williamwen42/339/head -> origin/gh/williamwen42/339/head 2025-12-04T09:33:41.7815577Z * [new branch] gh/williamwen42/339/orig -> origin/gh/williamwen42/339/orig 2025-12-04T09:33:41.7827317Z * [new branch] gh/williamwen42/340/base -> origin/gh/williamwen42/340/base 2025-12-04T09:33:41.7827906Z * [new branch] gh/williamwen42/340/head -> origin/gh/williamwen42/340/head 2025-12-04T09:33:41.7828224Z * [new branch] gh/williamwen42/340/orig -> origin/gh/williamwen42/340/orig 2025-12-04T09:33:41.7828516Z * [new branch] gh/williamwen42/341/base -> origin/gh/williamwen42/341/base 2025-12-04T09:33:41.7828794Z * [new branch] gh/williamwen42/341/head -> origin/gh/williamwen42/341/head 2025-12-04T09:33:41.7829070Z * [new branch] 
gh/williamwen42/341/orig -> origin/gh/williamwen42/341/orig 2025-12-04T09:33:41.7829362Z * [new branch] gh/williamwen42/342/base -> origin/gh/williamwen42/342/base 2025-12-04T09:33:41.7829660Z * [new branch] gh/williamwen42/342/head -> origin/gh/williamwen42/342/head 2025-12-04T09:33:41.7830071Z * [new branch] gh/williamwen42/342/orig -> origin/gh/williamwen42/342/orig 2025-12-04T09:33:41.7832751Z * [new branch] gh/williamwen42/343/base -> origin/gh/williamwen42/343/base 2025-12-04T09:33:41.7834324Z * [new branch] gh/williamwen42/343/head -> origin/gh/williamwen42/343/head 2025-12-04T09:33:41.7835766Z * [new branch] gh/williamwen42/343/orig -> origin/gh/williamwen42/343/orig 2025-12-04T09:33:41.7837855Z * [new branch] gh/williamwen42/344/base -> origin/gh/williamwen42/344/base 2025-12-04T09:33:41.7839350Z * [new branch] gh/williamwen42/344/head -> origin/gh/williamwen42/344/head 2025-12-04T09:33:41.7840807Z * [new branch] gh/williamwen42/344/orig -> origin/gh/williamwen42/344/orig 2025-12-04T09:33:41.7843606Z * [new branch] gh/williamwen42/345/base -> origin/gh/williamwen42/345/base 2025-12-04T09:33:41.7845083Z * [new branch] gh/williamwen42/345/head -> origin/gh/williamwen42/345/head 2025-12-04T09:33:41.7846573Z * [new branch] gh/williamwen42/345/orig -> origin/gh/williamwen42/345/orig 2025-12-04T09:33:41.7848777Z * [new branch] gh/williamwen42/346/base -> origin/gh/williamwen42/346/base 2025-12-04T09:33:41.7850352Z * [new branch] gh/williamwen42/346/head -> origin/gh/williamwen42/346/head 2025-12-04T09:33:41.7851919Z * [new branch] gh/williamwen42/346/orig -> origin/gh/williamwen42/346/orig 2025-12-04T09:33:41.7854003Z * [new branch] gh/williamwen42/347/base -> origin/gh/williamwen42/347/base 2025-12-04T09:33:41.7855401Z * [new branch] gh/williamwen42/347/head -> origin/gh/williamwen42/347/head 2025-12-04T09:33:41.7856971Z * [new branch] gh/williamwen42/347/orig -> origin/gh/williamwen42/347/orig 2025-12-04T09:33:41.7859068Z * [new branch] 
gh/williamwen42/348/base -> origin/gh/williamwen42/348/base 2025-12-04T09:33:41.7860439Z * [new branch] gh/williamwen42/348/head -> origin/gh/williamwen42/348/head 2025-12-04T09:33:41.7861931Z * [new branch] gh/williamwen42/348/orig -> origin/gh/williamwen42/348/orig 2025-12-04T09:33:41.7863763Z * [new branch] gh/williamwen42/349/base -> origin/gh/williamwen42/349/base 2025-12-04T09:33:41.7865297Z * [new branch] gh/williamwen42/349/head -> origin/gh/williamwen42/349/head 2025-12-04T09:33:41.7866776Z * [new branch] gh/williamwen42/349/orig -> origin/gh/williamwen42/349/orig 2025-12-04T09:33:41.7868970Z * [new branch] gh/williamwen42/350/base -> origin/gh/williamwen42/350/base 2025-12-04T09:33:41.7870451Z * [new branch] gh/williamwen42/350/head -> origin/gh/williamwen42/350/head 2025-12-04T09:33:41.7872238Z * [new branch] gh/williamwen42/350/orig -> origin/gh/williamwen42/350/orig 2025-12-04T09:33:41.7874214Z * [new branch] gh/williamwen42/351/base -> origin/gh/williamwen42/351/base 2025-12-04T09:33:41.7875849Z * [new branch] gh/williamwen42/351/head -> origin/gh/williamwen42/351/head 2025-12-04T09:33:41.7877336Z * [new branch] gh/williamwen42/351/orig -> origin/gh/williamwen42/351/orig 2025-12-04T09:33:41.7879547Z * [new branch] gh/williamwen42/352/base -> origin/gh/williamwen42/352/base 2025-12-04T09:33:41.7880946Z * [new branch] gh/williamwen42/352/head -> origin/gh/williamwen42/352/head 2025-12-04T09:33:41.7882435Z * [new branch] gh/williamwen42/352/orig -> origin/gh/williamwen42/352/orig 2025-12-04T09:33:41.7884599Z * [new branch] gh/williamwen42/353/base -> origin/gh/williamwen42/353/base 2025-12-04T09:33:41.7886150Z * [new branch] gh/williamwen42/353/head -> origin/gh/williamwen42/353/head 2025-12-04T09:33:41.7887690Z * [new branch] gh/williamwen42/353/orig -> origin/gh/williamwen42/353/orig 2025-12-04T09:33:41.7889696Z * [new branch] gh/williamwen42/354/base -> origin/gh/williamwen42/354/base 2025-12-04T09:33:41.7891405Z * [new branch] 
gh/williamwen42/354/head -> origin/gh/williamwen42/354/head 2025-12-04T09:33:41.7892906Z * [new branch] gh/williamwen42/354/orig -> origin/gh/williamwen42/354/orig 2025-12-04T09:33:41.7894924Z * [new branch] gh/williamwen42/355/base -> origin/gh/williamwen42/355/base 2025-12-04T09:33:41.7896488Z * [new branch] gh/williamwen42/355/head -> origin/gh/williamwen42/355/head 2025-12-04T09:33:41.7898009Z * [new branch] gh/williamwen42/355/orig -> origin/gh/williamwen42/355/orig 2025-12-04T09:33:41.7900533Z * [new branch] gh/williamwen42/356/base -> origin/gh/williamwen42/356/base 2025-12-04T09:33:41.7901998Z * [new branch] gh/williamwen42/356/head -> origin/gh/williamwen42/356/head 2025-12-04T09:33:41.7903448Z * [new branch] gh/williamwen42/356/orig -> origin/gh/williamwen42/356/orig 2025-12-04T09:33:41.7905468Z * [new branch] gh/williamwen42/357/base -> origin/gh/williamwen42/357/base 2025-12-04T09:33:41.7907078Z * [new branch] gh/williamwen42/357/head -> origin/gh/williamwen42/357/head 2025-12-04T09:33:41.7908583Z * [new branch] gh/williamwen42/357/orig -> origin/gh/williamwen42/357/orig 2025-12-04T09:33:41.7910710Z * [new branch] gh/williamwen42/358/base -> origin/gh/williamwen42/358/base 2025-12-04T09:33:41.7912146Z * [new branch] gh/williamwen42/358/head -> origin/gh/williamwen42/358/head 2025-12-04T09:33:41.7913748Z * [new branch] gh/williamwen42/358/orig -> origin/gh/williamwen42/358/orig 2025-12-04T09:33:41.7916161Z * [new branch] gh/xmfan/169/base -> origin/gh/xmfan/169/base 2025-12-04T09:33:41.7917682Z * [new branch] gh/xmfan/169/head -> origin/gh/xmfan/169/head 2025-12-04T09:33:41.7919569Z * [new branch] gh/xmfan/170/base -> origin/gh/xmfan/170/base 2025-12-04T09:33:41.7920905Z * [new branch] gh/xmfan/170/head -> origin/gh/xmfan/170/head 2025-12-04T09:33:41.7922819Z * [new branch] gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-12-04T09:33:41.7924276Z * [new branch] gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-12-04T09:33:41.7925745Z * [new branch] 
gh/xmfan/274/orig -> origin/gh/xmfan/274/orig 2025-12-04T09:33:41.7927672Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 2025-12-04T09:33:41.7929257Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-12-04T09:33:41.7930764Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-12-04T09:33:41.7932766Z * [new branch] gh/xmfan/301/base -> origin/gh/xmfan/301/base 2025-12-04T09:33:41.7934134Z * [new branch] gh/xmfan/301/head -> origin/gh/xmfan/301/head 2025-12-04T09:33:41.7935556Z * [new branch] gh/xmfan/301/orig -> origin/gh/xmfan/301/orig 2025-12-04T09:33:41.7938221Z * [new branch] gh/xmfan/304/base -> origin/gh/xmfan/304/base 2025-12-04T09:33:41.7939709Z * [new branch] gh/xmfan/304/head -> origin/gh/xmfan/304/head 2025-12-04T09:33:41.7941609Z * [new branch] gh/xmfan/304/orig -> origin/gh/xmfan/304/orig 2025-12-04T09:33:41.7943553Z * [new branch] gh/xmfan/309/base -> origin/gh/xmfan/309/base 2025-12-04T09:33:41.7945030Z * [new branch] gh/xmfan/309/head -> origin/gh/xmfan/309/head 2025-12-04T09:33:41.7946616Z * [new branch] gh/xmfan/309/orig -> origin/gh/xmfan/309/orig 2025-12-04T09:33:41.7949011Z * [new branch] gh/xmfan/310/base -> origin/gh/xmfan/310/base 2025-12-04T09:33:41.7950619Z * [new branch] gh/xmfan/310/head -> origin/gh/xmfan/310/head 2025-12-04T09:33:41.7952137Z * [new branch] gh/xmfan/310/orig -> origin/gh/xmfan/310/orig 2025-12-04T09:33:41.7954095Z * [new branch] gh/xmfan/311/base -> origin/gh/xmfan/311/base 2025-12-04T09:33:41.7955522Z * [new branch] gh/xmfan/311/head -> origin/gh/xmfan/311/head 2025-12-04T09:33:41.7956976Z * [new branch] gh/xmfan/311/orig -> origin/gh/xmfan/311/orig 2025-12-04T09:33:41.7958924Z * [new branch] gh/xmfan/312/base -> origin/gh/xmfan/312/base 2025-12-04T09:33:41.7960486Z * [new branch] gh/xmfan/312/head -> origin/gh/xmfan/312/head 2025-12-04T09:33:41.7961820Z * [new branch] gh/xmfan/312/orig -> origin/gh/xmfan/312/orig 2025-12-04T09:33:41.7964301Z * [new branch] gh/xmfan/313/base 
-> origin/gh/xmfan/313/base 2025-12-04T09:33:41.7965742Z * [new branch] gh/xmfan/313/head -> origin/gh/xmfan/313/head 2025-12-04T09:33:41.7967245Z * [new branch] gh/xmfan/313/orig -> origin/gh/xmfan/313/orig 2025-12-04T09:33:41.7969724Z * [new branch] gh/xuanzhang816/27/base -> origin/gh/xuanzhang816/27/base 2025-12-04T09:33:41.7971577Z * [new branch] gh/xuanzhang816/27/head -> origin/gh/xuanzhang816/27/head 2025-12-04T09:33:41.7973042Z * [new branch] gh/xuanzhang816/27/orig -> origin/gh/xuanzhang816/27/orig 2025-12-04T09:33:41.7975220Z * [new branch] gh/xuanzhang816/32/base -> origin/gh/xuanzhang816/32/base 2025-12-04T09:33:41.7976766Z * [new branch] gh/xuanzhang816/32/head -> origin/gh/xuanzhang816/32/head 2025-12-04T09:33:41.7978702Z * [new branch] gh/xuanzhang816/32/orig -> origin/gh/xuanzhang816/32/orig 2025-12-04T09:33:41.7980710Z * [new branch] gh/xuanzhang816/33/base -> origin/gh/xuanzhang816/33/base 2025-12-04T09:33:41.7982116Z * [new branch] gh/xuanzhang816/33/head -> origin/gh/xuanzhang816/33/head 2025-12-04T09:33:41.7983679Z * [new branch] gh/xuanzhang816/33/orig -> origin/gh/xuanzhang816/33/orig 2025-12-04T09:33:41.7986075Z * [new branch] gh/xuanzhang816/34/base -> origin/gh/xuanzhang816/34/base 2025-12-04T09:33:41.7987569Z * [new branch] gh/xuanzhang816/34/head -> origin/gh/xuanzhang816/34/head 2025-12-04T09:33:41.7989052Z * [new branch] gh/xuanzhang816/34/orig -> origin/gh/xuanzhang816/34/orig 2025-12-04T09:33:41.7991442Z * [new branch] gh/xuanzhang816/35/base -> origin/gh/xuanzhang816/35/base 2025-12-04T09:33:41.7992913Z * [new branch] gh/xuanzhang816/35/head -> origin/gh/xuanzhang816/35/head 2025-12-04T09:33:41.7994619Z * [new branch] gh/xuanzhang816/35/orig -> origin/gh/xuanzhang816/35/orig 2025-12-04T09:33:41.7996959Z * [new branch] gh/yanbing-j/11/base -> origin/gh/yanbing-j/11/base 2025-12-04T09:33:41.7998462Z * [new branch] gh/yanbing-j/11/head -> origin/gh/yanbing-j/11/head 2025-12-04T09:33:41.7999917Z * [new branch] gh/yanbing-j/11/orig -> 
origin/gh/yanbing-j/11/orig 2025-12-04T09:33:41.8001881Z * [new branch] gh/yanbing-j/12/base -> origin/gh/yanbing-j/12/base 2025-12-04T09:33:41.8003824Z * [new branch] gh/yanbing-j/12/head -> origin/gh/yanbing-j/12/head 2025-12-04T09:33:41.8005360Z * [new branch] gh/yanbing-j/12/orig -> origin/gh/yanbing-j/12/orig 2025-12-04T09:33:41.8007364Z * [new branch] gh/yanbing-j/13/base -> origin/gh/yanbing-j/13/base 2025-12-04T09:33:41.8008893Z * [new branch] gh/yanbing-j/13/head -> origin/gh/yanbing-j/13/head 2025-12-04T09:33:41.8010352Z * [new branch] gh/yanbing-j/13/orig -> origin/gh/yanbing-j/13/orig 2025-12-04T09:33:41.8012383Z * [new branch] gh/yanbing-j/14/base -> origin/gh/yanbing-j/14/base 2025-12-04T09:33:41.8014642Z * [new branch] gh/yanbing-j/14/head -> origin/gh/yanbing-j/14/head 2025-12-04T09:33:41.8015569Z * [new branch] gh/yanbing-j/14/orig -> origin/gh/yanbing-j/14/orig 2025-12-04T09:33:41.8017654Z * [new branch] gh/yanbing-j/15/base -> origin/gh/yanbing-j/15/base 2025-12-04T09:33:41.8018909Z * [new branch] gh/yanbing-j/15/head -> origin/gh/yanbing-j/15/head 2025-12-04T09:33:41.8020577Z * [new branch] gh/yanbing-j/15/orig -> origin/gh/yanbing-j/15/orig 2025-12-04T09:33:41.8022481Z * [new branch] gh/yanbing-j/18/base -> origin/gh/yanbing-j/18/base 2025-12-04T09:33:41.8023816Z * [new branch] gh/yanbing-j/18/head -> origin/gh/yanbing-j/18/head 2025-12-04T09:33:41.8025125Z * [new branch] gh/yanbing-j/18/orig -> origin/gh/yanbing-j/18/orig 2025-12-04T09:33:41.8027214Z * [new branch] gh/yanbing-j/19/base -> origin/gh/yanbing-j/19/base 2025-12-04T09:33:41.8028563Z * [new branch] gh/yanbing-j/19/head -> origin/gh/yanbing-j/19/head 2025-12-04T09:33:41.8030067Z * [new branch] gh/yanbing-j/19/orig -> origin/gh/yanbing-j/19/orig 2025-12-04T09:33:41.8032261Z * [new branch] gh/yanbing-j/20/base -> origin/gh/yanbing-j/20/base 2025-12-04T09:33:41.8033506Z * [new branch] gh/yanbing-j/20/head -> origin/gh/yanbing-j/20/head 2025-12-04T09:33:41.8035200Z * [new branch] 
gh/yanbing-j/20/orig -> origin/gh/yanbing-j/20/orig 2025-12-04T09:33:41.8037247Z * [new branch] gh/yanbing-j/21/base -> origin/gh/yanbing-j/21/base 2025-12-04T09:33:41.8038629Z * [new branch] gh/yanbing-j/21/head -> origin/gh/yanbing-j/21/head 2025-12-04T09:33:41.8040686Z * [new branch] gh/yanbing-j/22/base -> origin/gh/yanbing-j/22/base 2025-12-04T09:33:41.8041983Z * [new branch] gh/yanbing-j/22/head -> origin/gh/yanbing-j/22/head 2025-12-04T09:33:41.8044113Z * [new branch] gh/yanbing-j/22/orig -> origin/gh/yanbing-j/22/orig 2025-12-04T09:33:41.8046096Z * [new branch] gh/yanbing-j/23/base -> origin/gh/yanbing-j/23/base 2025-12-04T09:33:41.8047444Z * [new branch] gh/yanbing-j/23/head -> origin/gh/yanbing-j/23/head 2025-12-04T09:33:41.8049024Z * [new branch] gh/yanbing-j/23/orig -> origin/gh/yanbing-j/23/orig 2025-12-04T09:33:41.8051095Z * [new branch] gh/yanbing-j/24/base -> origin/gh/yanbing-j/24/base 2025-12-04T09:33:41.8052364Z * [new branch] gh/yanbing-j/24/head -> origin/gh/yanbing-j/24/head 2025-12-04T09:33:41.8054169Z * [new branch] gh/yanbing-j/24/orig -> origin/gh/yanbing-j/24/orig 2025-12-04T09:33:41.8056022Z * [new branch] gh/yanbing-j/25/base -> origin/gh/yanbing-j/25/base 2025-12-04T09:33:41.8057431Z * [new branch] gh/yanbing-j/25/head -> origin/gh/yanbing-j/25/head 2025-12-04T09:33:41.8058959Z * [new branch] gh/yanbing-j/25/orig -> origin/gh/yanbing-j/25/orig 2025-12-04T09:33:41.8061144Z * [new branch] gh/yanbing-j/26/base -> origin/gh/yanbing-j/26/base 2025-12-04T09:33:41.8062436Z * [new branch] gh/yanbing-j/26/head -> origin/gh/yanbing-j/26/head 2025-12-04T09:33:41.8064027Z * [new branch] gh/yanbing-j/26/orig -> origin/gh/yanbing-j/26/orig 2025-12-04T09:33:41.8066560Z * [new branch] gh/yang-yu-hang/1/base -> origin/gh/yang-yu-hang/1/base 2025-12-04T09:33:41.8068265Z * [new branch] gh/yang-yu-hang/1/head -> origin/gh/yang-yu-hang/1/head 2025-12-04T09:33:41.8069926Z * [new branch] gh/yang-yu-hang/1/orig -> origin/gh/yang-yu-hang/1/orig 
2025-12-04T09:33:41.8072237Z * [new branch] gh/yang-yu-hang/2/base -> origin/gh/yang-yu-hang/2/base 2025-12-04T09:33:41.8074077Z * [new branch] gh/yang-yu-hang/2/head -> origin/gh/yang-yu-hang/2/head 2025-12-04T09:33:41.8076615Z * [new branch] gh/yang-yu-hang/2/orig -> origin/gh/yang-yu-hang/2/orig 2025-12-04T09:33:41.8078449Z * [new branch] gh/yang-yu-hang/3/base -> origin/gh/yang-yu-hang/3/base 2025-12-04T09:33:41.8079977Z * [new branch] gh/yang-yu-hang/3/head -> origin/gh/yang-yu-hang/3/head 2025-12-04T09:33:41.8081573Z * [new branch] gh/yang-yu-hang/3/orig -> origin/gh/yang-yu-hang/3/orig 2025-12-04T09:33:41.8084448Z * [new branch] gh/yangw-dev/12/base -> origin/gh/yangw-dev/12/base 2025-12-04T09:33:41.8085785Z * [new branch] gh/yangw-dev/12/head -> origin/gh/yangw-dev/12/head 2025-12-04T09:33:41.8087978Z * [new branch] gh/yangw-dev/12/orig -> origin/gh/yangw-dev/12/orig 2025-12-04T09:33:41.8089981Z * [new branch] gh/yangw-dev/13/base -> origin/gh/yangw-dev/13/base 2025-12-04T09:33:41.8091357Z * [new branch] gh/yangw-dev/13/head -> origin/gh/yangw-dev/13/head 2025-12-04T09:33:41.8093061Z * [new branch] gh/yangw-dev/13/orig -> origin/gh/yangw-dev/13/orig 2025-12-04T09:33:41.8094994Z * [new branch] gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-12-04T09:33:41.8096660Z * [new branch] gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-12-04T09:33:41.8097968Z * [new branch] gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig 2025-12-04T09:33:41.8100141Z * [new branch] gh/yangw-dev/15/base -> origin/gh/yangw-dev/15/base 2025-12-04T09:33:41.8101417Z * [new branch] gh/yangw-dev/15/head -> origin/gh/yangw-dev/15/head 2025-12-04T09:33:41.8102939Z * [new branch] gh/yangw-dev/15/orig -> origin/gh/yangw-dev/15/orig 2025-12-04T09:33:41.8104910Z * [new branch] gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-12-04T09:33:41.8106185Z * [new branch] gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-12-04T09:33:41.8108223Z * [new branch] 
gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig 2025-12-04T09:33:41.8110205Z * [new branch] gh/yangw-dev/26/base -> origin/gh/yangw-dev/26/base 2025-12-04T09:33:41.8111764Z * [new branch] gh/yangw-dev/26/head -> origin/gh/yangw-dev/26/head 2025-12-04T09:33:41.8113272Z * [new branch] gh/yangw-dev/26/orig -> origin/gh/yangw-dev/26/orig 2025-12-04T09:33:41.8115234Z * [new branch] gh/yangw-dev/27/base -> origin/gh/yangw-dev/27/base 2025-12-04T09:33:41.8116824Z * [new branch] gh/yangw-dev/27/head -> origin/gh/yangw-dev/27/head 2025-12-04T09:33:41.8117996Z * [new branch] gh/yangw-dev/27/orig -> origin/gh/yangw-dev/27/orig 2025-12-04T09:33:41.8120691Z * [new branch] gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-12-04T09:33:41.8121861Z * [new branch] gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-12-04T09:33:41.8123521Z * [new branch] gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig 2025-12-04T09:33:41.8125498Z * [new branch] gh/ydwu4/294/base -> origin/gh/ydwu4/294/base 2025-12-04T09:33:41.8126980Z * [new branch] gh/ydwu4/294/head -> origin/gh/ydwu4/294/head 2025-12-04T09:33:41.8128493Z * [new branch] gh/ydwu4/294/orig -> origin/gh/ydwu4/294/orig 2025-12-04T09:33:41.8130699Z * [new branch] gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-12-04T09:33:41.8132421Z * [new branch] gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-12-04T09:33:41.8133700Z * [new branch] gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig 2025-12-04T09:33:41.8135775Z * [new branch] gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-12-04T09:33:41.8137039Z * [new branch] gh/ydwu4/296/head -> origin/gh/ydwu4/296/head 2025-12-04T09:33:41.8138650Z * [new branch] gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig 2025-12-04T09:33:41.8140695Z * [new branch] gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-12-04T09:33:41.8142280Z * [new branch] gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-12-04T09:33:41.8144246Z * [new branch] gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig 
2025-12-04T09:33:41.8146285Z * [new branch] gh/ydwu4/312/base -> origin/gh/ydwu4/312/base 2025-12-04T09:33:41.8147603Z * [new branch] gh/ydwu4/312/head -> origin/gh/ydwu4/312/head 2025-12-04T09:33:41.8149167Z * [new branch] gh/ydwu4/312/orig -> origin/gh/ydwu4/312/orig 2025-12-04T09:33:41.8151161Z * [new branch] gh/ydwu4/322/base -> origin/gh/ydwu4/322/base 2025-12-04T09:33:41.8152711Z * [new branch] gh/ydwu4/322/head -> origin/gh/ydwu4/322/head 2025-12-04T09:33:41.8154028Z * [new branch] gh/ydwu4/322/orig -> origin/gh/ydwu4/322/orig 2025-12-04T09:33:41.8156131Z * [new branch] gh/ydwu4/327/base -> origin/gh/ydwu4/327/base 2025-12-04T09:33:41.8157585Z * [new branch] gh/ydwu4/327/head -> origin/gh/ydwu4/327/head 2025-12-04T09:33:41.8159169Z * [new branch] gh/ydwu4/327/orig -> origin/gh/ydwu4/327/orig 2025-12-04T09:33:41.8161257Z * [new branch] gh/ydwu4/328/base -> origin/gh/ydwu4/328/base 2025-12-04T09:33:41.8162493Z * [new branch] gh/ydwu4/328/head -> origin/gh/ydwu4/328/head 2025-12-04T09:33:41.8164044Z * [new branch] gh/ydwu4/328/orig -> origin/gh/ydwu4/328/orig 2025-12-04T09:33:41.8165825Z * [new branch] gh/ydwu4/329/base -> origin/gh/ydwu4/329/base 2025-12-04T09:33:41.8167143Z * [new branch] gh/ydwu4/329/head -> origin/gh/ydwu4/329/head 2025-12-04T09:33:41.8168626Z * [new branch] gh/ydwu4/329/orig -> origin/gh/ydwu4/329/orig 2025-12-04T09:33:41.8170813Z * [new branch] gh/ydwu4/330/base -> origin/gh/ydwu4/330/base 2025-12-04T09:33:41.8175589Z * [new branch] gh/ydwu4/330/head -> origin/gh/ydwu4/330/head 2025-12-04T09:33:41.8176987Z * [new branch] gh/ydwu4/330/orig -> origin/gh/ydwu4/330/orig 2025-12-04T09:33:41.8178954Z * [new branch] gh/ydwu4/331/base -> origin/gh/ydwu4/331/base 2025-12-04T09:33:41.8180636Z * [new branch] gh/ydwu4/331/head -> origin/gh/ydwu4/331/head 2025-12-04T09:33:41.8181845Z * [new branch] gh/ydwu4/331/orig -> origin/gh/ydwu4/331/orig 2025-12-04T09:33:41.8183728Z * [new branch] gh/ydwu4/332/base -> origin/gh/ydwu4/332/base 
2025-12-04T09:33:41.8185313Z * [new branch] gh/ydwu4/332/head -> origin/gh/ydwu4/332/head 2025-12-04T09:33:41.8186728Z * [new branch] gh/ydwu4/332/orig -> origin/gh/ydwu4/332/orig 2025-12-04T09:33:41.8188571Z * [new branch] gh/ydwu4/333/base -> origin/gh/ydwu4/333/base 2025-12-04T09:33:41.8189816Z * [new branch] gh/ydwu4/333/head -> origin/gh/ydwu4/333/head 2025-12-04T09:33:41.8191458Z * [new branch] gh/ydwu4/333/orig -> origin/gh/ydwu4/333/orig 2025-12-04T09:33:41.8193223Z * [new branch] gh/ydwu4/334/base -> origin/gh/ydwu4/334/base 2025-12-04T09:33:41.8194828Z * [new branch] gh/ydwu4/334/head -> origin/gh/ydwu4/334/head 2025-12-04T09:33:41.8196370Z * [new branch] gh/ydwu4/334/orig -> origin/gh/ydwu4/334/orig 2025-12-04T09:33:41.8198139Z * [new branch] gh/ydwu4/335/base -> origin/gh/ydwu4/335/base 2025-12-04T09:33:41.8199522Z * [new branch] gh/ydwu4/335/head -> origin/gh/ydwu4/335/head 2025-12-04T09:33:41.8201101Z * [new branch] gh/ydwu4/335/orig -> origin/gh/ydwu4/335/orig 2025-12-04T09:33:41.8203602Z * [new branch] gh/ydwu4/337/base -> origin/gh/ydwu4/337/base 2025-12-04T09:33:41.8204898Z * [new branch] gh/ydwu4/337/head -> origin/gh/ydwu4/337/head 2025-12-04T09:33:41.8206473Z * [new branch] gh/ydwu4/337/orig -> origin/gh/ydwu4/337/orig 2025-12-04T09:33:41.8208519Z * [new branch] gh/ydwu4/339/base -> origin/gh/ydwu4/339/base 2025-12-04T09:33:41.8210048Z * [new branch] gh/ydwu4/339/head -> origin/gh/ydwu4/339/head 2025-12-04T09:33:41.8211338Z * [new branch] gh/ydwu4/339/orig -> origin/gh/ydwu4/339/orig 2025-12-04T09:33:41.8214286Z * [new branch] gh/yf225/133/base -> origin/gh/yf225/133/base 2025-12-04T09:33:41.8215543Z * [new branch] gh/yf225/133/head -> origin/gh/yf225/133/head 2025-12-04T09:33:41.8217709Z * [new branch] gh/yf225/93/base -> origin/gh/yf225/93/base 2025-12-04T09:33:41.8218957Z * [new branch] gh/yf225/93/head -> origin/gh/yf225/93/head 2025-12-04T09:33:41.8222370Z * [new branch] gh/yifuwang/152/base -> origin/gh/yifuwang/152/base 
2025-12-04T09:33:41.8224272Z * [new branch] gh/yifuwang/152/head -> origin/gh/yifuwang/152/head 2025-12-04T09:33:41.8225810Z * [new branch] gh/yifuwang/152/orig -> origin/gh/yifuwang/152/orig 2025-12-04T09:33:41.8227770Z * [new branch] gh/yifuwang/195/base -> origin/gh/yifuwang/195/base 2025-12-04T09:33:41.8229360Z * [new branch] gh/yifuwang/195/head -> origin/gh/yifuwang/195/head 2025-12-04T09:33:41.8230659Z * [new branch] gh/yifuwang/195/orig -> origin/gh/yifuwang/195/orig 2025-12-04T09:33:41.8233309Z * [new branch] gh/yiming0416/1/base -> origin/gh/yiming0416/1/base 2025-12-04T09:33:41.8234632Z * [new branch] gh/yiming0416/1/head -> origin/gh/yiming0416/1/head 2025-12-04T09:33:41.8236780Z * [new branch] gh/yiming0416/2/base -> origin/gh/yiming0416/2/base 2025-12-04T09:33:41.8237761Z * [new branch] gh/yiming0416/2/head -> origin/gh/yiming0416/2/head 2025-12-04T09:33:41.8240423Z * [new branch] gh/yushangdi/1/base -> origin/gh/yushangdi/1/base 2025-12-04T09:33:41.8242498Z * [new branch] gh/yushangdi/1/head -> origin/gh/yushangdi/1/head 2025-12-04T09:33:41.8244288Z * [new branch] gh/yushangdi/10/base -> origin/gh/yushangdi/10/base 2025-12-04T09:33:41.8245608Z * [new branch] gh/yushangdi/10/head -> origin/gh/yushangdi/10/head 2025-12-04T09:33:41.8247255Z * [new branch] gh/yushangdi/10/orig -> origin/gh/yushangdi/10/orig 2025-12-04T09:33:41.8249199Z * [new branch] gh/yushangdi/11/base -> origin/gh/yushangdi/11/base 2025-12-04T09:33:41.8250472Z * [new branch] gh/yushangdi/11/head -> origin/gh/yushangdi/11/head 2025-12-04T09:33:41.8252289Z * [new branch] gh/yushangdi/11/orig -> origin/gh/yushangdi/11/orig 2025-12-04T09:33:41.8254111Z * [new branch] gh/yushangdi/2/base -> origin/gh/yushangdi/2/base 2025-12-04T09:33:41.8255370Z * [new branch] gh/yushangdi/2/head -> origin/gh/yushangdi/2/head 2025-12-04T09:33:41.8257685Z * [new branch] gh/yushangdi/7/base -> origin/gh/yushangdi/7/base 2025-12-04T09:33:41.8258974Z * [new branch] gh/yushangdi/7/head -> 
origin/gh/yushangdi/7/head 2025-12-04T09:33:41.8260586Z * [new branch] gh/yushangdi/7/orig -> origin/gh/yushangdi/7/orig 2025-12-04T09:33:41.8262978Z * [new branch] gh/yushangdi/8/base -> origin/gh/yushangdi/8/base 2025-12-04T09:33:41.8264711Z * [new branch] gh/yushangdi/8/head -> origin/gh/yushangdi/8/head 2025-12-04T09:33:41.8266323Z * [new branch] gh/yushangdi/8/orig -> origin/gh/yushangdi/8/orig 2025-12-04T09:33:41.8268119Z * [new branch] gh/yushangdi/9/base -> origin/gh/yushangdi/9/base 2025-12-04T09:33:41.8269432Z * [new branch] gh/yushangdi/9/head -> origin/gh/yushangdi/9/head 2025-12-04T09:33:41.8271279Z * [new branch] gh/yushangdi/9/orig -> origin/gh/yushangdi/9/orig 2025-12-04T09:33:41.8274013Z * [new branch] gh/zklaus/19/base -> origin/gh/zklaus/19/base 2025-12-04T09:33:41.8275253Z * [new branch] gh/zklaus/19/head -> origin/gh/zklaus/19/head 2025-12-04T09:33:41.8276854Z * [new branch] gh/zklaus/19/orig -> origin/gh/zklaus/19/orig 2025-12-04T09:33:41.8278906Z * [new branch] gh/zklaus/20/base -> origin/gh/zklaus/20/base 2025-12-04T09:33:41.8280988Z * [new branch] gh/zklaus/20/head -> origin/gh/zklaus/20/head 2025-12-04T09:33:41.8282509Z * [new branch] gh/zklaus/20/orig -> origin/gh/zklaus/20/orig 2025-12-04T09:33:41.8284576Z * [new branch] gh/zklaus/21/base -> origin/gh/zklaus/21/base 2025-12-04T09:33:41.8285906Z * [new branch] gh/zklaus/21/head -> origin/gh/zklaus/21/head 2025-12-04T09:33:41.8287554Z * [new branch] gh/zklaus/21/orig -> origin/gh/zklaus/21/orig 2025-12-04T09:33:41.8289580Z * [new branch] gh/zklaus/22/base -> origin/gh/zklaus/22/base 2025-12-04T09:33:41.8290825Z * [new branch] gh/zklaus/22/head -> origin/gh/zklaus/22/head 2025-12-04T09:33:41.8292999Z * [new branch] gh/zklaus/22/orig -> origin/gh/zklaus/22/orig 2025-12-04T09:33:41.8295105Z * [new branch] gh/zklaus/23/base -> origin/gh/zklaus/23/base 2025-12-04T09:33:41.8296433Z * [new branch] gh/zklaus/23/head -> origin/gh/zklaus/23/head 2025-12-04T09:33:41.8298090Z * [new branch] 
gh/zklaus/23/orig -> origin/gh/zklaus/23/orig 2025-12-04T09:33:41.8299907Z * [new branch] gh/zklaus/24/base -> origin/gh/zklaus/24/base 2025-12-04T09:33:41.8301176Z * [new branch] gh/zklaus/24/head -> origin/gh/zklaus/24/head 2025-12-04T09:33:41.8303322Z * [new branch] gh/zklaus/24/orig -> origin/gh/zklaus/24/orig 2025-12-04T09:33:41.8306187Z * [new branch] gh/zou3519/1197/base -> origin/gh/zou3519/1197/base 2025-12-04T09:33:41.8307249Z * [new branch] gh/zou3519/1197/head -> origin/gh/zou3519/1197/head 2025-12-04T09:33:41.8308882Z * [new branch] gh/zou3519/1197/orig -> origin/gh/zou3519/1197/orig 2025-12-04T09:33:41.8311401Z * [new branch] gh/zou3519/1199/base -> origin/gh/zou3519/1199/base 2025-12-04T09:33:41.8312898Z * [new branch] gh/zou3519/1199/head -> origin/gh/zou3519/1199/head 2025-12-04T09:33:41.8314992Z * [new branch] gh/zou3519/1199/orig -> origin/gh/zou3519/1199/orig 2025-12-04T09:33:41.8316946Z * [new branch] gh/zou3519/1200/base -> origin/gh/zou3519/1200/base 2025-12-04T09:33:41.8318226Z * [new branch] gh/zou3519/1200/head -> origin/gh/zou3519/1200/head 2025-12-04T09:33:41.8319873Z * [new branch] gh/zou3519/1200/orig -> origin/gh/zou3519/1200/orig 2025-12-04T09:33:41.8321933Z * [new branch] gh/zou3519/1201/base -> origin/gh/zou3519/1201/base 2025-12-04T09:33:41.8323139Z * [new branch] gh/zou3519/1201/head -> origin/gh/zou3519/1201/head 2025-12-04T09:33:41.8324758Z * [new branch] gh/zou3519/1201/orig -> origin/gh/zou3519/1201/orig 2025-12-04T09:33:41.8326588Z * [new branch] gh/zou3519/1202/base -> origin/gh/zou3519/1202/base 2025-12-04T09:33:41.8327891Z * [new branch] gh/zou3519/1202/head -> origin/gh/zou3519/1202/head 2025-12-04T09:33:41.8329556Z * [new branch] gh/zou3519/1202/orig -> origin/gh/zou3519/1202/orig 2025-12-04T09:33:41.8332135Z * [new branch] gh/zpcore/1/base -> origin/gh/zpcore/1/base 2025-12-04T09:33:41.8333209Z * [new branch] gh/zpcore/1/head -> origin/gh/zpcore/1/head 2025-12-04T09:33:41.8335316Z * [new branch] gh/zpcore/11/base -> 
origin/gh/zpcore/11/base 2025-12-04T09:33:41.8336908Z * [new branch] gh/zpcore/11/head -> origin/gh/zpcore/11/head 2025-12-04T09:33:41.8338462Z * [new branch] gh/zpcore/11/orig -> origin/gh/zpcore/11/orig 2025-12-04T09:33:41.8340906Z * [new branch] gh/zpcore/12/base -> origin/gh/zpcore/12/base 2025-12-04T09:33:41.8342370Z * [new branch] gh/zpcore/12/head -> origin/gh/zpcore/12/head 2025-12-04T09:33:41.8343883Z * [new branch] gh/zpcore/12/orig -> origin/gh/zpcore/12/orig 2025-12-04T09:33:41.8345938Z * [new branch] gh/zpcore/13/base -> origin/gh/zpcore/13/base 2025-12-04T09:33:41.8347404Z * [new branch] gh/zpcore/13/head -> origin/gh/zpcore/13/head 2025-12-04T09:33:41.8348880Z * [new branch] gh/zpcore/13/orig -> origin/gh/zpcore/13/orig 2025-12-04T09:33:41.8351296Z * [new branch] gh/zpcore/14/base -> origin/gh/zpcore/14/base 2025-12-04T09:33:41.8352896Z * [new branch] gh/zpcore/14/head -> origin/gh/zpcore/14/head 2025-12-04T09:33:41.8354377Z * [new branch] gh/zpcore/14/orig -> origin/gh/zpcore/14/orig 2025-12-04T09:33:41.8356602Z * [new branch] gh/zpcore/15/base -> origin/gh/zpcore/15/base 2025-12-04T09:33:41.8358055Z * [new branch] gh/zpcore/15/head -> origin/gh/zpcore/15/head 2025-12-04T09:33:41.8359566Z * [new branch] gh/zpcore/15/orig -> origin/gh/zpcore/15/orig 2025-12-04T09:33:41.8361553Z * [new branch] gh/zpcore/2/base -> origin/gh/zpcore/2/base 2025-12-04T09:33:41.8363015Z * [new branch] gh/zpcore/2/head -> origin/gh/zpcore/2/head 2025-12-04T09:33:41.8365598Z * [new branch] gh/zpcore/21/base -> origin/gh/zpcore/21/base 2025-12-04T09:33:41.8367372Z * [new branch] gh/zpcore/21/head -> origin/gh/zpcore/21/head 2025-12-04T09:33:41.8368849Z * [new branch] gh/zpcore/21/orig -> origin/gh/zpcore/21/orig 2025-12-04T09:33:41.8371241Z * [new branch] gh/zpcore/22/base -> origin/gh/zpcore/22/base 2025-12-04T09:33:41.8372793Z * [new branch] gh/zpcore/22/head -> origin/gh/zpcore/22/head 2025-12-04T09:33:41.8374392Z * [new branch] gh/zpcore/22/orig -> 
origin/gh/zpcore/22/orig 2025-12-04T09:33:41.8376489Z * [new branch] gh/zpcore/23/base -> origin/gh/zpcore/23/base 2025-12-04T09:33:41.8378094Z * [new branch] gh/zpcore/23/head -> origin/gh/zpcore/23/head 2025-12-04T09:33:41.8380079Z * [new branch] gh/zpcore/23/orig -> origin/gh/zpcore/23/orig 2025-12-04T09:33:41.8381842Z * [new branch] gh/zpcore/24/base -> origin/gh/zpcore/24/base 2025-12-04T09:33:41.8383326Z * [new branch] gh/zpcore/24/head -> origin/gh/zpcore/24/head 2025-12-04T09:33:41.8384774Z * [new branch] gh/zpcore/24/orig -> origin/gh/zpcore/24/orig 2025-12-04T09:33:41.8387077Z * [new branch] gh/zpcore/25/base -> origin/gh/zpcore/25/base 2025-12-04T09:33:41.8388609Z * [new branch] gh/zpcore/25/head -> origin/gh/zpcore/25/head 2025-12-04T09:33:41.8390085Z * [new branch] gh/zpcore/25/orig -> origin/gh/zpcore/25/orig 2025-12-04T09:33:41.8392155Z * [new branch] gh/zpcore/26/base -> origin/gh/zpcore/26/base 2025-12-04T09:33:41.8393787Z * [new branch] gh/zpcore/26/head -> origin/gh/zpcore/26/head 2025-12-04T09:33:41.8395352Z * [new branch] gh/zpcore/26/orig -> origin/gh/zpcore/26/orig 2025-12-04T09:33:41.8397608Z * [new branch] gh/zpcore/27/base -> origin/gh/zpcore/27/base 2025-12-04T09:33:41.8399122Z * [new branch] gh/zpcore/27/head -> origin/gh/zpcore/27/head 2025-12-04T09:33:41.8400569Z * [new branch] gh/zpcore/27/orig -> origin/gh/zpcore/27/orig 2025-12-04T09:33:41.8403071Z * [new branch] gh/zpcore/28/base -> origin/gh/zpcore/28/base 2025-12-04T09:33:41.8405108Z * [new branch] gh/zpcore/28/head -> origin/gh/zpcore/28/head 2025-12-04T09:33:41.8406663Z * [new branch] gh/zpcore/28/orig -> origin/gh/zpcore/28/orig 2025-12-04T09:33:41.8408970Z * [new branch] gh/zpcore/3/base -> origin/gh/zpcore/3/base 2025-12-04T09:33:41.8410403Z * [new branch] gh/zpcore/3/head -> origin/gh/zpcore/3/head 2025-12-04T09:33:41.8412189Z * [new branch] gh/zpcore/4/base -> origin/gh/zpcore/4/base 2025-12-04T09:33:41.8413637Z * [new branch] gh/zpcore/4/head -> origin/gh/zpcore/4/head 
2025-12-04T09:33:41.8415564Z * [new branch] gh/zpcore/5/base -> origin/gh/zpcore/5/base 2025-12-04T09:33:41.8417039Z * [new branch] gh/zpcore/5/head -> origin/gh/zpcore/5/head 2025-12-04T09:33:41.8418823Z * [new branch] gh/zpcore/6/base -> origin/gh/zpcore/6/base 2025-12-04T09:33:41.8420236Z * [new branch] gh/zpcore/6/head -> origin/gh/zpcore/6/head 2025-12-04T09:33:41.8422509Z * [new branch] gh/zpcore/7/base -> origin/gh/zpcore/7/base 2025-12-04T09:33:41.8423963Z * [new branch] gh/zpcore/7/head -> origin/gh/zpcore/7/head 2025-12-04T09:33:41.8425772Z * [new branch] gh/zpcore/8/base -> origin/gh/zpcore/8/base 2025-12-04T09:33:41.8427264Z * [new branch] gh/zpcore/8/head -> origin/gh/zpcore/8/head 2025-12-04T09:33:41.8429036Z * [new branch] google-main -> origin/google-main 2025-12-04T09:33:41.8431217Z * [new branch] guangyey/external_stream -> origin/guangyey/external_stream 2025-12-04T09:33:41.8432491Z * [new branch] guangyey/test_2025 -> origin/guangyey/test_2025 2025-12-04T09:33:41.8435364Z * [new branch] guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9 2025-12-04T09:33:41.8437303Z * [new branch] hameerabbasi/complex_tensor_subclass -> origin/hameerabbasi/complex_tensor_subclass 2025-12-04T09:33:41.8439010Z * [new branch] hameerabbasi/fix-ctensor-gradcheck-tests -> origin/hameerabbasi/fix-ctensor-gradcheck-tests 2025-12-04T09:33:41.8440399Z * [new branch] hameerabbasi/gradcheck-allclose -> origin/hameerabbasi/gradcheck-allclose 2025-12-04T09:33:41.8441808Z * [new branch] hc_baseline -> origin/hc_baseline 2025-12-04T09:33:41.8443368Z * [new branch] hhh_rand -> origin/hhh_rand 2025-12-04T09:33:41.8445241Z * [new branch] huba/f1 -> origin/huba/f1 2025-12-04T09:33:41.8447849Z * [new branch] increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test -> origin/increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test 2025-12-04T09:33:41.8448981Z * [new branch] inlining -> origin/inlining 2025-12-04T09:33:41.8450555Z * [new branch] 
inlining-ezyang -> origin/inlining-ezyang 2025-12-04T09:33:41.8452121Z * [new branch] install-torchao-0.13.0 -> origin/install-torchao-0.13.0 2025-12-04T09:33:41.8453968Z * [new branch] instrument-trunk-pull-linux-with-job-test-filters -> origin/instrument-trunk-pull-linux-with-job-test-filters 2025-12-04T09:33:41.8455149Z * [new branch] invoke-subgraph -> origin/invoke-subgraph 2025-12-04T09:33:41.8456954Z * [new branch] issue#58739 -> origin/issue#58739 2025-12-04T09:33:41.8459063Z * [new branch] jainapurva-patch-1 -> origin/jainapurva-patch-1 2025-12-04T09:33:41.8460913Z * [new branch] jathu/o3 -> origin/jathu/o3 2025-12-04T09:33:41.8462140Z * [new branch] jathu/sve -> origin/jathu/sve 2025-12-04T09:33:41.8465123Z * [new branch] jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2 2025-12-04T09:33:41.8466150Z * [new branch] jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2 2025-12-04T09:33:41.8468298Z * [new branch] jiannanWang/memorysnapshot_filter -> origin/jiannanWang/memorysnapshot_filter 2025-12-04T09:33:41.8469808Z * [new branch] jiannanWang/profilerstepwarning -> origin/jiannanWang/profilerstepwarning 2025-12-04T09:33:41.8471610Z * [new branch] jithunnair-amd-patch-1 -> origin/jithunnair-amd-patch-1 2025-12-04T09:33:41.8473221Z * [new branch] jithunnair-amd-patch-10 -> origin/jithunnair-amd-patch-10 2025-12-04T09:33:41.8474816Z * [new branch] jithunnair-amd-patch-2 -> origin/jithunnair-amd-patch-2 2025-12-04T09:33:41.8476433Z * [new branch] jithunnair-amd-patch-3 -> origin/jithunnair-amd-patch-3 2025-12-04T09:33:41.8478063Z * [new branch] jithunnair-amd-patch-4 -> origin/jithunnair-amd-patch-4 2025-12-04T09:33:41.8479524Z * [new branch] jithunnair-amd-patch-5 -> origin/jithunnair-amd-patch-5 2025-12-04T09:33:41.8481252Z * [new branch] jithunnair-amd-patch-6 -> origin/jithunnair-amd-patch-6 2025-12-04T09:33:41.8482739Z * [new branch] jithunnair-amd-patch-7 -> origin/jithunnair-amd-patch-7 
2025-12-04T09:33:41.8484489Z * [new branch] jithunnair-amd-patch-8 -> origin/jithunnair-amd-patch-8 2025-12-04T09:33:41.8485988Z * [new branch] jithunnair-amd-patch-9 -> origin/jithunnair-amd-patch-9 2025-12-04T09:33:41.8488097Z * [new branch] justinchu/native-qdq -> origin/justinchu/native-qdq 2025-12-04T09:33:41.8490205Z * [new branch] kainan666/xlf_debug -> origin/kainan666/xlf_debug 2025-12-04T09:33:41.8491514Z * [new branch] kainan_test -> origin/kainan_test 2025-12-04T09:33:41.8493507Z * [new branch] larryliu0820-patch-1 -> origin/larryliu0820-patch-1 2025-12-04T09:33:41.8495698Z * [new branch] leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues 2025-12-04T09:33:41.8498333Z * [new branch] lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error 2025-12-04T09:33:41.8500218Z * [new branch] liaoxuan/shm_all_reduce -> origin/liaoxuan/shm_all_reduce 2025-12-04T09:33:41.8501769Z * [new branch] liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax 2025-12-04T09:33:41.8503212Z * [new branch] liaoxuan/test_int8_sdpa -> origin/liaoxuan/test_int8_sdpa 2025-12-04T09:33:41.8504645Z * [new branch] llama4-stable -> origin/llama4-stable 2025-12-04T09:33:41.8507095Z * [new branch] lts/release/1.8 -> origin/lts/release/1.8 2025-12-04T09:33:41.8509189Z * [new branch] lucaskabela/#94773 -> origin/lucaskabela/#94773 2025-12-04T09:33:41.8510679Z * [new branch] lucaskabela/fix_164876 -> origin/lucaskabela/fix_164876 2025-12-04T09:33:41.8512085Z * [new branch] lucaskabela/flop_counter -> origin/lucaskabela/flop_counter 2025-12-04T09:33:41.8513501Z * [new branch] lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp 2025-12-04T09:33:41.8514942Z * [new branch] lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo 2025-12-04T09:33:41.8516484Z * [new branch] lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr 2025-12-04T09:33:41.8518215Z * [new 
branch] lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr 2025-12-04T09:33:41.8520184Z * [new branch] lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata 2025-12-04T09:33:41.8521575Z * [new branch] lucaskabela/rnn_decomp -> origin/lucaskabela/rnn_decomp 2025-12-04T09:33:41.8523157Z * [new branch] lucaskabela/typing_backends -> origin/lucaskabela/typing_backends 2025-12-04T09:33:41.8524677Z * [new branch] lucaskabela/typing_ctx_manager -> origin/lucaskabela/typing_ctx_manager 2025-12-04T09:33:41.8526092Z * [new branch] lucaskabela/typing_nn_module -> origin/lucaskabela/typing_nn_module 2025-12-04T09:33:41.8527583Z * [new branch] lucaskabela/typing_user_defined -> origin/lucaskabela/typing_user_defined 2025-12-04T09:33:41.8528986Z * [new branch] lucaskabela/typing_variables -> origin/lucaskabela/typing_variables 2025-12-04T09:33:41.8530592Z * [new branch] lucaskabela/typing_variables_dicts -> origin/lucaskabela/typing_variables_dicts 2025-12-04T09:33:41.8532044Z * [new branch] lucaskabela/typing_variables_functions -> origin/lucaskabela/typing_variables_functions 2025-12-04T09:33:41.8533433Z * [new branch] lucaskabela/typing_variables_lists -> origin/lucaskabela/typing_variables_lists 2025-12-04T09:33:41.8535496Z * [new branch] lw/torch_box_by_ref -> origin/lw/torch_box_by_ref 2025-12-04T09:33:41.8537197Z * [new branch] main -> origin/main 2025-12-04T09:33:41.8538920Z * [new branch] malfet-patch-1 -> origin/malfet-patch-1 2025-12-04T09:33:41.8540615Z * [new branch] malfet-patch-2 -> origin/malfet-patch-2 2025-12-04T09:33:41.8542252Z * [new branch] malfet-patch-3 -> origin/malfet-patch-3 2025-12-04T09:33:41.8544067Z * [new branch] malfet-patch-4 -> origin/malfet-patch-4 2025-12-04T09:33:41.8545572Z * [new branch] malfet-patch-5 -> origin/malfet-patch-5 2025-12-04T09:33:41.8547185Z * [new branch] malfet-patch-6 -> origin/malfet-patch-6 2025-12-04T09:33:41.8548805Z * [new branch] 
malfet-patch-7 -> origin/malfet-patch-7 2025-12-04T09:33:41.8550540Z * [new branch] malfet-patch-8 -> origin/malfet-patch-8 2025-12-04T09:33:41.8552576Z * [new branch] malfet/add-3.14-ci -> origin/malfet/add-3.14-ci 2025-12-04T09:33:41.8554319Z * [new branch] malfet/be-do-not-make-typos-in-build-artifacts -> origin/malfet/be-do-not-make-typos-in-build-artifacts 2025-12-04T09:33:41.8555796Z * [new branch] malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch 2025-12-04T09:33:41.8557521Z * [new branch] malfet/be-remove-misisng-neon-headers -> origin/malfet/be-remove-misisng-neon-headers 2025-12-04T09:33:41.8559152Z * [new branch] malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im 2025-12-04T09:33:41.8561224Z * [new branch] manuel/aoti_metal_shimify-thread_safe -> origin/manuel/aoti_metal_shimify-thread_safe 2025-12-04T09:33:41.8562575Z * [new branch] manuel/inductor_link_openmp -> origin/manuel/inductor_link_openmp 2025-12-04T09:33:41.8564557Z * [new branch] masnesral/metaconda -> origin/masnesral/metaconda 2025-12-04T09:33:41.8566190Z * [new branch] mem_profiler_flaky_fix -> origin/mem_profiler_flaky_fix 2025-12-04T09:33:41.8568269Z * [new branch] mem_profiler_stack_trace -> origin/mem_profiler_stack_trace 2025-12-04T09:33:41.8569924Z * [new branch] memory_profiler_stack -> origin/memory_profiler_stack 2025-12-04T09:33:41.8571603Z * [new branch] metascroy-patch-1 -> origin/metascroy-patch-1 2025-12-04T09:33:41.8573274Z * [new branch] mingw_posix -> origin/mingw_posix 2025-12-04T09:33:41.8575318Z * [new branch] mlazos/S429861-debug -> origin/mlazos/S429861-debug 2025-12-04T09:33:41.8576774Z * [new branch] mlazos/aa -> origin/mlazos/aa 2025-12-04T09:33:41.8578237Z * [new branch] mlazos/acts -> origin/mlazos/acts 2025-12-04T09:33:41.8579938Z * [new branch] mlazos/arg-renames -> origin/mlazos/arg-renames 2025-12-04T09:33:41.8581367Z * [new branch] mlazos/bad-cudagraphs -> origin/mlazos/bad-cudagraphs 
2025-12-04T09:33:41.8582859Z * [new branch] mlazos/baseline-graph-breaks -> origin/mlazos/baseline-graph-breaks 2025-12-04T09:33:41.8584230Z * [new branch] mlazos/beta-tensor -> origin/mlazos/beta-tensor 2025-12-04T09:33:41.8585678Z * [new branch] mlazos/buffers -> origin/mlazos/buffers 2025-12-04T09:33:41.8586916Z * [new branch] mlazos/buffers2 -> origin/mlazos/buffers2 2025-12-04T09:33:41.8588668Z * [new branch] mlazos/buffers3 -> origin/mlazos/buffers3 2025-12-04T09:33:41.8590374Z * [new branch] mlazos/bwd -> origin/mlazos/bwd 2025-12-04T09:33:41.8591773Z * [new branch] mlazos/combo-test -> origin/mlazos/combo-test 2025-12-04T09:33:41.8593418Z * [new branch] mlazos/ctx-cleanup -> origin/mlazos/ctx-cleanup 2025-12-04T09:33:41.8594892Z * [new branch] mlazos/cuda-cmd-log -> origin/mlazos/cuda-cmd-log 2025-12-04T09:33:41.8596517Z * [new branch] mlazos/cudagraph-tests -> origin/mlazos/cudagraph-tests 2025-12-04T09:33:41.8598049Z * [new branch] mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement 2025-12-04T09:33:41.8599592Z * [new branch] mlazos/cutlass-test -> origin/mlazos/cutlass-test 2025-12-04T09:33:41.8601784Z * [new branch] mlazos/cutlass-topo-bug -> origin/mlazos/cutlass-topo-bug 2025-12-04T09:33:41.8603100Z * [new branch] mlazos/dataclass-proxy -> origin/mlazos/dataclass-proxy 2025-12-04T09:33:41.8604555Z * [new branch] mlazos/dc-attrs -> origin/mlazos/dc-attrs 2025-12-04T09:33:41.8606005Z * [new branch] mlazos/dc-helion -> origin/mlazos/dc-helion 2025-12-04T09:33:41.8607480Z * [new branch] mlazos/dict-fix -> origin/mlazos/dict-fix 2025-12-04T09:33:41.8609463Z * [new branch] mlazos/disable-tf -> origin/mlazos/disable-tf 2025-12-04T09:33:41.8610927Z * [new branch] mlazos/dupe-fix -> origin/mlazos/dupe-fix 2025-12-04T09:33:41.8612520Z * [new branch] mlazos/dyn-batch -> origin/mlazos/dyn-batch 2025-12-04T09:33:41.8613969Z * [new branch] mlazos/evt -> origin/mlazos/evt 2025-12-04T09:33:41.8615489Z * [new branch] mlazos/extract-examples -> 
origin/mlazos/extract-examples 2025-12-04T09:33:41.8617002Z * [new branch] mlazos/foreach-op -> origin/mlazos/foreach-op 2025-12-04T09:33:41.8618567Z * [new branch] mlazos/fp8 -> origin/mlazos/fp8 2025-12-04T09:33:41.8620142Z * [new branch] mlazos/fp8-bias -> origin/mlazos/fp8-bias 2025-12-04T09:33:41.8621619Z * [new branch] mlazos/fp8-bias-fusion -> origin/mlazos/fp8-bias-fusion 2025-12-04T09:33:41.8623052Z * [new branch] mlazos/fp8-fixes -> origin/mlazos/fp8-fixes 2025-12-04T09:33:41.8624509Z * [new branch] mlazos/freezing -> origin/mlazos/freezing 2025-12-04T09:33:41.8626121Z * [new branch] mlazos/h-comp -> origin/mlazos/h-comp 2025-12-04T09:33:41.8627688Z * [new branch] mlazos/h-comp2 -> origin/mlazos/h-comp2 2025-12-04T09:33:41.8629171Z * [new branch] mlazos/hash-hop -> origin/mlazos/hash-hop 2025-12-04T09:33:41.8630718Z * [new branch] mlazos/hc -> origin/mlazos/hc 2025-12-04T09:33:41.8632284Z * [new branch] mlazos/hc-cycles -> origin/mlazos/hc-cycles 2025-12-04T09:33:41.8633781Z * [new branch] mlazos/hc-fixes -> origin/mlazos/hc-fixes 2025-12-04T09:33:41.8635233Z * [new branch] mlazos/hc-fixes3 -> origin/mlazos/hc-fixes3 2025-12-04T09:33:41.8636701Z * [new branch] mlazos/hc-fixes4 -> origin/mlazos/hc-fixes4 2025-12-04T09:33:41.8638274Z * [new branch] mlazos/hc-hf -> origin/mlazos/hc-hf 2025-12-04T09:33:41.8639686Z * [new branch] mlazos/hc-mut -> origin/mlazos/hc-mut 2025-12-04T09:33:41.8641166Z * [new branch] mlazos/hc10 -> origin/mlazos/hc10 2025-12-04T09:33:41.8642733Z * [new branch] mlazos/hc11 -> origin/mlazos/hc11 2025-12-04T09:33:41.8644214Z * [new branch] mlazos/hc12 -> origin/mlazos/hc12 2025-12-04T09:33:41.8645586Z * [new branch] mlazos/hc13 -> origin/mlazos/hc13 2025-12-04T09:33:41.8647025Z * [new branch] mlazos/hc14 -> origin/mlazos/hc14 2025-12-04T09:33:41.8648442Z * [new branch] mlazos/hc15 -> origin/mlazos/hc15 2025-12-04T09:33:41.8649961Z * [new branch] mlazos/hc2 -> origin/mlazos/hc2 2025-12-04T09:33:41.8651384Z * [new branch] mlazos/hc4 -> 
origin/mlazos/hc4 2025-12-04T09:33:41.8652810Z * [new branch] mlazos/hc5 -> origin/mlazos/hc5 2025-12-04T09:33:41.8654257Z * [new branch] mlazos/hc6 -> origin/mlazos/hc6 2025-12-04T09:33:41.8655795Z * [new branch] mlazos/hc7 -> origin/mlazos/hc7 2025-12-04T09:33:41.8657280Z * [new branch] mlazos/hc8 -> origin/mlazos/hc8 2025-12-04T09:33:41.8658674Z * [new branch] mlazos/hc9 -> origin/mlazos/hc9 2025-12-04T09:33:41.8660178Z * [new branch] mlazos/hc_baseline2 -> origin/mlazos/hc_baseline2 2025-12-04T09:33:41.8661564Z * [new branch] mlazos/inductor-streams -> origin/mlazos/inductor-streams 2025-12-04T09:33:41.8662796Z * [new branch] mlazos/main -> origin/mlazos/main 2025-12-04T09:33:41.8664313Z * [new branch] mlazos/mcg2 -> origin/mlazos/mcg2 2025-12-04T09:33:41.8665947Z * [new branch] mlazos/meta-guards -> origin/mlazos/meta-guards 2025-12-04T09:33:41.8668080Z * [new branch] mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam 2025-12-04T09:33:41.8669620Z * [new branch] mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup 2025-12-04T09:33:41.8671210Z * [new branch] mlazos/mod-fix -> origin/mlazos/mod-fix 2025-12-04T09:33:41.8676007Z * [new branch] mlazos/mode-fix -> origin/mlazos/mode-fix 2025-12-04T09:33:41.8677536Z * [new branch] mlazos/offsets -> origin/mlazos/offsets 2025-12-04T09:33:41.8678960Z * [new branch] mlazos/overguarding -> origin/mlazos/overguarding 2025-12-04T09:33:41.8680415Z * [new branch] mlazos/proxy-ctors -> origin/mlazos/proxy-ctors 2025-12-04T09:33:41.8681897Z * [new branch] mlazos/quant-fix -> origin/mlazos/quant-fix 2025-12-04T09:33:41.8683361Z * [new branch] mlazos/resnet-fix -> origin/mlazos/resnet-fix 2025-12-04T09:33:41.8684896Z * [new branch] mlazos/rm-buf-names -> origin/mlazos/rm-buf-names 2025-12-04T09:33:41.8686368Z * [new branch] mlazos/rm-code -> origin/mlazos/rm-code 2025-12-04T09:33:41.8687878Z * [new branch] mlazos/rm-spam -> origin/mlazos/rm-spam 2025-12-04T09:33:41.8689461Z * [new branch] 
mlazos/rtp -> origin/mlazos/rtp 2025-12-04T09:33:41.8690960Z * [new branch] mlazos/static-idx-dbg -> origin/mlazos/static-idx-dbg 2025-12-04T09:33:41.8692603Z * [new branch] mlazos/static-inputs-log -> origin/mlazos/static-inputs-log 2025-12-04T09:33:41.8693875Z * [new branch] mlazos/stests -> origin/mlazos/stests 2025-12-04T09:33:41.8695400Z * [new branch] mlazos/stream-ops -> origin/mlazos/stream-ops 2025-12-04T09:33:41.8696986Z * [new branch] mlazos/td-fix2 -> origin/mlazos/td-fix2 2025-12-04T09:33:41.8698697Z * [new branch] mlazos/tensor-hasattr2 -> origin/mlazos/tensor-hasattr2 2025-12-04T09:33:41.8699983Z * [new branch] mlazos/test -> origin/mlazos/test 2025-12-04T09:33:41.8701518Z * [new branch] mlazos/tf-mode -> origin/mlazos/tf-mode 2025-12-04T09:33:41.8703108Z * [new branch] mlazos/tf-mode-backup2 -> origin/mlazos/tf-mode-backup2 2025-12-04T09:33:41.8705086Z * [new branch] mlazos/tf-mode-reland -> origin/mlazos/tf-mode-reland 2025-12-04T09:33:41.8706692Z * [new branch] mlazos/tf-mode-reland2 -> origin/mlazos/tf-mode-reland2 2025-12-04T09:33:41.8708195Z * [new branch] mlazos/tf-mode-reland3 -> origin/mlazos/tf-mode-reland3 2025-12-04T09:33:41.8709659Z * [new branch] mlazos/triton-no-epi -> origin/mlazos/triton-no-epi 2025-12-04T09:33:41.8711201Z * [new branch] mlazos/tune-proto -> origin/mlazos/tune-proto 2025-12-04T09:33:41.8713108Z * [new branch] mlazos/tuple-fixes -> origin/mlazos/tuple-fixes 2025-12-04T09:33:41.8714757Z * [new branch] mlazos/tuple-fixes2 -> origin/mlazos/tuple-fixes2 2025-12-04T09:33:41.8716172Z * [new branch] mlazos/tuple-handling -> origin/mlazos/tuple-handling 2025-12-04T09:33:41.8717862Z * [new branch] mlazos/user-stream-base -> origin/mlazos/user-stream-base 2025-12-04T09:33:41.8719400Z * [new branch] mlazos/user-streams -> origin/mlazos/user-streams 2025-12-04T09:33:41.8720870Z * [new branch] mlazos/user-streams-backup -> origin/mlazos/user-streams-backup 2025-12-04T09:33:41.8722433Z * [new branch] mlazos/user-streams-backup2 -> 
origin/mlazos/user-streams-backup2 2025-12-04T09:33:41.8723874Z * [new branch] mlazos/vary-beta -> origin/mlazos/vary-beta 2025-12-04T09:33:41.8725436Z * [new branch] mlazos/vary-beta2 -> origin/mlazos/vary-beta2 2025-12-04T09:33:41.8726925Z * [new branch] mlazos/weird-perf1 -> origin/mlazos/weird-perf1 2025-12-04T09:33:41.8728638Z * [new branch] mm_out_dtype_compile -> origin/mm_out_dtype_compile 2025-12-04T09:33:41.8730219Z * [new branch] module-shim -> origin/module-shim 2025-12-04T09:33:41.8731852Z * [new branch] move_config -> origin/move_config 2025-12-04T09:33:41.8733930Z * [new branch] msaroufim/reduce -> origin/msaroufim/reduce 2025-12-04T09:33:41.8735837Z * [new branch] mtia/basic-cmake -> origin/mtia/basic-cmake 2025-12-04T09:33:41.8738104Z * [new branch] mwizak/fix-triton-block-shape -> origin/mwizak/fix-triton-block-shape 2025-12-04T09:33:41.8739854Z * [new branch] my_varlen_backup -> origin/my_varlen_backup 2025-12-04T09:33:41.8741326Z * [new branch] nativert_num_outputs -> origin/nativert_num_outputs 2025-12-04T09:33:41.8742857Z * [new branch] new-codegen -> origin/new-codegen 2025-12-04T09:33:41.8744492Z * [new branch] newtest-base -> origin/newtest-base 2025-12-04T09:33:41.8746508Z * [new branch] ngimel/addmm_dtype -> origin/ngimel/addmm_dtype 2025-12-04T09:33:41.8747880Z * [new branch] ngimel/div_inv -> origin/ngimel/div_inv 2025-12-04T09:33:41.8749310Z * [new branch] ngimel/error_index_list -> origin/ngimel/error_index_list 2025-12-04T09:33:41.8750726Z * [new branch] ngimel/gather_grid -> origin/ngimel/gather_grid 2025-12-04T09:33:41.8752250Z * [new branch] ngimel/gather_grid_release -> origin/ngimel/gather_grid_release 2025-12-04T09:33:41.8753543Z * [new branch] ngimel/gg_new -> origin/ngimel/gg_new 2025-12-04T09:33:41.8754983Z * [new branch] ngimel/hostalloc -> origin/ngimel/hostalloc 2025-12-04T09:33:41.8756398Z * [new branch] ngimel/storage_id -> origin/ngimel/storage_id 2025-12-04T09:33:41.8757970Z * [new branch] nightly -> origin/nightly 
2025-12-04T09:33:41.8760836Z * [new branch] nikitaved/addmm_1_rowcol_lt_path_check -> origin/nikitaved/addmm_1_rowcol_lt_path_check 2025-12-04T09:33:41.8762549Z * [new branch] nikitaved/addmm_epilogue_fusions_2d_bias -> origin/nikitaved/addmm_epilogue_fusions_2d_bias 2025-12-04T09:33:41.8764051Z * [new branch] nikitaved/addmm_epilogue_fusions_inductor -> origin/nikitaved/addmm_epilogue_fusions_inductor 2025-12-04T09:33:41.8765843Z * [new branch] nikitaved/addmm_epilogue_fusions_scratch -> origin/nikitaved/addmm_epilogue_fusions_scratch 2025-12-04T09:33:41.8767664Z * [new branch] nikitaved/grad_addmm_epilogue_fusions -> origin/nikitaved/grad_addmm_epilogue_fusions 2025-12-04T09:33:41.8769576Z * [new branch] nikitaved/simpler_can_use_32bit_index -> origin/nikitaved/simpler_can_use_32bit_index 2025-12-04T09:33:41.8771283Z * [new branch] nikitaved/test -> origin/nikitaved/test 2025-12-04T09:33:41.8773276Z * [new branch] nmacchioni-perf-test-async-autotune -> origin/nmacchioni-perf-test-async-autotune 2025-12-04T09:33:41.8774719Z * [new branch] no_distributed_log_spew -> origin/no_distributed_log_spew 2025-12-04T09:33:41.8776372Z * [new branch] nofun-hack -> origin/nofun-hack 2025-12-04T09:33:41.8777944Z * [new branch] norm_bench -> origin/norm_bench 2025-12-04T09:33:41.8779982Z * [new branch] nullplay/fuse_matmul -> origin/nullplay/fuse_matmul 2025-12-04T09:33:41.8781539Z * [new branch] nullplay_fuse_matmul -> origin/nullplay_fuse_matmul 2025-12-04T09:33:41.8783081Z * [new branch] optimizer_test -> origin/optimizer_test 2025-12-04T09:33:41.8785712Z * [new branch] orig/release/1.10 -> origin/orig/release/1.10 2025-12-04T09:33:41.8787325Z * [new branch] orig/release/1.11 -> origin/orig/release/1.11 2025-12-04T09:33:41.8788854Z * [new branch] orig/release/1.12 -> origin/orig/release/1.12 2025-12-04T09:33:41.8790611Z * [new branch] orig/release/1.13 -> origin/orig/release/1.13 2025-12-04T09:33:41.8792222Z * [new branch] orig/release/1.6 -> origin/orig/release/1.6 
2025-12-04T09:33:41.8794418Z * [new branch] orig/release/1.7 -> origin/orig/release/1.7 2025-12-04T09:33:41.8795990Z * [new branch] orig/release/1.8 -> origin/orig/release/1.8 2025-12-04T09:33:41.8797579Z * [new branch] orig/release/1.9 -> origin/orig/release/1.9 2025-12-04T09:33:41.8799153Z * [new branch] orig/release/2.0 -> origin/orig/release/2.0 2025-12-04T09:33:41.8800600Z * [new branch] orig/release/2.1 -> origin/orig/release/2.1 2025-12-04T09:33:41.8802164Z * [new branch] orig/release/2.2 -> origin/orig/release/2.2 2025-12-04T09:33:41.8803602Z * [new branch] orig/release/2.3 -> origin/orig/release/2.3 2025-12-04T09:33:41.8805069Z * [new branch] orig/release/2.4 -> origin/orig/release/2.4 2025-12-04T09:33:41.8806478Z * [new branch] orig/release/2.5 -> origin/orig/release/2.5 2025-12-04T09:33:41.8807984Z * [new branch] orig/release/2.6 -> origin/orig/release/2.6 2025-12-04T09:33:41.8809785Z * [new branch] orig/release/2.7 -> origin/orig/release/2.7 2025-12-04T09:33:41.8811946Z * [new branch] orig/release/2.8 -> origin/orig/release/2.8 2025-12-04T09:33:41.8813369Z * [new branch] orig/release/2.9 -> origin/orig/release/2.9 2025-12-04T09:33:41.8816822Z * [new branch] origin/gh/fxdawnn/1/base -> origin/origin/gh/fxdawnn/1/base 2025-12-04T09:33:41.8818226Z * [new branch] origin/gh/fxdawnn/1/orig -> origin/origin/gh/fxdawnn/1/orig 2025-12-04T09:33:41.8820760Z * [new branch] origin/gh/zpcore/14/orig -> origin/origin/gh/zpcore/14/orig 2025-12-04T09:33:41.8822444Z * [new branch] oulgen-patch-1 -> origin/oulgen-patch-1 2025-12-04T09:33:41.8824109Z * [new branch] oulgen-patch-2 -> origin/oulgen-patch-2 2025-12-04T09:33:41.8825724Z * [new branch] oulgen-patch-3 -> origin/oulgen-patch-3 2025-12-04T09:33:41.8827554Z * [new branch] oulgen-patch-4 -> origin/oulgen-patch-4 2025-12-04T09:33:41.8829534Z * [new branch] padded-tensor -> origin/padded-tensor 2025-12-04T09:33:41.8831244Z * [new branch] pca2 -> origin/pca2 2025-12-04T09:33:41.8832962Z * [new branch] 
per_channel_backup -> origin/per_channel_backup 2025-12-04T09:33:41.8834577Z * [new branch] perf_ops -> origin/perf_ops 2025-12-04T09:33:41.8836084Z * [new branch] perf_ops_2_9 -> origin/perf_ops_2_9 2025-12-04T09:33:41.8837792Z * [new branch] pianpwk-patch-1 -> origin/pianpwk-patch-1 2025-12-04T09:33:41.8839849Z * [new branch] pianpwk/__draft_debug_mode -> origin/pianpwk/__draft_debug_mode 2025-12-04T09:33:41.8841447Z * [new branch] pianpwk/_debug_mode_for_triton_draft -> origin/pianpwk/_debug_mode_for_triton_draft 2025-12-04T09:33:41.8842800Z * [new branch] pianpwk/_debug_nn_module_compile -> origin/pianpwk/_debug_nn_module_compile 2025-12-04T09:33:41.8844167Z * [new branch] pianpwk/_draft_triton_11_3 -> origin/pianpwk/_draft_triton_11_3 2025-12-04T09:33:41.8845608Z * [new branch] pianpwk/_manual_bucket_draft -> origin/pianpwk/_manual_bucket_draft 2025-12-04T09:33:41.8847395Z * [new branch] pianpwk/_profile_w_dispatch_keys -> origin/pianpwk/_profile_w_dispatch_keys 2025-12-04T09:33:41.8849194Z * [new branch] pianpwk/_super_draft_debug_mode -> origin/pianpwk/_super_draft_debug_mode 2025-12-04T09:33:41.8850857Z * [new branch] pianpwk/_unbacked_local_shard_size -> origin/pianpwk/_unbacked_local_shard_size 2025-12-04T09:33:41.8852296Z * [new branch] pianpwk/anomaly_tb -> origin/pianpwk/anomaly_tb 2025-12-04T09:33:41.8853723Z * [new branch] pianpwk/auto_fx_annotate -> origin/pianpwk/auto_fx_annotate 2025-12-04T09:33:41.8855434Z * [new branch] pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export 2025-12-04T09:33:41.8856935Z * [new branch] pianpwk/bert_dynamic_perf -> origin/pianpwk/bert_dynamic_perf 2025-12-04T09:33:41.8858583Z * [new branch] pianpwk/debug_fwd_stack_traces -> origin/pianpwk/debug_fwd_stack_traces 2025-12-04T09:33:41.8860110Z * [new branch] pianpwk/debug_hash_tensor -> origin/pianpwk/debug_hash_tensor 2025-12-04T09:33:41.8861666Z * [new branch] pianpwk/debug_mode_annotate -> origin/pianpwk/debug_mode_annotate 
2025-12-04T09:33:41.8863050Z * [new branch] pianpwk/debug_mode_defaults -> origin/pianpwk/debug_mode_defaults 2025-12-04T09:33:41.8864557Z * [new branch] pianpwk/debug_mode_hacks -> origin/pianpwk/debug_mode_hacks 2025-12-04T09:33:41.8866072Z * [new branch] pianpwk/debug_mode_opcall_refactor -> origin/pianpwk/debug_mode_opcall_refactor 2025-12-04T09:33:41.8867486Z * [new branch] pianpwk/debug_mode_show_ids -> origin/pianpwk/debug_mode_show_ids 2025-12-04T09:33:41.8869581Z * [new branch] pianpwk/debug_mode_triton -> origin/pianpwk/debug_mode_triton 2025-12-04T09:33:41.8871431Z * [new branch] pianpwk/debug_show_stack_trace -> origin/pianpwk/debug_show_stack_trace 2025-12-04T09:33:41.8872956Z * [new branch] pianpwk/debug_wait_on_collective -> origin/pianpwk/debug_wait_on_collective 2025-12-04T09:33:41.8874471Z * [new branch] pianpwk/debugmode_compile_tf -> origin/pianpwk/debugmode_compile_tf 2025-12-04T09:33:41.8876108Z * [new branch] pianpwk/dispatch_key_debugging_for_debug -> origin/pianpwk/dispatch_key_debugging_for_debug 2025-12-04T09:33:41.8877551Z * [new branch] pianpwk/draft_debug_mode_tfcompile -> origin/pianpwk/draft_debug_mode_tfcompile 2025-12-04T09:33:41.8878945Z * [new branch] pianpwk/draft_multikernel_nn -> origin/pianpwk/draft_multikernel_nn 2025-12-04T09:33:41.8880646Z * [new branch] pianpwk/draft_multikernel_status_10_5 -> origin/pianpwk/draft_multikernel_status_10_5 2025-12-04T09:33:41.8882196Z * [new branch] pianpwk/dtensor_custom_chunk -> origin/pianpwk/dtensor_custom_chunk 2025-12-04T09:33:41.8883890Z * [new branch] pianpwk/dtensor_unbacked_keypath -> origin/pianpwk/dtensor_unbacked_keypath 2025-12-04T09:33:41.8885466Z * [new branch] pianpwk/event_list_tree -> origin/pianpwk/event_list_tree 2025-12-04T09:33:41.8886930Z * [new branch] pianpwk/false_numel_refs -> origin/pianpwk/false_numel_refs 2025-12-04T09:33:41.8888419Z * [new branch] pianpwk/maybe_guard_rel -> origin/pianpwk/maybe_guard_rel 2025-12-04T09:33:41.8889949Z * [new branch] 
pianpwk/multikernel_hints_draft -> origin/pianpwk/multikernel_hints_draft 2025-12-04T09:33:41.8891579Z * [new branch] pianpwk/no_size_oblivious_slice_scat -> origin/pianpwk/no_size_oblivious_slice_scat 2025-12-04T09:33:41.8893071Z * [new branch] pianpwk/oblivious_reshape_view_better -> origin/pianpwk/oblivious_reshape_view_better 2025-12-04T09:33:41.8894405Z * [new branch] pianpwk/pre_forward_hook -> origin/pianpwk/pre_forward_hook 2025-12-04T09:33:41.8895916Z * [new branch] pianpwk/skip_python_keys_alternate -> origin/pianpwk/skip_python_keys_alternate 2025-12-04T09:33:41.8897512Z * [new branch] pianpwk/skip_python_keys_in_guards -> origin/pianpwk/skip_python_keys_in_guards 2025-12-04T09:33:41.8898897Z * [new branch] pianpwk/sym_tokens_draft -> origin/pianpwk/sym_tokens_draft 2025-12-04T09:33:41.8900378Z * [new branch] pianpwk/symint_one_hot -> origin/pianpwk/symint_one_hot 2025-12-04T09:33:41.8902135Z * [new branch] pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false 2025-12-04T09:33:41.8903510Z * [new branch] pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap 2025-12-04T09:33:41.8905548Z * [new branch] pianpwk/try_dumb_stuff -> origin/pianpwk/try_dumb_stuff 2025-12-04T09:33:41.8907062Z * [new branch] pianpwk/try_dumb_stuff_2 -> origin/pianpwk/try_dumb_stuff_2 2025-12-04T09:33:41.8908569Z * [new branch] pianpwk/unbacked_dtensor_mm -> origin/pianpwk/unbacked_dtensor_mm 2025-12-04T09:33:41.8910105Z * [new branch] pianpwk/unbacked_tracing_12_2 -> origin/pianpwk/unbacked_tracing_12_2 2025-12-04T09:33:41.8911502Z * [new branch] pianpwk/user_symints -> origin/pianpwk/user_symints 2025-12-04T09:33:41.8913079Z * [new branch] pianpwk/wan21_reshape -> origin/pianpwk/wan21_reshape 2025-12-04T09:33:41.8915174Z * [new branch] piz/fix_partial_backward_1112 -> origin/piz/fix_partial_backward_1112 2025-12-04T09:33:41.8916549Z * [new branch] piz/prop_cache_clean -> origin/piz/prop_cache_clean 2025-12-04T09:33:41.8918128Z * 
[new branch] pool-separate -> origin/pool-separate 2025-12-04T09:33:41.8919684Z * [new branch] pr-156087 -> origin/pr-156087 2025-12-04T09:33:41.8922795Z * [new branch] pr/131860 -> origin/pr/131860 2025-12-04T09:33:41.8923682Z * [new branch] predispatch_to -> origin/predispatch_to 2025-12-04T09:33:41.8925092Z * [new branch] protect-c17 -> origin/protect-c17 2025-12-04T09:33:41.8926526Z * [new branch] pt-opt-cuda3 -> origin/pt-opt-cuda3 2025-12-04T09:33:41.8928824Z * [new branch] python_compiled_autograd -> origin/python_compiled_autograd 2025-12-04T09:33:41.8931084Z * [new branch] q1l1/fix_device_moved_constant_type_unknown -> origin/q1l1/fix_device_moved_constant_type_unknown 2025-12-04T09:33:41.8932577Z * [new branch] q1l1/fix_wrong_default_type_for_kernel_call_args -> origin/q1l1/fix_wrong_default_type_for_kernel_call_args 2025-12-04T09:33:41.8934997Z * [new branch] qchip/export-D54134695 -> origin/qchip/export-D54134695 2025-12-04T09:33:41.8936738Z * [new branch] quote-pytest_cache -> origin/quote-pytest_cache 2025-12-04T09:33:41.8938817Z * [new branch] reland-accgrad-stream-warn -> origin/reland-accgrad-stream-warn 2025-12-04T09:33:41.8940980Z * [new branch] release/1.10 -> origin/release/1.10 2025-12-04T09:33:41.8942505Z * [new branch] release/1.11 -> origin/release/1.11 2025-12-04T09:33:41.8944024Z * [new branch] release/1.12 -> origin/release/1.12 2025-12-04T09:33:41.8945543Z * [new branch] release/1.13 -> origin/release/1.13 2025-12-04T09:33:41.8947011Z * [new branch] release/1.4 -> origin/release/1.4 2025-12-04T09:33:41.8948251Z * [new branch] release/1.4.1 -> origin/release/1.4.1 2025-12-04T09:33:41.8949750Z * [new branch] release/1.5 -> origin/release/1.5 2025-12-04T09:33:41.8951407Z * [new branch] release/1.6 -> origin/release/1.6 2025-12-04T09:33:41.8952939Z * [new branch] release/1.7 -> origin/release/1.7 2025-12-04T09:33:41.8954640Z * [new branch] release/1.8 -> origin/release/1.8 2025-12-04T09:33:41.8956026Z * [new branch] release/1.9 -> 
origin/release/1.9 2025-12-04T09:33:41.8957545Z * [new branch] release/2.0 -> origin/release/2.0 2025-12-04T09:33:41.8959292Z * [new branch] release/2.1 -> origin/release/2.1 2025-12-04T09:33:41.8960897Z * [new branch] release/2.2 -> origin/release/2.2 2025-12-04T09:33:41.8962702Z * [new branch] release/2.3 -> origin/release/2.3 2025-12-04T09:33:41.8964807Z * [new branch] release/2.4 -> origin/release/2.4 2025-12-04T09:33:41.8966759Z * [new branch] release/2.5 -> origin/release/2.5 2025-12-04T09:33:41.8968507Z * [new branch] release/2.6 -> origin/release/2.6 2025-12-04T09:33:41.8970092Z * [new branch] release/2.7 -> origin/release/2.7 2025-12-04T09:33:41.8971889Z * [new branch] release/2.8 -> origin/release/2.8 2025-12-04T09:33:41.8973541Z * [new branch] release/2.9 -> origin/release/2.9 2025-12-04T09:33:41.8975147Z * [new branch] release_notes -> origin/release_notes 2025-12-04T09:33:41.8976865Z * [new branch] remove_pyinterpreter -> origin/remove_pyinterpreter 2025-12-04T09:33:41.8978942Z * [new branch] replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836 2025-12-04T09:33:41.8980301Z * [new branch] replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248 2025-12-04T09:33:41.8981696Z * [new branch] replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324 2025-12-04T09:33:41.8983181Z * [new branch] replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020 2025-12-04T09:33:41.8986132Z * [new branch] revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head 2025-12-04T09:33:41.8989017Z * [new branch] revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head 2025-12-04T09:33:41.8992334Z * [new branch] revert-152361-gh/fadara01/1/head -> origin/revert-152361-gh/fadara01/1/head 2025-12-04T09:33:41.8995767Z * [new branch] revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head 
2025-12-04T09:33:41.8997785Z * [new branch] revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ 2025-12-04T09:33:41.8999201Z * [new branch] revert-hoo-invoke-subgraph -> origin/revert-hoo-invoke-subgraph 2025-12-04T09:33:41.9000794Z * [new branch] revert_always_build_distributed -> origin/revert_always_build_distributed 2025-12-04T09:33:41.9002265Z * [new branch] rms_norm_patch -> origin/rms_norm_patch 2025-12-04T09:33:41.9004519Z * [new branch] ruisi/fix_all_to_all_estimation -> origin/ruisi/fix_all_to_all_estimation 2025-12-04T09:33:41.9006248Z * [new branch] ruisi/fix_comm_estimation -> origin/ruisi/fix_comm_estimation 2025-12-04T09:33:41.9007768Z * [new branch] ruisi/fix_dynamic_shape_estimation -> origin/ruisi/fix_dynamic_shape_estimation 2025-12-04T09:33:41.9009204Z * [new branch] ruisi/fix_llama3_autobucketing -> origin/ruisi/fix_llama3_autobucketing 2025-12-04T09:33:41.9010968Z * [new branch] ruisi/fix_manual_bucketing_ep_pass -> origin/ruisi/fix_manual_bucketing_ep_pass 2025-12-04T09:33:41.9012796Z * [new branch] ruisi/manual_bucket_pass -> origin/ruisi/manual_bucket_pass 2025-12-04T09:33:41.9015176Z * [new branch] ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures 2025-12-04T09:33:41.9016475Z * [new branch] ryanguo99/fix-closure-var -> origin/ryanguo99/fix-closure-var 2025-12-04T09:33:41.9018620Z * [new branch] rzou/faketensor_bench -> origin/rzou/faketensor_bench 2025-12-04T09:33:41.9020027Z * [new branch] rzou/njt -> origin/rzou/njt 2025-12-04T09:33:41.9021517Z * [new branch] rzou/pca -> origin/rzou/pca 2025-12-04T09:33:41.9022950Z * [new branch] rzou/realprop -> origin/rzou/realprop 2025-12-04T09:33:41.9024577Z * [new branch] samplevllm -> origin/samplevllm 2025-12-04T09:33:41.9027173Z * [new branch] sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm 
2025-12-04T09:33:41.9028568Z * [new branch] sapling-pr-archive-SS-JIA -> origin/sapling-pr-archive-SS-JIA 2025-12-04T09:33:41.9030308Z * [new branch] sapling-pr-archive-tushar00jain -> origin/sapling-pr-archive-tushar00jain 2025-12-04T09:33:41.9032270Z * [new branch] save -> origin/save 2025-12-04T09:33:41.9033879Z * [new branch] scaled_mm -> origin/scaled_mm 2025-12-04T09:33:41.9035468Z * [new branch] scan_attempt -> origin/scan_attempt 2025-12-04T09:33:41.9037627Z * [new branch] sdym/2.5.1 -> origin/sdym/2.5.1 2025-12-04T09:33:41.9039306Z * [new branch] sekyondaMeta-dynamoconfig-fix -> origin/sekyondaMeta-dynamoconfig-fix 2025-12-04T09:33:41.9041164Z * [new branch] shengf/fx-xform-perf -> origin/shengf/fx-xform-perf 2025-12-04T09:33:41.9042827Z * [new branch] shoumikhin-patch-1 -> origin/shoumikhin-patch-1 2025-12-04T09:33:41.9044435Z * [new branch] solve-accuracy-fix -> origin/solve-accuracy-fix 2025-12-04T09:33:41.9045970Z * [new branch] some_rocm_inductor_skips -> origin/some_rocm_inductor_skips 2025-12-04T09:33:41.9047989Z * [new branch] soulitzer/stash-tls-ac -> origin/soulitzer/stash-tls-ac 2025-12-04T09:33:41.9049627Z * [new branch] sparse-mm-bf16-support -> origin/sparse-mm-bf16-support 2025-12-04T09:33:41.9051700Z * [new branch] starterTaskUpdate -> origin/starterTaskUpdate 2025-12-04T09:33:41.9053283Z * [new branch] suo -> origin/suo 2025-12-04T09:33:41.9054868Z * [new branch] sve-poc -> origin/sve-poc 2025-12-04T09:33:41.9056626Z * [new branch] switch-bn -> origin/switch-bn 2025-12-04T09:33:41.9058325Z * [new branch] sy_annotation_in_autograd_hop -> origin/sy_annotation_in_autograd_hop 2025-12-04T09:33:41.9059822Z * [new branch] sy_aot_eager_record -> origin/sy_aot_eager_record 2025-12-04T09:33:41.9061507Z * [new branch] sy_custom_bucketing -> origin/sy_custom_bucketing 2025-12-04T09:33:41.9063301Z * [new branch] sy_debug_mode_test -> origin/sy_debug_mode_test 2025-12-04T09:33:41.9064772Z * [new branch] sy_deserialize -> origin/sy_deserialize 
2025-12-04T09:33:41.9066253Z * [new branch] sy_dump_gm_code -> origin/sy_dump_gm_code 2025-12-04T09:33:41.9067781Z * [new branch] sy_exp -> origin/sy_exp 2025-12-04T09:33:41.9069376Z * [new branch] sy_export_annotation -> origin/sy_export_annotation 2025-12-04T09:33:41.9071139Z * [new branch] sy_invoke_subgraph -> origin/sy_invoke_subgraph 2025-12-04T09:33:41.9072712Z * [new branch] sy_kernel_bw_name -> origin/sy_kernel_bw_name 2025-12-04T09:33:41.9074243Z * [new branch] sy_multi_arch -> origin/sy_multi_arch 2025-12-04T09:33:41.9075858Z * [new branch] sy_nn_module_stack -> origin/sy_nn_module_stack 2025-12-04T09:33:41.9077458Z * [new branch] sy_original_dtensor -> origin/sy_original_dtensor 2025-12-04T09:33:41.9079018Z * [new branch] sy_profiler_cia -> origin/sy_profiler_cia 2025-12-04T09:33:41.9080551Z * [new branch] symm_mem_sync -> origin/symm_mem_sync 2025-12-04T09:33:41.9082196Z * [new branch] sympy-bottleneck-repro -> origin/sympy-bottleneck-repro 2025-12-04T09:33:41.9083800Z * [new branch] tensordict_integration -> origin/tensordict_integration 2025-12-04T09:33:41.9085548Z * [new branch] test-move-conda-builds -> origin/test-move-conda-builds 2025-12-04T09:33:41.9087117Z * [new branch] test-old -> origin/test-old 2025-12-04T09:33:41.9089619Z * [new branch] test/bmm_heur -> origin/test/bmm_heur 2025-12-04T09:33:41.9091780Z * [new branch] tianren/customOp_autotune_fix -> origin/tianren/customOp_autotune_fix 2025-12-04T09:33:41.9093349Z * [new branch] tianren/customOp_enable_max_autotune -> origin/tianren/customOp_enable_max_autotune 2025-12-04T09:33:41.9094692Z * [new branch] tianren/customOp_fusion -> origin/tianren/customOp_fusion 2025-12-04T09:33:41.9096380Z * [new branch] tianren/customop_collectiveop_benchmark -> origin/tianren/customop_collectiveop_benchmark 2025-12-04T09:33:41.9098276Z * [new branch] tianren/customop_collectiveop_benchmark_fix -> origin/tianren/customop_collectiveop_benchmark_fix 2025-12-04T09:33:41.9100410Z * [new branch] 
tianren/customop_dynamic_config -> origin/tianren/customop_dynamic_config 2025-12-04T09:33:41.9101931Z * [new branch] tianren/dynamic_range_input -> origin/tianren/dynamic_range_input 2025-12-04T09:33:41.9103554Z * [new branch] tianren/dynamic_range_input_fix -> origin/tianren/dynamic_range_input_fix 2025-12-04T09:33:41.9105035Z * [new branch] tianren/dynamic_range_input_merge -> origin/tianren/dynamic_range_input_merge 2025-12-04T09:33:41.9106489Z * [new branch] tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp 2025-12-04T09:33:41.9108038Z * [new branch] tianren/fx_codegen_dump -> origin/tianren/fx_codegen_dump 2025-12-04T09:33:41.9109508Z * [new branch] tianren/symmetric_memory -> origin/tianren/symmetric_memory 2025-12-04T09:33:41.9111114Z * [new branch] tianren/test -> origin/tianren/test 2025-12-04T09:33:41.9112829Z * [new branch] tidy_performance_cyy -> origin/tidy_performance_cyy 2025-12-04T09:33:41.9114309Z * [new branch] tmp -> origin/tmp 2025-12-04T09:33:41.9115915Z * [new branch] torchtitan_ep -> origin/torchtitan_ep 2025-12-04T09:33:41.9117530Z * [new branch] torchtitan_integration -> origin/torchtitan_integration 2025-12-04T09:33:41.9119302Z * [new branch] trace_fsdp_torchtune_lora -> origin/trace_fsdp_torchtune_lora 2025-12-04T09:33:41.9120686Z * [new branch] traceable_fsdp_unit_tests -> origin/traceable_fsdp_unit_tests 2025-12-04T09:33:41.9122305Z * [new branch] tree_loop_vec_base -> origin/tree_loop_vec_base 2025-12-04T09:33:41.9123929Z * [new branch] triton_kernel -> origin/triton_kernel 2025-12-04T09:33:41.9125465Z * [new branch] tt_pkg_1908 -> origin/tt_pkg_1908 2025-12-04T09:33:41.9127021Z * [new branch] type_dec -> origin/type_dec 2025-12-04T09:33:41.9128658Z * [new branch] udate-sphinx-dependancies -> origin/udate-sphinx-dependancies 2025-12-04T09:33:41.9130842Z * [new branch] update-audio-commit-hash/17630256502-1803-1 -> origin/update-audio-commit-hash/17630256502-1803-1 2025-12-04T09:33:41.9132302Z * [new branch] 
update-audio-commit-hash/19087141161-1916-1 -> origin/update-audio-commit-hash/19087141161-1916-1 2025-12-04T09:33:41.9133755Z * [new branch] update-audio-commit-hash/19250643381-1929-1 -> origin/update-audio-commit-hash/19250643381-1929-1 2025-12-04T09:33:41.9135346Z * [new branch] update-audio-commit-hash/19397724337-1935-1 -> origin/update-audio-commit-hash/19397724337-1935-1 2025-12-04T09:33:41.9136824Z * [new branch] update-audio-commit-hash/19555670148-1941-1 -> origin/update-audio-commit-hash/19555670148-1941-1 2025-12-04T09:33:41.9138727Z * [new branch] update-audio-commit-hash/19750627930-1946-1 -> origin/update-audio-commit-hash/19750627930-1946-1 2025-12-04T09:33:41.9140887Z * [new branch] update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2 2025-12-04T09:33:41.9142871Z * [new branch] update-vision-commit-hash/19087141161-1916-1 -> origin/update-vision-commit-hash/19087141161-1916-1 2025-12-04T09:33:41.9144360Z * [new branch] update-vision-commit-hash/19184897099-1925-1 -> origin/update-vision-commit-hash/19184897099-1925-1 2025-12-04T09:33:41.9145671Z * [new branch] update-vision-commit-hash/19250643381-1929-1 -> origin/update-vision-commit-hash/19250643381-1929-1 2025-12-04T09:33:41.9147178Z * [new branch] update-vision-commit-hash/19381328640-1934-1 -> origin/update-vision-commit-hash/19381328640-1934-1 2025-12-04T09:33:41.9148569Z * [new branch] update-vision-commit-hash/19485237164-1938-1 -> origin/update-vision-commit-hash/19485237164-1938-1 2025-12-04T09:33:41.9150754Z * [new branch] update-vllm-commit-hash/18451675449-1879-1 -> origin/update-vllm-commit-hash/18451675449-1879-1 2025-12-04T09:33:41.9152164Z * [new branch] update-vllm-dockerfile -> origin/update-vllm-dockerfile 2025-12-04T09:33:41.9154372Z * [new branch] update-xla-commit-hash/19224287370-211-1 -> origin/update-xla-commit-hash/19224287370-211-1 2025-12-04T09:33:41.9155939Z * [new branch] update-xla-commit-hash/19422028566-212-1 -> 
origin/update-xla-commit-hash/19422028566-212-1 2025-12-04T09:33:41.9157330Z * [new branch] update-xla-commit-hash/19626841311-213-1 -> origin/update-xla-commit-hash/19626841311-213-1 2025-12-04T09:33:41.9159104Z * [new branch] update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388 2025-12-04T09:33:41.9160474Z * [new branch] update_operator_readme -> origin/update_operator_readme 2025-12-04T09:33:41.9162090Z * [new branch] update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736 2025-12-04T09:33:41.9163689Z * [new branch] update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173 2025-12-04T09:33:41.9165282Z * [new branch] update_slow_tests_1762155677 -> origin/update_slow_tests_1762155677 2025-12-04T09:33:41.9166981Z * [new branch] update_slow_tests_1763365283 -> origin/update_slow_tests_1763365283 2025-12-04T09:33:41.9168466Z * [new branch] update_submodule_FBGEMM -> origin/update_submodule_FBGEMM 2025-12-04T09:33:41.9170076Z * [new branch] update_submodule_kineto -> origin/update_submodule_kineto 2025-12-04T09:33:41.9171838Z * [new branch] update_submodule_tensorpipe -> origin/update_submodule_tensorpipe 2025-12-04T09:33:41.9173422Z * [new branch] upload-tests-for-autorevert -> origin/upload-tests-for-autorevert 2025-12-04T09:33:41.9175049Z * [new branch] v0.1.2 -> origin/v0.1.2 2025-12-04T09:33:41.9176791Z * [new branch] v1.0.1 -> origin/v1.0.1 2025-12-04T09:33:41.9178474Z * [new branch] v1.0.3 -> origin/v1.0.3 2025-12-04T09:33:41.9180342Z * [new branch] v1.1.0 -> origin/v1.1.0 2025-12-04T09:33:41.9182215Z * [new branch] v1.2.0 -> origin/v1.2.0 2025-12-04T09:33:41.9183816Z * [new branch] v1.3.0 -> origin/v1.3.0 2025-12-04T09:33:41.9185460Z * [new branch] v1.3.1 -> origin/v1.3.1 2025-12-04T09:33:41.9187066Z * [new branch] validate_fn -> origin/validate_fn 2025-12-04T09:33:41.9188841Z * [new branch] validations_2.6 -> origin/validations_2.6 2025-12-04T09:33:41.9191293Z * [new branch] validations_2.8 -> 
origin/validations_2.8 2025-12-04T09:33:41.9192902Z * [new branch] varlen-api -> origin/varlen-api 2025-12-04T09:33:41.9194523Z * [new branch] varlen-api-backup -> origin/varlen-api-backup 2025-12-04T09:33:41.9196074Z * [new branch] varlen_batch_invariance -> origin/varlen_batch_invariance 2025-12-04T09:33:41.9197924Z * [new branch] viable/strict -> origin/viable/strict 2025-12-04T09:33:41.9200731Z * [new branch] vishal9-team/dtensor_parallelism_toy -> origin/vishal9-team/dtensor_parallelism_toy 2025-12-04T09:33:41.9202153Z * [new branch] vllmbuildci -> origin/vllmbuildci 2025-12-04T09:33:41.9203798Z * [new branch] vllmpin -> origin/vllmpin 2025-12-04T09:33:41.9205581Z * [new branch] vscode-recommend-pyrefly -> origin/vscode-recommend-pyrefly 2025-12-04T09:33:41.9207468Z * [new branch] wdvr-patch-1 -> origin/wdvr-patch-1 2025-12-04T09:33:41.9209553Z * [new branch] wdvr/iss_145259 -> origin/wdvr/iss_145259 2025-12-04T09:33:41.9211588Z * [new branch] whc/pei -> origin/whc/pei 2025-12-04T09:33:41.9213029Z * [new branch] whc/pp_fix -> origin/whc/pp_fix 2025-12-04T09:33:41.9214611Z * [new branch] whc/sharding -> origin/whc/sharding 2025-12-04T09:33:41.9216025Z * [new branch] whc/sharding2 -> origin/whc/sharding2 2025-12-04T09:33:41.9217623Z * [new branch] whc/uneven -> origin/whc/uneven 2025-12-04T09:33:41.9219452Z * [new branch] whc/uneven-merge -> origin/whc/uneven-merge 2025-12-04T09:33:41.9221060Z * [new branch] win_warnings -> origin/win_warnings 2025-12-04T09:33:41.9222592Z * [new branch] windows_libtorch_free -> origin/windows_libtorch_free 2025-12-04T09:33:41.9224693Z * [new branch] xmfan-war -> origin/xmfan-war 2025-12-04T09:33:41.9226709Z * [new branch] xmfan/ca_0516 -> origin/xmfan/ca_0516 2025-12-04T09:33:41.9228225Z * [new branch] xmfan/ca_1051b93192 -> origin/xmfan/ca_1051b93192 2025-12-04T09:33:41.9230075Z * [new branch] xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 
2025-12-04T09:33:41.9230968Z * [new branch] xmfan/ca_5a2be192d1 -> origin/xmfan/ca_5a2be192d1 2025-12-04T09:33:41.9232401Z * [new branch] xmfan/ca_9d59b516e9 -> origin/xmfan/ca_9d59b516e9 2025-12-04T09:33:41.9233801Z * [new branch] xmfan/ca_apr8 -> origin/xmfan/ca_apr8 2025-12-04T09:33:41.9235191Z * [new branch] xmfan/ca_base -> origin/xmfan/ca_base 2025-12-04T09:33:41.9236987Z * [new branch] xmfan/ca_dynamic -> origin/xmfan/ca_dynamic 2025-12-04T09:33:41.9238812Z * [new branch] xmfan/ca_fix_dyn -> origin/xmfan/ca_fix_dyn 2025-12-04T09:33:41.9240362Z * [new branch] xmfan/ca_fix_lowering -> origin/xmfan/ca_fix_lowering 2025-12-04T09:33:41.9241842Z * [new branch] xmfan/ca_fix_polyfills -> origin/xmfan/ca_fix_polyfills 2025-12-04T09:33:41.9243132Z * [new branch] xmfan/ca_jan3 -> origin/xmfan/ca_jan3 2025-12-04T09:33:41.9244574Z * [new branch] xmfan/ca_jun18 -> origin/xmfan/ca_jun18 2025-12-04T09:33:41.9246087Z * [new branch] xmfan/ca_jun24 -> origin/xmfan/ca_jun24 2025-12-04T09:33:41.9247542Z * [new branch] xmfan/ca_nested -> origin/xmfan/ca_nested 2025-12-04T09:33:41.9249017Z * [new branch] xmfan/ca_overhead -> origin/xmfan/ca_overhead 2025-12-04T09:33:41.9250554Z * [new branch] xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451 2025-12-04T09:33:41.9251948Z * [new branch] xmfan/cacu_jun18 -> origin/xmfan/cacu_jun18 2025-12-04T09:33:41.9253549Z * [new branch] xmfan/cacu_jun19 -> origin/xmfan/cacu_jun19 2025-12-04T09:33:41.9255478Z * [new branch] xmfan/cacu_jun4 -> origin/xmfan/cacu_jun4 2025-12-04T09:33:41.9257107Z * [new branch] xmfan/disable_duck_shape -> origin/xmfan/disable_duck_shape 2025-12-04T09:33:41.9258693Z * [new branch] xmfan/fca_cpp_node_passthrough -> origin/xmfan/fca_cpp_node_passthrough 2025-12-04T09:33:41.9260430Z * [new branch] xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T09:33:41.9261964Z * [new branch] xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 
-> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T09:33:41.9263097Z * [new branch] xmfan/single_step -> origin/xmfan/single_step 2025-12-04T09:33:41.9264496Z * [new branch] xmfan/sth_0829 -> origin/xmfan/sth_0829 2025-12-04T09:33:41.9266598Z * [new branch] xmfan/test -> origin/xmfan/test 2025-12-04T09:33:41.9269296Z * [new branch] yguo/debug-0226-constexpr -> origin/yguo/debug-0226-constexpr 2025-12-04T09:33:41.9270690Z * [new branch] yguo/new_latest_changes -> origin/yguo/new_latest_changes 2025-12-04T09:33:41.9275672Z * [new branch] yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes 2025-12-04T09:33:41.9277618Z * [new branch] yiming/bootcamp -> origin/yiming/bootcamp 2025-12-04T09:33:41.9279182Z * [new branch] yiming/run_with_start_end_rng_hop -> origin/yiming/run_with_start_end_rng_hop 2025-12-04T09:33:41.9280857Z * [new branch] yolo-llama3 -> origin/yolo-llama3 2025-12-04T09:33:41.9282864Z * [new branch] zainr/canary-test -> origin/zainr/canary-test 2025-12-04T09:33:41.9284599Z * [new branch] zainr/cleanup-gh-runners -> origin/zainr/cleanup-gh-runners 2025-12-04T09:33:41.9285998Z * [new branch] zainr/pull-migration-c -> origin/zainr/pull-migration-c 2025-12-04T09:33:41.9287848Z * [new branch] zainr/test2 -> origin/zainr/test2 2025-12-04T09:33:41.9289808Z * [new branch] zasdfgbnm-patch-3 -> origin/zasdfgbnm-patch-3 2025-12-04T09:33:41.9291220Z * [new branch] zb2p -> origin/zb2p 2025-12-04T09:33:41.9292820Z * [new branch] zeros-and-scatter-part2 -> origin/zeros-and-scatter-part2 2025-12-04T09:33:41.9295267Z * [new branch] zhxchen17/ci/vllm_lora_oom -> origin/zhxchen17/ci/vllm_lora_oom 2025-12-04T09:33:41.9296861Z * [new branch] zhxchen17/ci/vllm_multimodal_oom -> origin/zhxchen17/ci/vllm_multimodal_oom 2025-12-04T09:33:41.9298363Z * [new branch] zhxchen17/ci/vllm_pin -> origin/zhxchen17/ci/vllm_pin 2025-12-04T09:33:41.9300502Z * [new branch] zhxchen17/dynamo/unsafe_drop_all_guards -> 
origin/zhxchen17/dynamo/unsafe_drop_all_guards 2025-12-04T09:33:41.9302535Z * [new branch] zhxchen17/export/call_override -> origin/zhxchen17/export/call_override 2025-12-04T09:33:41.9303932Z * [new branch] zhxchen17/export/codemod1 -> origin/zhxchen17/export/codemod1 2025-12-04T09:33:41.9305545Z * [new branch] zhxchen17/export/ctx_return -> origin/zhxchen17/export/ctx_return 2025-12-04T09:33:41.9307149Z * [new branch] zhxchen17/export/disable_side_effect_warn -> origin/zhxchen17/export/disable_side_effect_warn 2025-12-04T09:33:41.9308659Z * [new branch] zhxchen17/export/pytree_check -> origin/zhxchen17/export/pytree_check 2025-12-04T09:33:41.9310567Z * [new branch] zhxchen17/precompile/aoti -> origin/zhxchen17/precompile/aoti 2025-12-04T09:33:41.9312166Z * [new branch] zhxchen17/precompile/globals -> origin/zhxchen17/precompile/globals 2025-12-04T09:33:41.9313655Z * [new branch] zhxchen17/precompile/inductor_guards -> origin/zhxchen17/precompile/inductor_guards 2025-12-04T09:33:41.9315459Z * [new branch] zhxchen17/scratch/0 -> origin/zhxchen17/scratch/0 2025-12-04T09:33:41.9317176Z * [new branch] zhxchen17/torch_export_api_update -> origin/zhxchen17/torch_export_api_update 2025-12-04T09:33:41.9319225Z * [new branch] zhxhcen17/moodycamel -> origin/zhxhcen17/moodycamel 2025-12-04T09:33:41.9321545Z * [new branch] zxiiro/build-times -> origin/zxiiro/build-times 2025-12-04T09:33:41.9323116Z * [new branch] zxiiro/c7i.2xlarge -> origin/zxiiro/c7i.2xlarge 2025-12-04T09:33:41.9324679Z * [new branch] zxiiro/c7i.2xlarge.h100 -> origin/zxiiro/c7i.2xlarge.h100 2025-12-04T09:33:41.9326196Z * [new branch] zxiiro/main -> origin/zxiiro/main 2025-12-04T09:33:41.9327720Z * [new branch] zxiiro/risc64 -> origin/zxiiro/risc64 2025-12-04T09:33:41.9329400Z * [new branch] zxiiro/test-multicloud-arc -> origin/zxiiro/test-multicloud-arc 2025-12-04T09:33:41.9331023Z * [new tag] bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug -> bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug 
2025-12-04T09:33:41.9332197Z * [new tag] ci/binaries/77164 -> ci/binaries/77164 2025-12-04T09:33:41.9333568Z * [new tag] ciflow/b200/115316 -> ciflow/b200/115316 2025-12-04T09:33:41.9334564Z * [new tag] ciflow/b200/160685 -> ciflow/b200/160685 2025-12-04T09:33:41.9335621Z * [new tag] ciflow/b200/161607 -> ciflow/b200/161607 2025-12-04T09:33:41.9336779Z * [new tag] ciflow/b200/161938 -> ciflow/b200/161938 2025-12-04T09:33:41.9338027Z * [new tag] ciflow/b200/167207 -> ciflow/b200/167207 2025-12-04T09:33:41.9338869Z * [new tag] ciflow/b200/167989 -> ciflow/b200/167989 2025-12-04T09:33:41.9340098Z * [new tag] ciflow/b200/168096 -> ciflow/b200/168096 2025-12-04T09:33:41.9341275Z * [new tag] ciflow/b200/168175 -> ciflow/b200/168175 2025-12-04T09:33:41.9342379Z * [new tag] ciflow/b200/168195 -> ciflow/b200/168195 2025-12-04T09:33:41.9343449Z * [new tag] ciflow/b200/169200 -> ciflow/b200/169200 2025-12-04T09:33:41.9344615Z * [new tag] ciflow/b200/169216 -> ciflow/b200/169216 2025-12-04T09:33:41.9346133Z * [new tag] ciflow/b200/169380 -> ciflow/b200/169380 2025-12-04T09:33:41.9347619Z * [new tag] ciflow/b200/169412 -> ciflow/b200/169412 2025-12-04T09:33:41.9348855Z * [new tag] ciflow/b200/169470 -> ciflow/b200/169470 2025-12-04T09:33:41.9350106Z * [new tag] ciflow/b200/169471 -> ciflow/b200/169471 2025-12-04T09:33:41.9351105Z * [new tag] ciflow/b200/169472 -> ciflow/b200/169472 2025-12-04T09:33:41.9352460Z * [new tag] ciflow/b200/169514 -> ciflow/b200/169514 2025-12-04T09:33:41.9353518Z * [new tag] ciflow/b200/169517 -> ciflow/b200/169517 2025-12-04T09:33:41.9354948Z * [new tag] ciflow/binaries/165922 -> ciflow/binaries/165922 2025-12-04T09:33:41.9356293Z * [new tag] ciflow/binaries/169510 -> ciflow/binaries/169510 2025-12-04T09:33:41.9357740Z * [new tag] ciflow/binaries_wheel/157994 -> ciflow/binaries_wheel/157994 2025-12-04T09:33:41.9358897Z * [new tag] ciflow/binaries_wheel/166829 -> ciflow/binaries_wheel/166829 2025-12-04T09:33:41.9359897Z * [new tag] 
ciflow/binaries_wheel/167972 -> ciflow/binaries_wheel/167972 2025-12-04T09:33:41.9361085Z * [new tag] ciflow/binaries_wheel/167981 -> ciflow/binaries_wheel/167981 2025-12-04T09:33:41.9362252Z * [new tag] ciflow/dynamo/167695 -> ciflow/dynamo/167695 2025-12-04T09:33:41.9363251Z * [new tag] ciflow/dynamo/168096 -> ciflow/dynamo/168096 2025-12-04T09:33:41.9364423Z * [new tag] ciflow/dynamo/169525 -> ciflow/dynamo/169525 2025-12-04T09:33:41.9365830Z * [new tag] ciflow/h100-cutlass-backend/161938 -> ciflow/h100-cutlass-backend/161938 2025-12-04T09:33:41.9366744Z * [new tag] ciflow/h100-cutlass-backend/161940 -> ciflow/h100-cutlass-backend/161940 2025-12-04T09:33:41.9368126Z * [new tag] ciflow/h100-distributed/168923 -> ciflow/h100-distributed/168923 2025-12-04T09:33:41.9369332Z * [new tag] ciflow/h100-symm-mem/167552 -> ciflow/h100-symm-mem/167552 2025-12-04T09:33:41.9370198Z * [new tag] ciflow/h100-symm-mem/168129 -> ciflow/h100-symm-mem/168129 2025-12-04T09:33:41.9371401Z * [new tag] ciflow/h100-symm-mem/168917 -> ciflow/h100-symm-mem/168917 2025-12-04T09:33:41.9372893Z * [new tag] ciflow/h100-symm-mem/169156 -> ciflow/h100-symm-mem/169156 2025-12-04T09:33:41.9373757Z * [new tag] ciflow/h100-symm-mem/169200 -> ciflow/h100-symm-mem/169200 2025-12-04T09:33:41.9374883Z * [new tag] ciflow/h100-symm-mem/169216 -> ciflow/h100-symm-mem/169216 2025-12-04T09:33:41.9375720Z * [new tag] ciflow/h100-symm-mem/169338 -> ciflow/h100-symm-mem/169338 2025-12-04T09:33:41.9377101Z * [new tag] ciflow/h100-symm-mem/169355 -> ciflow/h100-symm-mem/169355 2025-12-04T09:33:41.9377968Z * [new tag] ciflow/h100-symm-mem/169543 -> ciflow/h100-symm-mem/169543 2025-12-04T09:33:41.9379324Z * [new tag] ciflow/h100/115316 -> ciflow/h100/115316 2025-12-04T09:33:41.9380361Z * [new tag] ciflow/h100/160685 -> ciflow/h100/160685 2025-12-04T09:33:41.9381370Z * [new tag] ciflow/h100/160729 -> ciflow/h100/160729 2025-12-04T09:33:41.9382337Z * [new tag] ciflow/h100/161607 -> ciflow/h100/161607 
2025-12-04T09:33:41.9383363Z * [new tag] ciflow/h100/161938 -> ciflow/h100/161938 2025-12-04T09:33:41.9384527Z * [new tag] ciflow/h100/167207 -> ciflow/h100/167207 2025-12-04T09:33:41.9385255Z * [new tag] ciflow/h100/167989 -> ciflow/h100/167989 2025-12-04T09:33:41.9386340Z * [new tag] ciflow/h100/168096 -> ciflow/h100/168096 2025-12-04T09:33:41.9387366Z * [new tag] ciflow/h100/168175 -> ciflow/h100/168175 2025-12-04T09:33:41.9388200Z * [new tag] ciflow/h100/168195 -> ciflow/h100/168195 2025-12-04T09:33:41.9389308Z * [new tag] ciflow/h100/168980 -> ciflow/h100/168980 2025-12-04T09:33:41.9390655Z * [new tag] ciflow/h100/169200 -> ciflow/h100/169200 2025-12-04T09:33:41.9392056Z * [new tag] ciflow/h100/169216 -> ciflow/h100/169216 2025-12-04T09:33:41.9393299Z * [new tag] ciflow/h100/169380 -> ciflow/h100/169380 2025-12-04T09:33:41.9394362Z * [new tag] ciflow/h100/169412 -> ciflow/h100/169412 2025-12-04T09:33:41.9395450Z * [new tag] ciflow/h100/169470 -> ciflow/h100/169470 2025-12-04T09:33:41.9396495Z * [new tag] ciflow/h100/169471 -> ciflow/h100/169471 2025-12-04T09:33:41.9397571Z * [new tag] ciflow/h100/169472 -> ciflow/h100/169472 2025-12-04T09:33:41.9398488Z * [new tag] ciflow/h100/169514 -> ciflow/h100/169514 2025-12-04T09:33:41.9399851Z * [new tag] ciflow/inductor-cu126/168096 -> ciflow/inductor-cu126/168096 2025-12-04T09:33:41.9401580Z * [new tag] ciflow/inductor-micro-benchmark-cpu-x86/168096 -> ciflow/inductor-micro-benchmark-cpu-x86/168096 2025-12-04T09:33:41.9402593Z * [new tag] ciflow/inductor-micro-benchmark/166165 -> ciflow/inductor-micro-benchmark/166165 2025-12-04T09:33:41.9403637Z * [new tag] ciflow/inductor-micro-benchmark/168096 -> ciflow/inductor-micro-benchmark/168096 2025-12-04T09:33:41.9404921Z * [new tag] ciflow/inductor-perf-compare/168096 -> ciflow/inductor-perf-compare/168096 2025-12-04T09:33:41.9406752Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi300/168073 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168073 
2025-12-04T09:33:41.9407609Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi300/168096 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168096 2025-12-04T09:33:41.9408786Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi300/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi300/169024 2025-12-04T09:33:41.9410211Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi355/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi355/169024 2025-12-04T09:33:41.9411176Z * [new tag] ciflow/inductor-perf-test-nightly/168096 -> ciflow/inductor-perf-test-nightly/168096 2025-12-04T09:33:41.9412402Z * [new tag] ciflow/inductor-periodic/168096 -> ciflow/inductor-periodic/168096 2025-12-04T09:33:41.9413302Z * [new tag] ciflow/inductor-periodic/169024 -> ciflow/inductor-periodic/169024 2025-12-04T09:33:41.9414546Z * [new tag] ciflow/inductor-periodic/169425 -> ciflow/inductor-periodic/169425 2025-12-04T09:33:41.9415925Z * [new tag] ciflow/inductor-rocm-mi200/165545 -> ciflow/inductor-rocm-mi200/165545 2025-12-04T09:33:41.9417206Z * [new tag] ciflow/inductor-rocm-mi200/165997 -> ciflow/inductor-rocm-mi200/165997 2025-12-04T09:33:41.9418102Z * [new tag] ciflow/inductor-rocm-mi200/168096 -> ciflow/inductor-rocm-mi200/168096 2025-12-04T09:33:41.9419378Z * [new tag] ciflow/inductor-rocm-mi200/169063 -> ciflow/inductor-rocm-mi200/169063 2025-12-04T09:33:41.9420265Z * [new tag] ciflow/inductor-rocm-mi200/169425 -> ciflow/inductor-rocm-mi200/169425 2025-12-04T09:33:41.9421688Z * [new tag] ciflow/inductor-rocm-mi300/165545 -> ciflow/inductor-rocm-mi300/165545 2025-12-04T09:33:41.9422474Z * [new tag] ciflow/inductor-rocm-mi300/168096 -> ciflow/inductor-rocm-mi300/168096 2025-12-04T09:33:41.9423422Z * [new tag] ciflow/inductor-rocm-mi300/169063 -> ciflow/inductor-rocm-mi300/169063 2025-12-04T09:33:41.9424369Z * [new tag] ciflow/inductor-rocm-mi300/169425 -> ciflow/inductor-rocm-mi300/169425 2025-12-04T09:33:41.9425803Z * [new tag] ciflow/inductor-rocm/162052 -> 
ciflow/inductor-rocm/162052 2025-12-04T09:33:41.9426684Z * [new tag] ciflow/inductor-rocm/168971 -> ciflow/inductor-rocm/168971 2025-12-04T09:33:41.9428046Z * [new tag] ciflow/inductor-windows/168096 -> ciflow/inductor-windows/168096 2025-12-04T09:33:41.9429221Z * [new tag] ciflow/inductor/144542 -> ciflow/inductor/144542 2025-12-04T09:33:41.9430794Z * [new tag] ciflow/inductor/146506 -> ciflow/inductor/146506 2025-12-04T09:33:41.9431693Z * [new tag] ciflow/inductor/147990 -> ciflow/inductor/147990 2025-12-04T09:33:41.9433013Z * [new tag] ciflow/inductor/148294 -> ciflow/inductor/148294 2025-12-04T09:33:41.9434103Z * [new tag] ciflow/inductor/148492 -> ciflow/inductor/148492 2025-12-04T09:33:41.9434986Z * [new tag] ciflow/inductor/157149 -> ciflow/inductor/157149 2025-12-04T09:33:41.9436079Z * [new tag] ciflow/inductor/157994 -> ciflow/inductor/157994 2025-12-04T09:33:41.9437209Z * [new tag] ciflow/inductor/160685 -> ciflow/inductor/160685 2025-12-04T09:33:41.9438099Z * [new tag] ciflow/inductor/160686 -> ciflow/inductor/160686 2025-12-04T09:33:41.9439186Z * [new tag] ciflow/inductor/160687 -> ciflow/inductor/160687 2025-12-04T09:33:41.9440062Z * [new tag] ciflow/inductor/160688 -> ciflow/inductor/160688 2025-12-04T09:33:41.9441535Z * [new tag] ciflow/inductor/160706 -> ciflow/inductor/160706 2025-12-04T09:33:41.9442913Z * [new tag] ciflow/inductor/160729 -> ciflow/inductor/160729 2025-12-04T09:33:41.9444159Z * [new tag] ciflow/inductor/161938 -> ciflow/inductor/161938 2025-12-04T09:33:41.9445202Z * [new tag] ciflow/inductor/161939 -> ciflow/inductor/161939 2025-12-04T09:33:41.9446277Z * [new tag] ciflow/inductor/161940 -> ciflow/inductor/161940 2025-12-04T09:33:41.9447364Z * [new tag] ciflow/inductor/162052 -> ciflow/inductor/162052 2025-12-04T09:33:41.9448367Z * [new tag] ciflow/inductor/162275 -> ciflow/inductor/162275 2025-12-04T09:33:41.9449436Z * [new tag] ciflow/inductor/162795 -> ciflow/inductor/162795 2025-12-04T09:33:41.9450747Z * [new tag] 
ciflow/inductor/163245 -> ciflow/inductor/163245 2025-12-04T09:33:41.9451818Z * [new tag] ciflow/inductor/163335 -> ciflow/inductor/163335 2025-12-04T09:33:41.9452899Z * [new tag] ciflow/inductor/163503 -> ciflow/inductor/163503 2025-12-04T09:33:41.9453796Z * [new tag] ciflow/inductor/163942 -> ciflow/inductor/163942 2025-12-04T09:33:41.9455088Z * [new tag] ciflow/inductor/165270 -> ciflow/inductor/165270 2025-12-04T09:33:41.9456149Z * [new tag] ciflow/inductor/165274 -> ciflow/inductor/165274 2025-12-04T09:33:41.9457363Z * [new tag] ciflow/inductor/165322 -> ciflow/inductor/165322 2025-12-04T09:33:41.9458401Z * [new tag] ciflow/inductor/165597 -> ciflow/inductor/165597 2025-12-04T09:33:41.9459495Z * [new tag] ciflow/inductor/166063 -> ciflow/inductor/166063 2025-12-04T09:33:41.9460526Z * [new tag] ciflow/inductor/166075 -> ciflow/inductor/166075 2025-12-04T09:33:41.9461688Z * [new tag] ciflow/inductor/166165 -> ciflow/inductor/166165 2025-12-04T09:33:41.9462966Z * [new tag] ciflow/inductor/166254 -> ciflow/inductor/166254 2025-12-04T09:33:41.9464034Z * [new tag] ciflow/inductor/166483 -> ciflow/inductor/166483 2025-12-04T09:33:41.9465125Z * [new tag] ciflow/inductor/166494 -> ciflow/inductor/166494 2025-12-04T09:33:41.9466241Z * [new tag] ciflow/inductor/166545 -> ciflow/inductor/166545 2025-12-04T09:33:41.9467116Z * [new tag] ciflow/inductor/166788 -> ciflow/inductor/166788 2025-12-04T09:33:41.9468452Z * [new tag] ciflow/inductor/166846 -> ciflow/inductor/166846 2025-12-04T09:33:41.9469503Z * [new tag] ciflow/inductor/167300 -> ciflow/inductor/167300 2025-12-04T09:33:41.9470570Z * [new tag] ciflow/inductor/167407 -> ciflow/inductor/167407 2025-12-04T09:33:41.9471945Z * [new tag] ciflow/inductor/167536 -> ciflow/inductor/167536 2025-12-04T09:33:41.9473092Z * [new tag] ciflow/inductor/167552 -> ciflow/inductor/167552 2025-12-04T09:33:41.9474117Z * [new tag] ciflow/inductor/167555 -> ciflow/inductor/167555 2025-12-04T09:33:41.9475417Z * [new tag] 
ciflow/inductor/167583 -> ciflow/inductor/167583 2025-12-04T09:33:41.9476237Z * [new tag] ciflow/inductor/167599 -> ciflow/inductor/167599 2025-12-04T09:33:41.9477401Z * [new tag] ciflow/inductor/167647 -> ciflow/inductor/167647 2025-12-04T09:33:41.9478434Z * [new tag] ciflow/inductor/167677 -> ciflow/inductor/167677 2025-12-04T09:33:41.9479477Z * [new tag] ciflow/inductor/167680 -> ciflow/inductor/167680 2025-12-04T09:33:41.9480525Z * [new tag] ciflow/inductor/167695 -> ciflow/inductor/167695 2025-12-04T09:33:41.9481585Z * [new tag] ciflow/inductor/167742 -> ciflow/inductor/167742 2025-12-04T09:33:41.9482638Z * [new tag] ciflow/inductor/167768 -> ciflow/inductor/167768 2025-12-04T09:33:41.9483953Z * [new tag] ciflow/inductor/167773 -> ciflow/inductor/167773 2025-12-04T09:33:41.9485097Z * [new tag] ciflow/inductor/167781 -> ciflow/inductor/167781 2025-12-04T09:33:41.9486083Z * [new tag] ciflow/inductor/167880 -> ciflow/inductor/167880 2025-12-04T09:33:41.9487166Z * [new tag] ciflow/inductor/167887 -> ciflow/inductor/167887 2025-12-04T09:33:41.9488259Z * [new tag] ciflow/inductor/167972 -> ciflow/inductor/167972 2025-12-04T09:33:41.9489315Z * [new tag] ciflow/inductor/167989 -> ciflow/inductor/167989 2025-12-04T09:33:41.9490380Z * [new tag] ciflow/inductor/168002 -> ciflow/inductor/168002 2025-12-04T09:33:41.9491407Z * [new tag] ciflow/inductor/168050 -> ciflow/inductor/168050 2025-12-04T09:33:41.9492487Z * [new tag] ciflow/inductor/168051 -> ciflow/inductor/168051 2025-12-04T09:33:41.9493482Z * [new tag] ciflow/inductor/168052 -> ciflow/inductor/168052 2025-12-04T09:33:41.9494572Z * [new tag] ciflow/inductor/168073 -> ciflow/inductor/168073 2025-12-04T09:33:41.9495653Z * [new tag] ciflow/inductor/168096 -> ciflow/inductor/168096 2025-12-04T09:33:41.9496805Z * [new tag] ciflow/inductor/168114 -> ciflow/inductor/168114 2025-12-04T09:33:41.9497894Z * [new tag] ciflow/inductor/168115 -> ciflow/inductor/168115 2025-12-04T09:33:41.9498881Z * [new tag] 
ciflow/inductor/168127 -> ciflow/inductor/168127 2025-12-04T09:33:41.9500062Z * [new tag] ciflow/inductor/168129 -> ciflow/inductor/168129 2025-12-04T09:33:41.9501683Z * [new tag] ciflow/inductor/168157 -> ciflow/inductor/168157 2025-12-04T09:33:41.9502910Z * [new tag] ciflow/inductor/168175 -> ciflow/inductor/168175 2025-12-04T09:33:41.9503723Z * [new tag] ciflow/inductor/168185 -> ciflow/inductor/168185 2025-12-04T09:33:41.9504831Z * [new tag] ciflow/inductor/168195 -> ciflow/inductor/168195 2025-12-04T09:33:41.9505868Z * [new tag] ciflow/inductor/168209 -> ciflow/inductor/168209 2025-12-04T09:33:41.9506906Z * [new tag] ciflow/inductor/168266 -> ciflow/inductor/168266 2025-12-04T09:33:41.9508129Z * [new tag] ciflow/inductor/168316 -> ciflow/inductor/168316 2025-12-04T09:33:41.9509410Z * [new tag] ciflow/inductor/168326 -> ciflow/inductor/168326 2025-12-04T09:33:41.9510459Z * [new tag] ciflow/inductor/168368 -> ciflow/inductor/168368 2025-12-04T09:33:41.9511530Z * [new tag] ciflow/inductor/168894 -> ciflow/inductor/168894 2025-12-04T09:33:41.9512575Z * [new tag] ciflow/inductor/168934 -> ciflow/inductor/168934 2025-12-04T09:33:41.9513681Z * [new tag] ciflow/inductor/168939 -> ciflow/inductor/168939 2025-12-04T09:33:41.9514703Z * [new tag] ciflow/inductor/168946 -> ciflow/inductor/168946 2025-12-04T09:33:41.9515605Z * [new tag] ciflow/inductor/168950 -> ciflow/inductor/168950 2025-12-04T09:33:41.9516758Z * [new tag] ciflow/inductor/168951 -> ciflow/inductor/168951 2025-12-04T09:33:41.9517788Z * [new tag] ciflow/inductor/168952 -> ciflow/inductor/168952 2025-12-04T09:33:41.9518848Z * [new tag] ciflow/inductor/168955 -> ciflow/inductor/168955 2025-12-04T09:33:41.9519831Z * [new tag] ciflow/inductor/168971 -> ciflow/inductor/168971 2025-12-04T09:33:41.9520919Z * [new tag] ciflow/inductor/168979 -> ciflow/inductor/168979 2025-12-04T09:33:41.9521970Z * [new tag] ciflow/inductor/168980 -> ciflow/inductor/168980 2025-12-04T09:33:41.9523188Z * [new tag] 
ciflow/inductor/168983 -> ciflow/inductor/168983 2025-12-04T09:33:41.9524252Z * [new tag] ciflow/inductor/169006 -> ciflow/inductor/169006 2025-12-04T09:33:41.9525336Z * [new tag] ciflow/inductor/169023 -> ciflow/inductor/169023 2025-12-04T09:33:41.9526422Z * [new tag] ciflow/inductor/169024 -> ciflow/inductor/169024 2025-12-04T09:33:41.9527297Z * [new tag] ciflow/inductor/169025 -> ciflow/inductor/169025 2025-12-04T09:33:41.9528471Z * [new tag] ciflow/inductor/169066 -> ciflow/inductor/169066 2025-12-04T09:33:41.9529516Z * [new tag] ciflow/inductor/169091 -> ciflow/inductor/169091 2025-12-04T09:33:41.9530593Z * [new tag] ciflow/inductor/169102 -> ciflow/inductor/169102 2025-12-04T09:33:41.9531492Z * [new tag] ciflow/inductor/169103 -> ciflow/inductor/169103 2025-12-04T09:33:41.9532617Z * [new tag] ciflow/inductor/169121 -> ciflow/inductor/169121 2025-12-04T09:33:41.9533730Z * [new tag] ciflow/inductor/169134 -> ciflow/inductor/169134 2025-12-04T09:33:41.9534717Z * [new tag] ciflow/inductor/169135 -> ciflow/inductor/169135 2025-12-04T09:33:41.9535766Z * [new tag] ciflow/inductor/169141 -> ciflow/inductor/169141 2025-12-04T09:33:41.9536866Z * [new tag] ciflow/inductor/169151 -> ciflow/inductor/169151 2025-12-04T09:33:41.9538112Z * [new tag] ciflow/inductor/169161 -> ciflow/inductor/169161 2025-12-04T09:33:41.9539198Z * [new tag] ciflow/inductor/169167 -> ciflow/inductor/169167 2025-12-04T09:33:41.9540434Z * [new tag] ciflow/inductor/169177 -> ciflow/inductor/169177 2025-12-04T09:33:41.9541732Z * [new tag] ciflow/inductor/169185 -> ciflow/inductor/169185 2025-12-04T09:33:41.9542822Z * [new tag] ciflow/inductor/169196 -> ciflow/inductor/169196 2025-12-04T09:33:41.9543913Z * [new tag] ciflow/inductor/169200 -> ciflow/inductor/169200 2025-12-04T09:33:41.9544975Z * [new tag] ciflow/inductor/169204 -> ciflow/inductor/169204 2025-12-04T09:33:41.9545995Z * [new tag] ciflow/inductor/169216 -> ciflow/inductor/169216 2025-12-04T09:33:41.9547083Z * [new tag] 
ciflow/inductor/169219 -> ciflow/inductor/169219 2025-12-04T09:33:41.9548125Z * [new tag] ciflow/inductor/169220 -> ciflow/inductor/169220 2025-12-04T09:33:41.9549409Z * [new tag] ciflow/inductor/169230 -> ciflow/inductor/169230 2025-12-04T09:33:41.9550437Z * [new tag] ciflow/inductor/169242 -> ciflow/inductor/169242 2025-12-04T09:33:41.9551465Z * [new tag] ciflow/inductor/169245 -> ciflow/inductor/169245 2025-12-04T09:33:41.9552691Z * [new tag] ciflow/inductor/169260 -> ciflow/inductor/169260 2025-12-04T09:33:41.9553720Z * [new tag] ciflow/inductor/169282 -> ciflow/inductor/169282 2025-12-04T09:33:41.9554915Z * [new tag] ciflow/inductor/169286 -> ciflow/inductor/169286 2025-12-04T09:33:41.9555700Z * [new tag] ciflow/inductor/169299 -> ciflow/inductor/169299 2025-12-04T09:33:41.9556988Z * [new tag] ciflow/inductor/169304 -> ciflow/inductor/169304 2025-12-04T09:33:41.9558552Z * [new tag] ciflow/inductor/169305 -> ciflow/inductor/169305 2025-12-04T09:33:41.9559628Z * [new tag] ciflow/inductor/169308 -> ciflow/inductor/169308 2025-12-04T09:33:41.9560705Z * [new tag] ciflow/inductor/169319 -> ciflow/inductor/169319 2025-12-04T09:33:41.9561782Z * [new tag] ciflow/inductor/169326 -> ciflow/inductor/169326 2025-12-04T09:33:41.9562851Z * [new tag] ciflow/inductor/169332 -> ciflow/inductor/169332 2025-12-04T09:33:41.9563728Z * [new tag] ciflow/inductor/169333 -> ciflow/inductor/169333 2025-12-04T09:33:41.9565147Z * [new tag] ciflow/inductor/169336 -> ciflow/inductor/169336 2025-12-04T09:33:41.9566279Z * [new tag] ciflow/inductor/169340 -> ciflow/inductor/169340 2025-12-04T09:33:41.9567365Z * [new tag] ciflow/inductor/169341 -> ciflow/inductor/169341 2025-12-04T09:33:41.9568466Z * [new tag] ciflow/inductor/169343 -> ciflow/inductor/169343 2025-12-04T09:33:41.9569336Z * [new tag] ciflow/inductor/169346 -> ciflow/inductor/169346 2025-12-04T09:33:41.9570662Z * [new tag] ciflow/inductor/169348 -> ciflow/inductor/169348 2025-12-04T09:33:41.9573023Z * [new tag] 
ciflow/inductor/169350 -> ciflow/inductor/169350 2025-12-04T09:33:41.9573575Z * [new tag] ciflow/inductor/169355 -> ciflow/inductor/169355 2025-12-04T09:33:41.9574701Z * [new tag] ciflow/inductor/169370 -> ciflow/inductor/169370 2025-12-04T09:33:41.9576278Z * [new tag] ciflow/inductor/169375 -> ciflow/inductor/169375 2025-12-04T09:33:41.9577452Z * [new tag] ciflow/inductor/169389 -> ciflow/inductor/169389 2025-12-04T09:33:41.9578523Z * [new tag] ciflow/inductor/169391 -> ciflow/inductor/169391 2025-12-04T09:33:41.9579686Z * [new tag] ciflow/inductor/169393 -> ciflow/inductor/169393 2025-12-04T09:33:41.9580733Z * [new tag] ciflow/inductor/169399 -> ciflow/inductor/169399 2025-12-04T09:33:41.9582022Z * [new tag] ciflow/inductor/169400 -> ciflow/inductor/169400 2025-12-04T09:33:41.9583065Z * [new tag] ciflow/inductor/169415 -> ciflow/inductor/169415 2025-12-04T09:33:41.9584378Z * [new tag] ciflow/inductor/169417 -> ciflow/inductor/169417 2025-12-04T09:33:41.9585073Z * [new tag] ciflow/inductor/169418 -> ciflow/inductor/169418 2025-12-04T09:33:41.9586521Z * [new tag] ciflow/inductor/169430 -> ciflow/inductor/169430 2025-12-04T09:33:41.9587570Z * [new tag] ciflow/inductor/169432 -> ciflow/inductor/169432 2025-12-04T09:33:41.9588584Z * [new tag] ciflow/inductor/169436 -> ciflow/inductor/169436 2025-12-04T09:33:41.9589819Z * [new tag] ciflow/inductor/169437 -> ciflow/inductor/169437 2025-12-04T09:33:41.9590875Z * [new tag] ciflow/inductor/169438 -> ciflow/inductor/169438 2025-12-04T09:33:41.9591947Z * [new tag] ciflow/inductor/169441 -> ciflow/inductor/169441 2025-12-04T09:33:41.9593024Z * [new tag] ciflow/inductor/169446 -> ciflow/inductor/169446 2025-12-04T09:33:41.9594246Z * [new tag] ciflow/inductor/169447 -> ciflow/inductor/169447 2025-12-04T09:33:41.9595144Z * [new tag] ciflow/inductor/169452 -> ciflow/inductor/169452 2025-12-04T09:33:41.9596491Z * [new tag] ciflow/inductor/169455 -> ciflow/inductor/169455 2025-12-04T09:33:41.9597436Z * [new tag] 
ciflow/inductor/169459 -> ciflow/inductor/169459 2025-12-04T09:33:41.9598673Z * [new tag] ciflow/inductor/169463 -> ciflow/inductor/169463 2025-12-04T09:33:41.9599905Z * [new tag] ciflow/inductor/169476 -> ciflow/inductor/169476 2025-12-04T09:33:41.9600946Z * [new tag] ciflow/inductor/169485 -> ciflow/inductor/169485 2025-12-04T09:33:41.9602043Z * [new tag] ciflow/inductor/169493 -> ciflow/inductor/169493 2025-12-04T09:33:41.9603202Z * [new tag] ciflow/inductor/169496 -> ciflow/inductor/169496 2025-12-04T09:33:41.9604098Z * [new tag] ciflow/inductor/169497 -> ciflow/inductor/169497 2025-12-04T09:33:41.9605239Z * [new tag] ciflow/inductor/169503 -> ciflow/inductor/169503 2025-12-04T09:33:41.9606287Z * [new tag] ciflow/inductor/169504 -> ciflow/inductor/169504 2025-12-04T09:33:41.9607685Z * [new tag] ciflow/inductor/169505 -> ciflow/inductor/169505 2025-12-04T09:33:41.9609238Z * [new tag] ciflow/inductor/169508 -> ciflow/inductor/169508 2025-12-04T09:33:41.9610308Z * [new tag] ciflow/inductor/169509 -> ciflow/inductor/169509 2025-12-04T09:33:41.9611445Z * [new tag] ciflow/inductor/169513 -> ciflow/inductor/169513 2025-12-04T09:33:41.9612531Z * [new tag] ciflow/inductor/169514 -> ciflow/inductor/169514 2025-12-04T09:33:41.9613588Z * [new tag] ciflow/inductor/169515 -> ciflow/inductor/169515 2025-12-04T09:33:41.9614657Z * [new tag] ciflow/inductor/169517 -> ciflow/inductor/169517 2025-12-04T09:33:41.9615861Z * [new tag] ciflow/inductor/169519 -> ciflow/inductor/169519 2025-12-04T09:33:41.9616844Z * [new tag] ciflow/inductor/169520 -> ciflow/inductor/169520 2025-12-04T09:33:41.9618066Z * [new tag] ciflow/inductor/169521 -> ciflow/inductor/169521 2025-12-04T09:33:41.9619090Z * [new tag] ciflow/inductor/169524 -> ciflow/inductor/169524 2025-12-04T09:33:41.9620157Z * [new tag] ciflow/inductor/169527 -> ciflow/inductor/169527 2025-12-04T09:33:41.9621205Z * [new tag] ciflow/inductor/169528 -> ciflow/inductor/169528 2025-12-04T09:33:41.9622421Z * [new tag] 
ciflow/inductor/169532 -> ciflow/inductor/169532 2025-12-04T09:33:41.9623506Z * [new tag] ciflow/inductor/169535 -> ciflow/inductor/169535 2025-12-04T09:33:41.9624556Z * [new tag] ciflow/inductor/169536 -> ciflow/inductor/169536 2025-12-04T09:33:41.9625746Z * [new tag] ciflow/inductor/169547 -> ciflow/inductor/169547 2025-12-04T09:33:41.9626565Z * [new tag] ciflow/inductor/169548 -> ciflow/inductor/169548 2025-12-04T09:33:41.9627711Z * [new tag] ciflow/inductor/169549 -> ciflow/inductor/169549 2025-12-04T09:33:41.9628794Z * [new tag] ciflow/inductor/169551 -> ciflow/inductor/169551 2025-12-04T09:33:41.9629832Z * [new tag] ciflow/inductor/169552 -> ciflow/inductor/169552 2025-12-04T09:33:41.9630914Z * [new tag] ciflow/inductor/169553 -> ciflow/inductor/169553 2025-12-04T09:33:41.9631946Z * [new tag] ciflow/inductor/169557 -> ciflow/inductor/169557 2025-12-04T09:33:41.9633427Z * [new tag] ciflow/inductor/3b9a386 -> ciflow/inductor/3b9a386 2025-12-04T09:33:41.9634748Z * [new tag] ciflow/inductor/3d4b92b -> ciflow/inductor/3d4b92b 2025-12-04T09:33:41.9635947Z * [new tag] ciflow/inductor/d224ac7 -> ciflow/inductor/d224ac7 2025-12-04T09:33:41.9637321Z * [new tag] ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994 2025-12-04T09:33:41.9638185Z * [new tag] ciflow/linux-aarch64/166075 -> ciflow/linux-aarch64/166075 2025-12-04T09:33:41.9639372Z * [new tag] ciflow/linux-aarch64/166876 -> ciflow/linux-aarch64/166876 2025-12-04T09:33:41.9640207Z * [new tag] ciflow/linux-aarch64/167981 -> ciflow/linux-aarch64/167981 2025-12-04T09:33:41.9641549Z * [new tag] ciflow/mps/166254 -> ciflow/mps/166254 2025-12-04T09:33:41.9642376Z * [new tag] ciflow/mps/169017 -> ciflow/mps/169017 2025-12-04T09:33:41.9643628Z * [new tag] ciflow/mps/169372 -> ciflow/mps/169372 2025-12-04T09:33:41.9644500Z * [new tag] ciflow/mps/169478 -> ciflow/mps/169478 2025-12-04T09:33:41.9645919Z * [new tag] ciflow/op-benchmark/157994 -> ciflow/op-benchmark/157994 2025-12-04T09:33:41.9646775Z * [new tag] 
ciflow/op-benchmark/166075 -> ciflow/op-benchmark/166075 2025-12-04T09:33:41.9647989Z * [new tag] ciflow/op-benchmark/169544 -> ciflow/op-benchmark/169544 2025-12-04T09:33:41.9649348Z * [new tag] ciflow/periodic-rocm-mi200/165997 -> ciflow/periodic-rocm-mi200/165997 2025-12-04T09:33:41.9650593Z * [new tag] ciflow/periodic-rocm-mi200/166517 -> ciflow/periodic-rocm-mi200/166517 2025-12-04T09:33:41.9651451Z * [new tag] ciflow/periodic-rocm-mi200/169063 -> ciflow/periodic-rocm-mi200/169063 2025-12-04T09:33:41.9652613Z * [new tag] ciflow/periodic-rocm-mi200/169425 -> ciflow/periodic-rocm-mi200/169425 2025-12-04T09:33:41.9653858Z * [new tag] ciflow/periodic-rocm-mi300/166517 -> ciflow/periodic-rocm-mi300/166517 2025-12-04T09:33:41.9654841Z * [new tag] ciflow/periodic-rocm-mi300/169063 -> ciflow/periodic-rocm-mi300/169063 2025-12-04T09:33:41.9655666Z * [new tag] ciflow/periodic-rocm-mi300/169425 -> ciflow/periodic-rocm-mi300/169425 2025-12-04T09:33:41.9657389Z * [new tag] ciflow/periodic/054a2fd -> ciflow/periodic/054a2fd 2025-12-04T09:33:41.9658214Z * [new tag] ciflow/periodic/167207 -> ciflow/periodic/167207 2025-12-04T09:33:41.9659469Z * [new tag] ciflow/periodic/167978 -> ciflow/periodic/167978 2025-12-04T09:33:41.9660335Z * [new tag] ciflow/periodic/168096 -> ciflow/periodic/168096 2025-12-04T09:33:41.9661439Z * [new tag] ciflow/periodic/169286 -> ciflow/periodic/169286 2025-12-04T09:33:41.9662766Z * [new tag] ciflow/periodic/2a6d37d -> ciflow/periodic/2a6d37d 2025-12-04T09:33:41.9663919Z * [new tag] ciflow/periodic/317eeb8 -> ciflow/periodic/317eeb8 2025-12-04T09:33:41.9665168Z * [new tag] ciflow/periodic/3c32 -> ciflow/periodic/3c32 2025-12-04T09:33:41.9666327Z * [new tag] ciflow/periodic/3e98831 -> ciflow/periodic/3e98831 2025-12-04T09:33:41.9668372Z * [new tag] ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 2025-12-04T09:33:41.9669592Z * [new tag] ciflow/periodic/94512-point -> 
ciflow/periodic/94512-point 2025-12-04T09:33:41.9671416Z * [new tag] ciflow/periodic/csl/test87519 -> ciflow/periodic/csl/test87519 2025-12-04T09:33:41.9675876Z * [new tag] ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275 2025-12-04T09:33:41.9677090Z * [new tag] ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761 2025-12-04T09:33:41.9678379Z * [new tag] ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12 2025-12-04T09:33:41.9679828Z * [new tag] ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0 2025-12-04T09:33:41.9681212Z * [new tag] ciflow/periodic/sha-ec5b83 -> ciflow/periodic/sha-ec5b83 2025-12-04T09:33:41.9682442Z * [new tag] ciflow/pull/167207 -> ciflow/pull/167207 2025-12-04T09:33:41.9684095Z * [new tag] ciflow/quantization-periodic/169207 -> ciflow/quantization-periodic/169207 2025-12-04T09:33:41.9685194Z * [new tag] ciflow/rocm-mi200/165545 -> ciflow/rocm-mi200/165545 2025-12-04T09:33:41.9686075Z * [new tag] ciflow/rocm-mi200/165997 -> ciflow/rocm-mi200/165997 2025-12-04T09:33:41.9687104Z * [new tag] ciflow/rocm-mi200/168096 -> ciflow/rocm-mi200/168096 2025-12-04T09:33:41.9688324Z * [new tag] ciflow/rocm-mi200/168275 -> ciflow/rocm-mi200/168275 2025-12-04T09:33:41.9689214Z * [new tag] ciflow/rocm-mi200/169063 -> ciflow/rocm-mi200/169063 2025-12-04T09:33:41.9690484Z * [new tag] ciflow/rocm-mi200/169356 -> ciflow/rocm-mi200/169356 2025-12-04T09:33:41.9691349Z * [new tag] ciflow/rocm-mi200/169425 -> ciflow/rocm-mi200/169425 2025-12-04T09:33:41.9692655Z * [new tag] ciflow/rocm-mi300/165545 -> ciflow/rocm-mi300/165545 2025-12-04T09:33:41.9693871Z * [new tag] ciflow/rocm-mi300/167157 -> ciflow/rocm-mi300/167157 2025-12-04T09:33:41.9694862Z * [new tag] ciflow/rocm-mi300/168096 -> ciflow/rocm-mi300/168096 2025-12-04T09:33:41.9695771Z * [new tag] ciflow/rocm-mi300/169063 -> ciflow/rocm-mi300/169063 2025-12-04T09:33:41.9696868Z * [new tag] ciflow/rocm-mi300/169425 -> ciflow/rocm-mi300/169425 2025-12-04T09:33:41.9698140Z * 
[new tag] ciflow/rocm-mi355/167157 -> ciflow/rocm-mi355/167157 2025-12-04T09:33:41.9699180Z * [new tag] ciflow/rocm-mi355/168275 -> ciflow/rocm-mi355/168275 2025-12-04T09:33:41.9700662Z * [new tag] ciflow/rocm-mi355/169425 -> ciflow/rocm-mi355/169425 2025-12-04T09:33:41.9701965Z * [new tag] ciflow/rocm-navi31/168275 -> ciflow/rocm-navi31/168275 2025-12-04T09:33:41.9702856Z * [new tag] ciflow/rocm-navi31/169425 -> ciflow/rocm-navi31/169425 2025-12-04T09:33:41.9704153Z * [new tag] ciflow/rocm/115316 -> ciflow/rocm/115316 2025-12-04T09:33:41.9705209Z * [new tag] ciflow/rocm/148492 -> ciflow/rocm/148492 2025-12-04T09:33:41.9706203Z * [new tag] ciflow/rocm/160685 -> ciflow/rocm/160685 2025-12-04T09:33:41.9707195Z * [new tag] ciflow/rocm/161607 -> ciflow/rocm/161607 2025-12-04T09:33:41.9708201Z * [new tag] ciflow/rocm/162052 -> ciflow/rocm/162052 2025-12-04T09:33:41.9709234Z * [new tag] ciflow/rocm/165997 -> ciflow/rocm/165997 2025-12-04T09:33:41.9710345Z * [new tag] ciflow/rocm/166165 -> ciflow/rocm/166165 2025-12-04T09:33:41.9711164Z * [new tag] ciflow/rocm/166517 -> ciflow/rocm/166517 2025-12-04T09:33:41.9712244Z * [new tag] ciflow/rocm/167207 -> ciflow/rocm/167207 2025-12-04T09:33:41.9713376Z * [new tag] ciflow/rocm/167536 -> ciflow/rocm/167536 2025-12-04T09:33:41.9714242Z * [new tag] ciflow/rocm/167781 -> ciflow/rocm/167781 2025-12-04T09:33:41.9716428Z * [new tag] ciflow/rocm/167989 -> ciflow/rocm/167989 2025-12-04T09:33:41.9717171Z * [new tag] ciflow/rocm/168073 -> ciflow/rocm/168073 2025-12-04T09:33:41.9718594Z * [new tag] ciflow/rocm/168195 -> ciflow/rocm/168195 2025-12-04T09:33:41.9719659Z * [new tag] ciflow/rocm/168939 -> ciflow/rocm/168939 2025-12-04T09:33:41.9720695Z * [new tag] ciflow/rocm/168971 -> ciflow/rocm/168971 2025-12-04T09:33:41.9721807Z * [new tag] ciflow/rocm/169024 -> ciflow/rocm/169024 2025-12-04T09:33:41.9722855Z * [new tag] ciflow/rocm/169200 -> ciflow/rocm/169200 2025-12-04T09:33:41.9723897Z * [new tag] ciflow/rocm/169216 -> 
ciflow/rocm/169216 2025-12-04T09:33:41.9724873Z * [new tag] ciflow/rocm/169312 -> ciflow/rocm/169312 2025-12-04T09:33:41.9725988Z * [new tag] ciflow/rocm/169380 -> ciflow/rocm/169380 2025-12-04T09:33:41.9727628Z * [new tag] ciflow/rocm/169427 -> ciflow/rocm/169427 2025-12-04T09:33:41.9728648Z * [new tag] ciflow/rocm/169455 -> ciflow/rocm/169455 2025-12-04T09:33:41.9729720Z * [new tag] ciflow/rocm/169470 -> ciflow/rocm/169470 2025-12-04T09:33:41.9730776Z * [new tag] ciflow/rocm/169471 -> ciflow/rocm/169471 2025-12-04T09:33:41.9731890Z * [new tag] ciflow/rocm/169472 -> ciflow/rocm/169472 2025-12-04T09:33:41.9732860Z * [new tag] ciflow/rocm/169514 -> ciflow/rocm/169514 2025-12-04T09:33:41.9734422Z * [new tag] ciflow/slow/01c7106 -> ciflow/slow/01c7106 2025-12-04T09:33:41.9735600Z * [new tag] ciflow/slow/0577043 -> ciflow/slow/0577043 2025-12-04T09:33:41.9737477Z * [new tag] ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym 2025-12-04T09:33:41.9738155Z * [new tag] ciflow/slow/0e81104 -> ciflow/slow/0e81104 2025-12-04T09:33:41.9739260Z * [new tag] ciflow/slow/167207 -> ciflow/slow/167207 2025-12-04T09:33:41.9740240Z * [new tag] ciflow/slow/168050 -> ciflow/slow/168050 2025-12-04T09:33:41.9741558Z * [new tag] ciflow/slow/1732077 -> ciflow/slow/1732077 2025-12-04T09:33:41.9742884Z * [new tag] ciflow/slow/187eb7c -> ciflow/slow/187eb7c 2025-12-04T09:33:41.9744407Z * [new tag] ciflow/slow/1faef89 -> ciflow/slow/1faef89 2025-12-04T09:33:41.9745896Z * [new tag] ciflow/slow/3920ec1 -> ciflow/slow/3920ec1 2025-12-04T09:33:41.9747366Z * [new tag] ciflow/slow/3b7c6b2 -> ciflow/slow/3b7c6b2 2025-12-04T09:33:41.9748636Z * [new tag] ciflow/slow/59a3759 -> ciflow/slow/59a3759 2025-12-04T09:33:41.9749877Z * [new tag] ciflow/slow/70ef0bb -> ciflow/slow/70ef0bb 2025-12-04T09:33:41.9751215Z * [new tag] ciflow/slow/788ff06 -> ciflow/slow/788ff06 2025-12-04T09:33:41.9753058Z * [new tag] 
ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym 2025-12-04T09:33:41.9753796Z * [new tag] ciflow/slow/9d85864 -> ciflow/slow/9d85864 2025-12-04T09:33:41.9755210Z * [new tag] ciflow/slow/9ffad5b -> ciflow/slow/9ffad5b 2025-12-04T09:33:41.9756367Z * [new tag] ciflow/slow/a206e8b -> ciflow/slow/a206e8b 2025-12-04T09:33:41.9757647Z * [new tag] ciflow/slow/a837609 -> ciflow/slow/a837609 2025-12-04T09:33:41.9758976Z * [new tag] ciflow/slow/af841f3 -> ciflow/slow/af841f3 2025-12-04T09:33:41.9760841Z * [new tag] ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym 2025-12-04T09:33:41.9761644Z * [new tag] ciflow/torchbench/168175 -> ciflow/torchbench/168175 2025-12-04T09:33:41.9762950Z * [new tag] ciflow/trunk/148492 -> ciflow/trunk/148492 2025-12-04T09:33:41.9763991Z * [new tag] ciflow/trunk/157149 -> ciflow/trunk/157149 2025-12-04T09:33:41.9765015Z * [new tag] ciflow/trunk/157994 -> ciflow/trunk/157994 2025-12-04T09:33:41.9766014Z * [new tag] ciflow/trunk/159718 -> ciflow/trunk/159718 2025-12-04T09:33:41.9767064Z * [new tag] ciflow/trunk/160685 -> ciflow/trunk/160685 2025-12-04T09:33:41.9768099Z * [new tag] ciflow/trunk/160729 -> ciflow/trunk/160729 2025-12-04T09:33:41.9768992Z * [new tag] ciflow/trunk/162275 -> ciflow/trunk/162275 2025-12-04T09:33:41.9770109Z * [new tag] ciflow/trunk/162795 -> ciflow/trunk/162795 2025-12-04T09:33:41.9771421Z * [new tag] ciflow/trunk/163245 -> ciflow/trunk/163245 2025-12-04T09:33:41.9772485Z * [new tag] ciflow/trunk/163942 -> ciflow/trunk/163942 2025-12-04T09:33:41.9773393Z * [new tag] ciflow/trunk/165274 -> ciflow/trunk/165274 2025-12-04T09:33:41.9775080Z * [new tag] ciflow/trunk/165483 -> ciflow/trunk/165483 2025-12-04T09:33:41.9776506Z * [new tag] ciflow/trunk/165728 -> ciflow/trunk/165728 2025-12-04T09:33:41.9777779Z * [new tag] ciflow/trunk/165922 -> ciflow/trunk/165922 2025-12-04T09:33:41.9778843Z * 
[new tag] ciflow/trunk/166075 -> ciflow/trunk/166075 2025-12-04T09:33:41.9779945Z * [new tag] ciflow/trunk/166165 -> ciflow/trunk/166165 2025-12-04T09:33:41.9780989Z * [new tag] ciflow/trunk/166829 -> ciflow/trunk/166829 2025-12-04T09:33:41.9782255Z * [new tag] ciflow/trunk/166843 -> ciflow/trunk/166843 2025-12-04T09:33:41.9783303Z * [new tag] ciflow/trunk/166876 -> ciflow/trunk/166876 2025-12-04T09:33:41.9784364Z * [new tag] ciflow/trunk/167207 -> ciflow/trunk/167207 2025-12-04T09:33:41.9785399Z * [new tag] ciflow/trunk/167536 -> ciflow/trunk/167536 2025-12-04T09:33:41.9787183Z * [new tag] ciflow/trunk/167552 -> ciflow/trunk/167552 2025-12-04T09:33:41.9788274Z * [new tag] ciflow/trunk/167555 -> ciflow/trunk/167555 2025-12-04T09:33:41.9789406Z * [new tag] ciflow/trunk/167599 -> ciflow/trunk/167599 2025-12-04T09:33:41.9790472Z * [new tag] ciflow/trunk/167659 -> ciflow/trunk/167659 2025-12-04T09:33:41.9791670Z * [new tag] ciflow/trunk/167672 -> ciflow/trunk/167672 2025-12-04T09:33:41.9792731Z * [new tag] ciflow/trunk/167742 -> ciflow/trunk/167742 2025-12-04T09:33:41.9793757Z * [new tag] ciflow/trunk/167781 -> ciflow/trunk/167781 2025-12-04T09:33:41.9795054Z * [new tag] ciflow/trunk/167837 -> ciflow/trunk/167837 2025-12-04T09:33:41.9796149Z * [new tag] ciflow/trunk/167887 -> ciflow/trunk/167887 2025-12-04T09:33:41.9797181Z * [new tag] ciflow/trunk/167978 -> ciflow/trunk/167978 2025-12-04T09:33:41.9798438Z * [new tag] ciflow/trunk/168050 -> ciflow/trunk/168050 2025-12-04T09:33:41.9799175Z * [new tag] ciflow/trunk/168051 -> ciflow/trunk/168051 2025-12-04T09:33:41.9800312Z * [new tag] ciflow/trunk/168096 -> ciflow/trunk/168096 2025-12-04T09:33:41.9801352Z * [new tag] ciflow/trunk/168127 -> ciflow/trunk/168127 2025-12-04T09:33:41.9802406Z * [new tag] ciflow/trunk/168157 -> ciflow/trunk/168157 2025-12-04T09:33:41.9803498Z * [new tag] ciflow/trunk/168175 -> ciflow/trunk/168175 2025-12-04T09:33:41.9804535Z * [new tag] ciflow/trunk/168209 -> ciflow/trunk/168209 
2025-12-04T09:33:41.9805754Z * [new tag] ciflow/trunk/168213 -> ciflow/trunk/168213 2025-12-04T09:33:41.9807012Z * [new tag] ciflow/trunk/168226 -> ciflow/trunk/168226 2025-12-04T09:33:41.9808095Z * [new tag] ciflow/trunk/168262 -> ciflow/trunk/168262 2025-12-04T09:33:41.9809171Z * [new tag] ciflow/trunk/168275 -> ciflow/trunk/168275 2025-12-04T09:33:41.9810403Z * [new tag] ciflow/trunk/168328 -> ciflow/trunk/168328 2025-12-04T09:33:41.9811363Z * [new tag] ciflow/trunk/168368 -> ciflow/trunk/168368 2025-12-04T09:33:41.9812486Z * [new tag] ciflow/trunk/168917 -> ciflow/trunk/168917 2025-12-04T09:33:41.9813541Z * [new tag] ciflow/trunk/168933 -> ciflow/trunk/168933 2025-12-04T09:33:41.9814794Z * [new tag] ciflow/trunk/168941 -> ciflow/trunk/168941 2025-12-04T09:33:41.9815858Z * [new tag] ciflow/trunk/168955 -> ciflow/trunk/168955 2025-12-04T09:33:41.9816989Z * [new tag] ciflow/trunk/168980 -> ciflow/trunk/168980 2025-12-04T09:33:41.9818307Z * [new tag] ciflow/trunk/169004 -> ciflow/trunk/169004 2025-12-04T09:33:41.9819430Z * [new tag] ciflow/trunk/169006 -> ciflow/trunk/169006 2025-12-04T09:33:41.9820499Z * [new tag] ciflow/trunk/169023 -> ciflow/trunk/169023 2025-12-04T09:33:41.9821554Z * [new tag] ciflow/trunk/169025 -> ciflow/trunk/169025 2025-12-04T09:33:41.9822804Z * [new tag] ciflow/trunk/169048 -> ciflow/trunk/169048 2025-12-04T09:33:41.9823869Z * [new tag] ciflow/trunk/169066 -> ciflow/trunk/169066 2025-12-04T09:33:41.9824980Z * [new tag] ciflow/trunk/169091 -> ciflow/trunk/169091 2025-12-04T09:33:41.9825956Z * [new tag] ciflow/trunk/169102 -> ciflow/trunk/169102 2025-12-04T09:33:41.9827065Z * [new tag] ciflow/trunk/169103 -> ciflow/trunk/169103 2025-12-04T09:33:41.9828282Z * [new tag] ciflow/trunk/169125 -> ciflow/trunk/169125 2025-12-04T09:33:41.9829517Z * [new tag] ciflow/trunk/169139 -> ciflow/trunk/169139 2025-12-04T09:33:41.9830786Z * [new tag] ciflow/trunk/169148 -> ciflow/trunk/169148 2025-12-04T09:33:41.9831851Z * [new tag] ciflow/trunk/169151 -> 
ciflow/trunk/169151 2025-12-04T09:33:41.9832957Z * [new tag] ciflow/trunk/169156 -> ciflow/trunk/169156 2025-12-04T09:33:41.9834187Z * [new tag] ciflow/trunk/169176 -> ciflow/trunk/169176 2025-12-04T09:33:41.9835250Z * [new tag] ciflow/trunk/169204 -> ciflow/trunk/169204 2025-12-04T09:33:41.9836296Z * [new tag] ciflow/trunk/169207 -> ciflow/trunk/169207 2025-12-04T09:33:41.9837349Z * [new tag] ciflow/trunk/169211 -> ciflow/trunk/169211 2025-12-04T09:33:41.9838820Z * [new tag] ciflow/trunk/169231 -> ciflow/trunk/169231 2025-12-04T09:33:41.9839889Z * [new tag] ciflow/trunk/169260 -> ciflow/trunk/169260 2025-12-04T09:33:41.9841263Z * [new tag] ciflow/trunk/169271 -> ciflow/trunk/169271 2025-12-04T09:33:41.9842318Z * [new tag] ciflow/trunk/169280 -> ciflow/trunk/169280 2025-12-04T09:33:41.9843459Z * [new tag] ciflow/trunk/169281 -> ciflow/trunk/169281 2025-12-04T09:33:41.9844331Z * [new tag] ciflow/trunk/169286 -> ciflow/trunk/169286 2025-12-04T09:33:41.9845714Z * [new tag] ciflow/trunk/169293 -> ciflow/trunk/169293 2025-12-04T09:33:41.9846779Z * [new tag] ciflow/trunk/169296 -> ciflow/trunk/169296 2025-12-04T09:33:41.9847872Z * [new tag] ciflow/trunk/169304 -> ciflow/trunk/169304 2025-12-04T09:33:41.9848951Z * [new tag] ciflow/trunk/169305 -> ciflow/trunk/169305 2025-12-04T09:33:41.9849994Z * [new tag] ciflow/trunk/169312 -> ciflow/trunk/169312 2025-12-04T09:33:41.9851479Z * [new tag] ciflow/trunk/169328 -> ciflow/trunk/169328 2025-12-04T09:33:41.9852571Z * [new tag] ciflow/trunk/169343 -> ciflow/trunk/169343 2025-12-04T09:33:41.9853614Z * [new tag] ciflow/trunk/169355 -> ciflow/trunk/169355 2025-12-04T09:33:41.9854677Z * [new tag] ciflow/trunk/169370 -> ciflow/trunk/169370 2025-12-04T09:33:41.9855944Z * [new tag] ciflow/trunk/169379 -> ciflow/trunk/169379 2025-12-04T09:33:41.9857172Z * [new tag] ciflow/trunk/169380 -> ciflow/trunk/169380 2025-12-04T09:33:41.9858245Z * [new tag] ciflow/trunk/169385 -> ciflow/trunk/169385 2025-12-04T09:33:41.9859990Z * [new tag] 
ciflow/trunk/169387 -> ciflow/trunk/169387 2025-12-04T09:33:41.9861267Z * [new tag] ciflow/trunk/169410 -> ciflow/trunk/169410 2025-12-04T09:33:41.9862411Z * [new tag] ciflow/trunk/169412 -> ciflow/trunk/169412 2025-12-04T09:33:41.9863454Z * [new tag] ciflow/trunk/169418 -> ciflow/trunk/169418 2025-12-04T09:33:41.9864522Z * [new tag] ciflow/trunk/169423 -> ciflow/trunk/169423 2025-12-04T09:33:41.9865589Z * [new tag] ciflow/trunk/169427 -> ciflow/trunk/169427 2025-12-04T09:33:41.9866731Z * [new tag] ciflow/trunk/169430 -> ciflow/trunk/169430 2025-12-04T09:33:41.9867827Z * [new tag] ciflow/trunk/169437 -> ciflow/trunk/169437 2025-12-04T09:33:41.9868929Z * [new tag] ciflow/trunk/169442 -> ciflow/trunk/169442 2025-12-04T09:33:41.9869999Z * [new tag] ciflow/trunk/169452 -> ciflow/trunk/169452 2025-12-04T09:33:41.9871260Z * [new tag] ciflow/trunk/169454 -> ciflow/trunk/169454 2025-12-04T09:33:41.9872368Z * [new tag] ciflow/trunk/169459 -> ciflow/trunk/169459 2025-12-04T09:33:41.9873651Z * [new tag] ciflow/trunk/169474 -> ciflow/trunk/169474 2025-12-04T09:33:41.9874724Z * [new tag] ciflow/trunk/169475 -> ciflow/trunk/169475 2025-12-04T09:33:41.9875799Z * [new tag] ciflow/trunk/169476 -> ciflow/trunk/169476 2025-12-04T09:33:41.9877092Z * [new tag] ciflow/trunk/169487 -> ciflow/trunk/169487 2025-12-04T09:33:41.9878152Z * [new tag] ciflow/trunk/169497 -> ciflow/trunk/169497 2025-12-04T09:33:41.9879263Z * [new tag] ciflow/trunk/169503 -> ciflow/trunk/169503 2025-12-04T09:33:41.9880310Z * [new tag] ciflow/trunk/169505 -> ciflow/trunk/169505 2025-12-04T09:33:41.9881371Z * [new tag] ciflow/trunk/169507 -> ciflow/trunk/169507 2025-12-04T09:33:41.9882466Z * [new tag] ciflow/trunk/169514 -> ciflow/trunk/169514 2025-12-04T09:33:41.9883699Z * [new tag] ciflow/trunk/169517 -> ciflow/trunk/169517 2025-12-04T09:33:41.9884495Z * [new tag] ciflow/trunk/169519 -> ciflow/trunk/169519 2025-12-04T09:33:41.9885608Z * [new tag] ciflow/trunk/169528 -> ciflow/trunk/169528 
2025-12-04T09:33:41.9886566Z * [new tag] ciflow/trunk/169541 -> ciflow/trunk/169541 2025-12-04T09:33:41.9887831Z * [new tag] ciflow/trunk/169555 -> ciflow/trunk/169555 2025-12-04T09:33:41.9889459Z * [new tag] ciflow/unstable/123 -> ciflow/unstable/123 2025-12-04T09:33:41.9890765Z * [new tag] ciflow/vllm/165270 -> ciflow/vllm/165270 2025-12-04T09:33:41.9891729Z * [new tag] ciflow/vllm/165274 -> ciflow/vllm/165274 2025-12-04T09:33:41.9892804Z * [new tag] ciflow/vllm/166494 -> ciflow/vllm/166494 2025-12-04T09:33:41.9893931Z * [new tag] ciflow/vllm/169219 -> ciflow/vllm/169219 2025-12-04T09:33:41.9894920Z * [new tag] ciflow/vllm/169220 -> ciflow/vllm/169220 2025-12-04T09:33:41.9896240Z * [new tag] ciflow/xpu/157994 -> ciflow/xpu/157994 2025-12-04T09:33:41.9897319Z * [new tag] ciflow/xpu/159718 -> ciflow/xpu/159718 2025-12-04T09:33:41.9898234Z * [new tag] ciflow/xpu/161940 -> ciflow/xpu/161940 2025-12-04T09:33:41.9899467Z * [new tag] ciflow/xpu/163251 -> ciflow/xpu/163251 2025-12-04T09:33:41.9900494Z * [new tag] ciflow/xpu/166829 -> ciflow/xpu/166829 2025-12-04T09:33:41.9901452Z * [new tag] ciflow/xpu/166843 -> ciflow/xpu/166843 2025-12-04T09:33:41.9902485Z * [new tag] ciflow/xpu/167972 -> ciflow/xpu/167972 2025-12-04T09:33:41.9903517Z * [new tag] ciflow/xpu/167981 -> ciflow/xpu/167981 2025-12-04T09:33:41.9904551Z * [new tag] ciflow/xpu/168213 -> ciflow/xpu/168213 2025-12-04T09:33:41.9905589Z * [new tag] ciflow/xpu/168262 -> ciflow/xpu/168262 2025-12-04T09:33:41.9906642Z * [new tag] ciflow/xpu/168328 -> ciflow/xpu/168328 2025-12-04T09:33:41.9908122Z * [new tag] ciflow/xpu/168950 -> ciflow/xpu/168950 2025-12-04T09:33:41.9909641Z * [new tag] ciflow/xpu/169039 -> ciflow/xpu/169039 2025-12-04T09:33:41.9911368Z * [new tag] ciflow/xpu/169200 -> ciflow/xpu/169200 2025-12-04T09:33:41.9911837Z * [new tag] ciflow/xpu/169203 -> ciflow/xpu/169203 2025-12-04T09:33:41.9913057Z * [new tag] ciflow/xpu/169230 -> ciflow/xpu/169230 2025-12-04T09:33:41.9914107Z * [new tag] 
ciflow/xpu/169231 -> ciflow/xpu/169231 2025-12-04T09:33:41.9915365Z * [new tag] ciflow/xpu/169241 -> ciflow/xpu/169241 2025-12-04T09:33:41.9916401Z * [new tag] ciflow/xpu/169280 -> ciflow/xpu/169280 2025-12-04T09:33:41.9917478Z * [new tag] ciflow/xpu/169296 -> ciflow/xpu/169296 2025-12-04T09:33:41.9918733Z * [new tag] ciflow/xpu/169353 -> ciflow/xpu/169353 2025-12-04T09:33:41.9919876Z * [new tag] ciflow/xpu/169410 -> ciflow/xpu/169410 2025-12-04T09:33:41.9920728Z * [new tag] ciflow/xpu/169442 -> ciflow/xpu/169442 2025-12-04T09:33:41.9921959Z * [new tag] ciflow/xpu/169555 -> ciflow/xpu/169555 2025-12-04T09:33:41.9923146Z * [new tag] cslpull75 -> cslpull75 2025-12-04T09:33:41.9924367Z * [new tag] cslpull76 -> cslpull76 2025-12-04T09:33:41.9925458Z * [new tag] cslpull77 -> cslpull77 2025-12-04T09:33:41.9926691Z * [new tag] cslpull78 -> cslpull78 2025-12-04T09:33:41.9928050Z * [new tag] cslpull79 -> cslpull79 2025-12-04T09:33:41.9929615Z * [new tag] cslpull80 -> cslpull80 2025-12-04T09:33:41.9930826Z * [new tag] cslpull81 -> cslpull81 2025-12-04T09:33:41.9932058Z * [new tag] cslpull82 -> cslpull82 2025-12-04T09:33:41.9933221Z * [new tag] cslpull83 -> cslpull83 2025-12-04T09:33:41.9934472Z * [new tag] cslpull84 -> cslpull84 2025-12-04T09:33:41.9935506Z * [new tag] cslpull85 -> cslpull85 2025-12-04T09:33:41.9936859Z * [new tag] cslpull86 -> cslpull86 2025-12-04T09:33:41.9938129Z * [new tag] cslpull87 -> cslpull87 2025-12-04T09:33:41.9939364Z * [new tag] cslpull88 -> cslpull88 2025-12-04T09:33:41.9940524Z * [new tag] cslpull89 -> cslpull89 2025-12-04T09:33:41.9941523Z * [new tag] cslpull90 -> cslpull90 2025-12-04T09:33:41.9943225Z * [new tag] cslpull91 -> cslpull91 2025-12-04T09:33:41.9944372Z * [new tag] cslpull92 -> cslpull92 2025-12-04T09:33:41.9945692Z * [new tag] flight_5 -> flight_5 2025-12-04T09:33:41.9947093Z * [new tag] flight_5.1 -> flight_5.1 2025-12-04T09:33:41.9948358Z * [new tag] flight_5.2 -> flight_5.2 2025-12-04T09:33:41.9949718Z * [new tag] flight_5.3 -> 
flight_5.3 2025-12-04T09:33:41.9950885Z * [new tag] forpull1 -> forpull1 2025-12-04T09:33:41.9952484Z * [new tag] malfet/tag-2ef5611 -> malfet/tag-2ef5611 2025-12-04T09:33:41.9953857Z * [new tag] malfet/tag-317b1a0 -> malfet/tag-317b1a0 2025-12-04T09:33:41.9955067Z * [new tag] malfet/tag-ec6f767 -> malfet/tag-ec6f767 2025-12-04T09:33:41.9956361Z * [new tag] nightly-binary -> nightly-binary 2025-12-04T09:33:41.9957608Z * [new tag] sqzhang_flight4_plus -> sqzhang_flight4_plus 2025-12-04T09:33:41.9959073Z * [new tag] sqzhang_flight_3 -> sqzhang_flight_3 2025-12-04T09:33:41.9960723Z * [new tag] trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 -> trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 2025-12-04T09:33:41.9961857Z * [new tag] trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e -> trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e 2025-12-04T09:33:41.9963572Z * [new tag] trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 -> trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 2025-12-04T09:33:41.9964996Z * [new tag] trunk/07dcc0b83db3211653a38565a24e15acdba75654 -> trunk/07dcc0b83db3211653a38565a24e15acdba75654 2025-12-04T09:33:41.9966306Z * [new tag] trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb -> trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb 2025-12-04T09:33:41.9967306Z * [new tag] trunk/088048f2fea28ff7d450f65c72419ca45780d30b -> trunk/088048f2fea28ff7d450f65c72419ca45780d30b 2025-12-04T09:33:41.9968848Z * [new tag] trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 -> trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 2025-12-04T09:33:41.9970141Z * [new tag] trunk/0b80a4c62b94402844bf221791c096b0035c6d75 -> trunk/0b80a4c62b94402844bf221791c096b0035c6d75 2025-12-04T09:33:41.9971803Z * [new tag] trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 -> trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 2025-12-04T09:33:41.9974313Z * [new tag] trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 -> trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 2025-12-04T09:33:41.9975692Z * [new tag] 
trunk/135f3753c418a6879b1954904184937b67e61688 -> trunk/135f3753c418a6879b1954904184937b67e61688 2025-12-04T09:33:41.9976814Z * [new tag] trunk/15da21026cb13cd20257dc9e96830db108743c10 -> trunk/15da21026cb13cd20257dc9e96830db108743c10 2025-12-04T09:33:41.9978118Z * [new tag] trunk/166efdad2ac827f30fb02504c6017520257f88ec -> trunk/166efdad2ac827f30fb02504c6017520257f88ec 2025-12-04T09:33:41.9979084Z * [new tag] trunk/174272c15fae553d8488140af931f7d8050a313f -> trunk/174272c15fae553d8488140af931f7d8050a313f 2025-12-04T09:33:41.9980536Z * [new tag] trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 -> trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 2025-12-04T09:33:41.9981540Z * [new tag] trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 -> trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 2025-12-04T09:33:41.9982581Z * [new tag] trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 -> trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 2025-12-04T09:33:41.9983725Z * [new tag] trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 -> trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 2025-12-04T09:33:41.9984793Z * [new tag] trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e -> trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e 2025-12-04T09:33:41.9985955Z * [new tag] trunk/1c87554d74140eaee964ca8b1832cede67f5f520 -> trunk/1c87554d74140eaee964ca8b1832cede67f5f520 2025-12-04T09:33:41.9987035Z * [new tag] trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 -> trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 2025-12-04T09:33:41.9988300Z * [new tag] trunk/1cee47d6ce0a02227185b566593f002dd639ca0c -> trunk/1cee47d6ce0a02227185b566593f002dd639ca0c 2025-12-04T09:33:41.9989115Z * [new tag] trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d -> trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d 2025-12-04T09:33:41.9990217Z * [new tag] trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 -> trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 2025-12-04T09:33:41.9993514Z * [new tag] trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de -> 
trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de 2025-12-04T09:33:41.9993987Z * [new tag] trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 -> trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 2025-12-04T09:33:41.9994441Z * [new tag] trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 -> trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 2025-12-04T09:33:41.9994902Z * [new tag] trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f -> trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f 2025-12-04T09:33:41.9995866Z * [new tag] trunk/285779b1621cf9f073a062b0889a642d200308d9 -> trunk/285779b1621cf9f073a062b0889a642d200308d9 2025-12-04T09:33:41.9996697Z * [new tag] trunk/2887faaec6295d081580d09fce161201826c6d87 -> trunk/2887faaec6295d081580d09fce161201826c6d87 2025-12-04T09:33:41.9997796Z * [new tag] trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc -> trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc 2025-12-04T09:33:41.9999063Z * [new tag] trunk/29856679769b3dede478767e2fe6cfb51197cb25 -> trunk/29856679769b3dede478767e2fe6cfb51197cb25 2025-12-04T09:33:42.0000054Z * [new tag] trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 -> trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 2025-12-04T09:33:42.0001178Z * [new tag] trunk/2ac3ef882afb23136adc188975f0a8802fc68adf -> trunk/2ac3ef882afb23136adc188975f0a8802fc68adf 2025-12-04T09:33:42.0002021Z * [new tag] trunk/2bec68e73b64715354af076ad309335f943e36cd -> trunk/2bec68e73b64715354af076ad309335f943e36cd 2025-12-04T09:33:42.0003121Z * [new tag] trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 -> trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 2025-12-04T09:33:42.0004292Z * [new tag] trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 -> trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 2025-12-04T09:33:42.0005407Z * [new tag] trunk/2df6058f116a65722a0e03073402feb242572d35 -> trunk/2df6058f116a65722a0e03073402feb242572d35 2025-12-04T09:33:42.0006459Z * [new tag] trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec -> trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec 
2025-12-04T09:33:42.0007821Z * [new tag] trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 -> trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 2025-12-04T09:33:42.0008784Z * [new tag] trunk/305168768a95d69c444df5cd334bb774edfe06f1 -> trunk/305168768a95d69c444df5cd334bb774edfe06f1 2025-12-04T09:33:42.0009831Z * [new tag] trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 -> trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 2025-12-04T09:33:42.0010876Z * [new tag] trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 -> trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 2025-12-04T09:33:42.0011942Z * [new tag] trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 -> trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 2025-12-04T09:33:42.0012945Z * [new tag] trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf -> trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf 2025-12-04T09:33:42.0013909Z * [new tag] trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee -> trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee 2025-12-04T09:33:42.0015140Z * [new tag] trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 -> trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 2025-12-04T09:33:42.0015954Z * [new tag] trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 -> trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 2025-12-04T09:33:42.0017090Z * [new tag] trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae -> trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae 2025-12-04T09:33:42.0018212Z * [new tag] trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f -> trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f 2025-12-04T09:33:42.0019255Z * [new tag] trunk/42e9005cda22da3f1c559c3649218cebd671027c -> trunk/42e9005cda22da3f1c559c3649218cebd671027c 2025-12-04T09:33:42.0020485Z * [new tag] trunk/43b94713bbf340d3c124fde02d0f73add4021247 -> trunk/43b94713bbf340d3c124fde02d0f73add4021247 2025-12-04T09:33:42.0021485Z * [new tag] trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c -> trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c 2025-12-04T09:33:42.0022601Z * [new tag] 
trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a -> trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a 2025-12-04T09:33:42.0024245Z * [new tag] trunk/45d310ad84854dff730c0b12e577d7998d978686 -> trunk/45d310ad84854dff730c0b12e577d7998d978686 2025-12-04T09:33:42.0025685Z * [new tag] trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 -> trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 2025-12-04T09:33:42.0026463Z * [new tag] trunk/481e5ab336275bd3acd5fa8a611b05b4469012af -> trunk/481e5ab336275bd3acd5fa8a611b05b4469012af 2025-12-04T09:33:42.0027801Z * [new tag] trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 -> trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 2025-12-04T09:33:42.0028810Z * [new tag] trunk/49a04d26088acc17d948ddd66920f3e16371e873 -> trunk/49a04d26088acc17d948ddd66920f3e16371e873 2025-12-04T09:33:42.0029875Z * [new tag] trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 -> trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 2025-12-04T09:33:42.0030824Z * [new tag] trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f -> trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f 2025-12-04T09:33:42.0032207Z * [new tag] trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa -> trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa 2025-12-04T09:33:42.0033273Z * [new tag] trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c -> trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c 2025-12-04T09:33:42.0035027Z * [new tag] trunk/4fefb8e7e942386ffac764a41b232241f82bea3a -> trunk/4fefb8e7e942386ffac764a41b232241f82bea3a 2025-12-04T09:33:42.0036000Z * [new tag] trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d -> trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d 2025-12-04T09:33:42.0037065Z * [new tag] trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 -> trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 2025-12-04T09:33:42.0038140Z * [new tag] trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 -> trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 2025-12-04T09:33:42.0039487Z * [new tag] trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a -> 
trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a 2025-12-04T09:33:42.0040477Z * [new tag] trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 -> trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 2025-12-04T09:33:42.0041573Z * [new tag] trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 -> trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 2025-12-04T09:33:42.0043388Z * [new tag] trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 -> trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 2025-12-04T09:33:42.0044332Z * [new tag] trunk/5634469fda9e5d98869c82c7d03bb08914245f96 -> trunk/5634469fda9e5d98869c82c7d03bb08914245f96 2025-12-04T09:33:42.0045210Z * [new tag] trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc -> trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc 2025-12-04T09:33:42.0046325Z * [new tag] trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 -> trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 2025-12-04T09:33:42.0047351Z * [new tag] trunk/597930f6b568852356ca9795dac76f9e4653adbd -> trunk/597930f6b568852356ca9795dac76f9e4653adbd 2025-12-04T09:33:42.0048261Z * [new tag] trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 -> trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 2025-12-04T09:33:42.0049599Z * [new tag] trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 -> trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 2025-12-04T09:33:42.0050558Z * [new tag] trunk/5a607febc04c3a2b5824c75f3f60307867439a2c -> trunk/5a607febc04c3a2b5824c75f3f60307867439a2c 2025-12-04T09:33:42.0051676Z * [new tag] trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b -> trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b 2025-12-04T09:33:42.0052540Z * [new tag] trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c -> trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c 2025-12-04T09:33:42.0053587Z * [new tag] trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 -> trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 2025-12-04T09:33:42.0054701Z * [new tag] trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 -> trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 
2025-12-04T09:33:42.0055849Z * [new tag] trunk/61be54a31dc09b59d99b62176fb935aee0b924ef -> trunk/61be54a31dc09b59d99b62176fb935aee0b924ef 2025-12-04T09:33:42.0056979Z * [new tag] trunk/62d3ccd71484ed6a760d909b41487101bbc65719 -> trunk/62d3ccd71484ed6a760d909b41487101bbc65719 2025-12-04T09:33:42.0058297Z * [new tag] trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b -> trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b 2025-12-04T09:33:42.0059254Z * [new tag] trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a -> trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a 2025-12-04T09:33:42.0060782Z * [new tag] trunk/66004b993744b4106bf8afaba71f3c228a804206 -> trunk/66004b993744b4106bf8afaba71f3c228a804206 2025-12-04T09:33:42.0061413Z * [new tag] trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 -> trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 2025-12-04T09:33:42.0062439Z * [new tag] trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 -> trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 2025-12-04T09:33:42.0063606Z * [new tag] trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d -> trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d 2025-12-04T09:33:42.0064581Z * [new tag] trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b -> trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b 2025-12-04T09:33:42.0065634Z * [new tag] trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 -> trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 2025-12-04T09:33:42.0066744Z * [new tag] trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 -> trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 2025-12-04T09:33:42.0067993Z * [new tag] trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec -> trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec 2025-12-04T09:33:42.0068954Z * [new tag] trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 -> trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 2025-12-04T09:33:42.0070233Z * [new tag] trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d -> trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d 2025-12-04T09:33:42.0071364Z * [new tag] 
trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a -> trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a 2025-12-04T09:33:42.0072900Z * [new tag] trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e -> trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e 2025-12-04T09:33:42.0073859Z * [new tag] trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 -> trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 2025-12-04T09:33:42.0074921Z * [new tag] trunk/70d797a5fc109b20a517646fcaa819477cd0d485 -> trunk/70d797a5fc109b20a517646fcaa819477cd0d485 2025-12-04T09:33:42.0075940Z * [new tag] trunk/7348cb355ff0a6f79cd4871215aea72185748734 -> trunk/7348cb355ff0a6f79cd4871215aea72185748734 2025-12-04T09:33:42.0077040Z * [new tag] trunk/74fe26a1ebe32931783569f2e762e3c2c974901f -> trunk/74fe26a1ebe32931783569f2e762e3c2c974901f 2025-12-04T09:33:42.0078158Z * [new tag] trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 -> trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 2025-12-04T09:33:42.0079053Z * [new tag] trunk/7716da9fb23f27a65b41f9f016a2afadf281c18f -> trunk/7716da9fb23f27a65b41f9f016a2afadf281c18f 2025-12-04T09:33:42.0080146Z * [new tag] trunk/7741edd4ed665f3988052e260863efb508d61a03 -> trunk/7741edd4ed665f3988052e260863efb508d61a03 2025-12-04T09:33:42.0081387Z * [new tag] trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 -> trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 2025-12-04T09:33:42.0082368Z * [new tag] trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 -> trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 2025-12-04T09:33:42.0083261Z * [new tag] trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 -> trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 2025-12-04T09:33:42.0084286Z * [new tag] trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca -> trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca 2025-12-04T09:33:42.0085312Z * [new tag] trunk/7b7af390ea8541c611d1ce2018a6934188fc197b -> trunk/7b7af390ea8541c611d1ce2018a6934188fc197b 2025-12-04T09:33:42.0086414Z * [new tag] trunk/7ba4680f3755a560af81aa0f688791e367aa3609 -> 
trunk/7ba4680f3755a560af81aa0f688791e367aa3609 2025-12-04T09:33:42.0087674Z * [new tag] trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b -> trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b 2025-12-04T09:33:42.0088462Z * [new tag] trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 2025-12-04T09:33:42.0089423Z * [new tag] trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 -> trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 2025-12-04T09:33:42.0090514Z * [new tag] trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed -> trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed 2025-12-04T09:33:42.0091705Z * [new tag] trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 -> trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 2025-12-04T09:33:42.0092711Z * [new tag] trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e -> trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e 2025-12-04T09:33:42.0093646Z * [new tag] trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead -> trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead 2025-12-04T09:33:42.0095372Z * [new tag] trunk/81af382128efa094d8702e18f2c133760904c718 -> trunk/81af382128efa094d8702e18f2c133760904c718 2025-12-04T09:33:42.0096885Z * [new tag] trunk/84149583d483e9c973c9a0feda70e4f3964947b0 -> trunk/84149583d483e9c973c9a0feda70e4f3964947b0 2025-12-04T09:33:42.0098353Z * [new tag] trunk/85a315917efe82c24306be805c584ec044951c75 -> trunk/85a315917efe82c24306be805c584ec044951c75 2025-12-04T09:33:42.0099354Z * [new tag] trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece -> trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece 2025-12-04T09:33:42.0100247Z * [new tag] trunk/892640e25aeefa8007c5af837214b4502b6b62a6 -> trunk/892640e25aeefa8007c5af837214b4502b6b62a6 2025-12-04T09:33:42.0101594Z * [new tag] trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 -> trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 2025-12-04T09:33:42.0102527Z * [new tag] trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c -> trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c 
2025-12-04T09:33:42.0103625Z * [new tag] trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 -> trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 2025-12-04T09:33:42.0104897Z * [new tag] trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 -> trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 2025-12-04T09:33:42.0105968Z * [new tag] trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca -> trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca 2025-12-04T09:33:42.0107039Z * [new tag] trunk/90b27e7e8352cde97d32ddad24740ef819633f38 -> trunk/90b27e7e8352cde97d32ddad24740ef819633f38 2025-12-04T09:33:42.0107914Z * [new tag] trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 -> trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 2025-12-04T09:33:42.0115058Z * [new tag] trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c -> trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c 2025-12-04T09:33:42.0115692Z * [new tag] trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 -> trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 2025-12-04T09:33:42.0116163Z * [new tag] trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 -> trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 2025-12-04T09:33:42.0116634Z * [new tag] trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa -> trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa 2025-12-04T09:33:42.0117084Z * [new tag] trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d -> trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d 2025-12-04T09:33:42.0117539Z * [new tag] trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 -> trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 2025-12-04T09:33:42.0118001Z * [new tag] trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 -> trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 2025-12-04T09:33:42.0118460Z * [new tag] trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d -> trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d 2025-12-04T09:33:42.0118931Z * [new tag] trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a -> trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a 2025-12-04T09:33:42.0119373Z * [new tag] 
trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 -> trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 2025-12-04T09:33:42.0119842Z * [new tag] trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 -> trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 2025-12-04T09:33:42.0120703Z * [new tag] trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa -> trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa 2025-12-04T09:33:42.0122121Z * [new tag] trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d -> trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d 2025-12-04T09:33:42.0122901Z * [new tag] trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c -> trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c 2025-12-04T09:33:42.0124004Z * [new tag] trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 -> trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 2025-12-04T09:33:42.0125104Z * [new tag] trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c -> trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c 2025-12-04T09:33:42.0125991Z * [new tag] trunk/a7dc6dab9ad911259d4801c502907e531594db45 -> trunk/a7dc6dab9ad911259d4801c502907e531594db45 2025-12-04T09:33:42.0127233Z * [new tag] trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 -> trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 2025-12-04T09:33:42.0128242Z * [new tag] trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e -> trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e 2025-12-04T09:33:42.0129466Z * [new tag] trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e -> trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e 2025-12-04T09:33:42.0130326Z * [new tag] trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e -> trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e 2025-12-04T09:33:42.0131238Z * [new tag] trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 -> trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 2025-12-04T09:33:42.0132611Z * [new tag] trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 -> trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 2025-12-04T09:33:42.0133708Z * [new tag] trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 -> 
trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 2025-12-04T09:33:42.0134767Z * [new tag] trunk/b39813b4a04931682b0491adba2138d01d716d99 -> trunk/b39813b4a04931682b0491adba2138d01d716d99 2025-12-04T09:33:42.0135831Z * [new tag] trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 -> trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 2025-12-04T09:33:42.0137535Z * [new tag] trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 -> trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 2025-12-04T09:33:42.0138221Z * [new tag] trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a -> trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a 2025-12-04T09:33:42.0139464Z * [new tag] trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 -> trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 2025-12-04T09:33:42.0140568Z * [new tag] trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 -> trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 2025-12-04T09:33:42.0141667Z * [new tag] trunk/b7d60685f8cbc939b68a20871e90db67e729329b -> trunk/b7d60685f8cbc939b68a20871e90db67e729329b 2025-12-04T09:33:42.0142779Z * [new tag] trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e -> trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e 2025-12-04T09:33:42.0144118Z * [new tag] trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf -> trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf 2025-12-04T09:33:42.0144982Z * [new tag] trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 -> trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 2025-12-04T09:33:42.0146057Z * [new tag] trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f -> trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f 2025-12-04T09:33:42.0147174Z * [new tag] trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f -> trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f 2025-12-04T09:33:42.0148230Z * [new tag] trunk/bb3034198b459401fabeab254e1b99f0115046e2 -> trunk/bb3034198b459401fabeab254e1b99f0115046e2 2025-12-04T09:33:42.0149542Z * [new tag] trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 -> trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 
2025-12-04T09:33:42.0150912Z * [new tag] trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 -> trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 2025-12-04T09:33:42.0151846Z * [new tag] trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 -> trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 2025-12-04T09:33:42.0152906Z * [new tag] trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 -> trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 2025-12-04T09:33:42.0154472Z * [new tag] trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 -> trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 2025-12-04T09:33:42.0155473Z * [new tag] trunk/c0660bcee27e7d7731634e274576a7081882bede -> trunk/c0660bcee27e7d7731634e274576a7081882bede 2025-12-04T09:33:42.0156586Z * [new tag] trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac -> trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac 2025-12-04T09:33:42.0157696Z * [new tag] trunk/c55b1e8f61d041ee436d697449eb028931d574fb -> trunk/c55b1e8f61d041ee436d697449eb028931d574fb 2025-12-04T09:33:42.0158678Z * [new tag] trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 -> trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 2025-12-04T09:33:42.0160162Z * [new tag] trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 -> trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 2025-12-04T09:33:42.0161213Z * [new tag] trunk/cc0853af42122f8185321f542616f4474e717f09 -> trunk/cc0853af42122f8185321f542616f4474e717f09 2025-12-04T09:33:42.0162243Z * [new tag] trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 -> trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 2025-12-04T09:33:42.0163366Z * [new tag] trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a -> trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a 2025-12-04T09:33:42.0164481Z * [new tag] trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace -> trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace 2025-12-04T09:33:42.0165488Z * [new tag] trunk/d16447dacaf2420ea175f0c275c75da951f57d39 -> trunk/d16447dacaf2420ea175f0c275c75da951f57d39 2025-12-04T09:33:42.0167229Z * [new tag] 
trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 -> trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 2025-12-04T09:33:42.0168224Z * [new tag] trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 -> trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 2025-12-04T09:33:42.0169564Z * [new tag] trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf -> trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf 2025-12-04T09:33:42.0170524Z * [new tag] trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 -> trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 2025-12-04T09:33:42.0171594Z * [new tag] trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d -> trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d 2025-12-04T09:33:42.0173073Z * [new tag] trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 -> trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 2025-12-04T09:33:42.0174031Z * [new tag] trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 -> trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 2025-12-04T09:33:42.0175129Z * [new tag] trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e -> trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e 2025-12-04T09:33:42.0176169Z * [new tag] trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a -> trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a 2025-12-04T09:33:42.0177712Z * [new tag] trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b -> trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b 2025-12-04T09:33:42.0178653Z * [new tag] trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec -> trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec 2025-12-04T09:33:42.0179892Z * [new tag] trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf -> trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf 2025-12-04T09:33:42.0180981Z * [new tag] trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd -> trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd 2025-12-04T09:33:42.0181945Z * [new tag] trunk/dd18a75336a4fbd7497955cc5665904724fce889 -> trunk/dd18a75336a4fbd7497955cc5665904724fce889 2025-12-04T09:33:42.0183020Z * [new tag] trunk/ded9bcd61a059bf723e6e84689552962b480ea77 -> 
trunk/ded9bcd61a059bf723e6e84689552962b480ea77 2025-12-04T09:33:42.0184143Z * [new tag] trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c -> trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c 2025-12-04T09:33:42.0185519Z * [new tag] trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b -> trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b 2025-12-04T09:33:42.0186305Z * [new tag] trunk/e3f24fd73ad74c6e7176687986436956c7c18235 -> trunk/e3f24fd73ad74c6e7176687986436956c7c18235 2025-12-04T09:33:42.0187599Z * [new tag] trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e -> trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e 2025-12-04T09:33:42.0188650Z * [new tag] trunk/ea7035f462a0d2830865ee86c832bd101e1427fc -> trunk/ea7035f462a0d2830865ee86c832bd101e1427fc 2025-12-04T09:33:42.0189723Z * [new tag] trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 -> trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 2025-12-04T09:33:42.0190857Z * [new tag] trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf -> trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf 2025-12-04T09:33:42.0191988Z * [new tag] trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e -> trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e 2025-12-04T09:33:42.0193113Z * [new tag] trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e -> trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e 2025-12-04T09:33:42.0194772Z * [new tag] trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 -> trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 2025-12-04T09:33:42.0195730Z * [new tag] trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 -> trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 2025-12-04T09:33:42.0196805Z * [new tag] trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 -> trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 2025-12-04T09:33:42.0197896Z * [new tag] trunk/f1076f5510920044912247b1abb8760cb820f598 -> trunk/f1076f5510920044912247b1abb8760cb820f598 2025-12-04T09:33:42.0199036Z * [new tag] trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 -> trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 
2025-12-04T09:33:42.0200095Z * [new tag] trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 -> trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 2025-12-04T09:33:42.0201186Z * [new tag] trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 -> trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 2025-12-04T09:33:42.0202176Z * [new tag] trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 -> trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 2025-12-04T09:33:42.0203266Z * [new tag] trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 -> trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 2025-12-04T09:33:42.0204335Z * [new tag] trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 -> trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 2025-12-04T09:33:42.0205288Z * [new tag] trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 -> trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 2025-12-04T09:33:42.0206643Z * [new tag] trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b -> trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b 2025-12-04T09:33:42.0207631Z * [new tag] trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 -> trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 2025-12-04T09:33:42.0209438Z * [new tag] trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 -> trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 2025-12-04T09:33:42.0210383Z * [new tag] trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 -> trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 2025-12-04T09:33:42.0211658Z * [new tag] trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 -> trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:42.0212359Z * [new tag] v0.1.1 -> v0.1.1 2025-12-04T09:33:42.0213637Z * [new tag] v0.1.10 -> v0.1.10 2025-12-04T09:33:42.0214668Z * [new tag] v0.1.11 -> v0.1.11 2025-12-04T09:33:42.0215714Z * [new tag] v0.1.12 -> v0.1.12 2025-12-04T09:33:42.0216824Z * [new tag] v0.1.2 -> v0.1.2 2025-12-04T09:33:42.0217808Z * [new tag] v0.1.3 -> v0.1.3 2025-12-04T09:33:42.0218744Z * [new tag] v0.1.4 -> v0.1.4 2025-12-04T09:33:42.0219738Z * [new tag] v0.1.5 -> v0.1.5 
2025-12-04T09:33:42.0220694Z * [new tag] v0.1.6 -> v0.1.6 2025-12-04T09:33:42.0221682Z * [new tag] v0.1.7 -> v0.1.7 2025-12-04T09:33:42.0222473Z * [new tag] v0.1.8 -> v0.1.8 2025-12-04T09:33:42.0223614Z * [new tag] v0.1.9 -> v0.1.9 2025-12-04T09:33:42.0224613Z * [new tag] v0.2.0 -> v0.2.0 2025-12-04T09:33:42.0225675Z * [new tag] v0.3.0 -> v0.3.0 2025-12-04T09:33:42.0226777Z * [new tag] v0.3.1 -> v0.3.1 2025-12-04T09:33:42.0227761Z * [new tag] v0.4.0 -> v0.4.0 2025-12-04T09:33:42.0228612Z * [new tag] v0.4.1 -> v0.4.1 2025-12-04T09:33:42.0229731Z * [new tag] v1.0.0 -> v1.0.0 2025-12-04T09:33:42.0230792Z * [new tag] v1.0.0a0 -> v1.0.0a0 2025-12-04T09:33:42.0231623Z * [new tag] v1.0.1 -> v1.0.1 2025-12-04T09:33:42.0232780Z * [new tag] v1.0rc0 -> v1.0rc0 2025-12-04T09:33:42.0233457Z * [new tag] v1.0rc1 -> v1.0rc1 2025-12-04T09:33:42.0234590Z * [new tag] v1.1.0 -> v1.1.0 2025-12-04T09:33:42.0235853Z * [new tag] v1.1.0a0 -> v1.1.0a0 2025-12-04T09:33:42.0237086Z * [new tag] v1.10.0 -> v1.10.0 2025-12-04T09:33:42.0238186Z * [new tag] v1.10.0-rc1 -> v1.10.0-rc1 2025-12-04T09:33:42.0239278Z * [new tag] v1.10.0-rc2 -> v1.10.0-rc2 2025-12-04T09:33:42.0239955Z * [new tag] v1.10.0-rc3 -> v1.10.0-rc3 2025-12-04T09:33:42.0241134Z * [new tag] v1.10.1 -> v1.10.1 2025-12-04T09:33:42.0241830Z * [new tag] v1.10.1-rc1 -> v1.10.1-rc1 2025-12-04T09:33:42.0242805Z * [new tag] v1.10.2 -> v1.10.2 2025-12-04T09:33:42.0244123Z * [new tag] v1.10.2-rc1 -> v1.10.2-rc1 2025-12-04T09:33:42.0245201Z * [new tag] v1.11.0 -> v1.11.0 2025-12-04T09:33:42.0246482Z * [new tag] v1.11.0-rc1 -> v1.11.0-rc1 2025-12-04T09:33:42.0247612Z * [new tag] v1.11.0-rc2 -> v1.11.0-rc2 2025-12-04T09:33:42.0248738Z * [new tag] v1.11.0-rc3 -> v1.11.0-rc3 2025-12-04T09:33:42.0249841Z * [new tag] v1.11.0-rc4 -> v1.11.0-rc4 2025-12-04T09:33:42.0250925Z * [new tag] v1.11.0-rc5 -> v1.11.0-rc5 2025-12-04T09:33:42.0251644Z * [new tag] v1.11.0-rc6 -> v1.11.0-rc6 2025-12-04T09:33:42.0252489Z * [new tag] v1.11.0-rc7 -> v1.11.0-rc7 
2025-12-04T09:33:42.0253803Z * [new tag] v1.12.0 -> v1.12.0 2025-12-04T09:33:42.0254603Z * [new tag] v1.12.0-rc1 -> v1.12.0-rc1 2025-12-04T09:33:42.0255791Z * [new tag] v1.12.0-rc2 -> v1.12.0-rc2 2025-12-04T09:33:42.0257004Z * [new tag] v1.12.0-rc3 -> v1.12.0-rc3 2025-12-04T09:33:42.0258230Z * [new tag] v1.12.0-rc4 -> v1.12.0-rc4 2025-12-04T09:33:42.0259235Z * [new tag] v1.12.0-rc5 -> v1.12.0-rc5 2025-12-04T09:33:42.0260421Z * [new tag] v1.12.0-rc6 -> v1.12.0-rc6 2025-12-04T09:33:42.0261181Z * [new tag] v1.12.0-rc7 -> v1.12.0-rc7 2025-12-04T09:33:42.0261940Z * [new tag] v1.12.0-rc8 -> v1.12.0-rc8 2025-12-04T09:33:42.0262766Z * [new tag] v1.12.1 -> v1.12.1 2025-12-04T09:33:42.0264066Z * [new tag] v1.12.1-rc1 -> v1.12.1-rc1 2025-12-04T09:33:42.0265168Z * [new tag] v1.12.1-rc2 -> v1.12.1-rc2 2025-12-04T09:33:42.0266244Z * [new tag] v1.12.1-rc3 -> v1.12.1-rc3 2025-12-04T09:33:42.0267323Z * [new tag] v1.12.1-rc4 -> v1.12.1-rc4 2025-12-04T09:33:42.0268045Z * [new tag] v1.12.1-rc5 -> v1.12.1-rc5 2025-12-04T09:33:42.0269289Z * [new tag] v1.13.0 -> v1.13.0 2025-12-04T09:33:42.0270309Z * [new tag] v1.13.0-rc1 -> v1.13.0-rc1 2025-12-04T09:33:42.0271499Z * [new tag] v1.13.0-rc2 -> v1.13.0-rc2 2025-12-04T09:33:42.0272754Z * [new tag] v1.13.0-rc3 -> v1.13.0-rc3 2025-12-04T09:33:42.0274040Z * [new tag] v1.13.0-rc4 -> v1.13.0-rc4 2025-12-04T09:33:42.0274699Z * [new tag] v1.13.0-rc5 -> v1.13.0-rc5 2025-12-04T09:33:42.0275675Z * [new tag] v1.13.0-rc6 -> v1.13.0-rc6 2025-12-04T09:33:42.0276782Z * [new tag] v1.13.1 -> v1.13.1 2025-12-04T09:33:42.0277545Z * [new tag] v1.13.1-rc1 -> v1.13.1-rc1 2025-12-04T09:33:42.0278627Z * [new tag] v1.2.0 -> v1.2.0 2025-12-04T09:33:42.0279793Z * [new tag] v1.2.0a0 -> v1.2.0a0 2025-12-04T09:33:42.0280822Z * [new tag] v1.3.0 -> v1.3.0 2025-12-04T09:33:42.0281854Z * [new tag] v1.3.0a0 -> v1.3.0a0 2025-12-04T09:33:42.0282557Z * [new tag] v1.3.1 -> v1.3.1 2025-12-04T09:33:42.0283659Z * [new tag] v1.4.0 -> v1.4.0 2025-12-04T09:33:42.0284651Z * [new tag] 
v1.4.0a0 -> v1.4.0a0 2025-12-04T09:33:42.0285470Z * [new tag] v1.4.1 -> v1.4.1 2025-12-04T09:33:42.0286602Z * [new tag] v1.5.0 -> v1.5.0 2025-12-04T09:33:42.0287742Z * [new tag] v1.5.0-rc1 -> v1.5.0-rc1 2025-12-04T09:33:42.0288856Z * [new tag] v1.5.0-rc2 -> v1.5.0-rc2 2025-12-04T09:33:42.0289978Z * [new tag] v1.5.0-rc3 -> v1.5.0-rc3 2025-12-04T09:33:42.0290820Z * [new tag] v1.5.0-rc4 -> v1.5.0-rc4 2025-12-04T09:33:42.0291636Z * [new tag] v1.5.0-rc5 -> v1.5.0-rc5 2025-12-04T09:33:42.0292893Z * [new tag] v1.5.1 -> v1.5.1 2025-12-04T09:33:42.0293623Z * [new tag] v1.5.1-rc1 -> v1.5.1-rc1 2025-12-04T09:33:42.0294411Z * [new tag] v1.6.0 -> v1.6.0 2025-12-04T09:33:42.0295544Z * [new tag] v1.6.0-rc1 -> v1.6.0-rc1 2025-12-04T09:33:42.0296943Z * [new tag] v1.6.0-rc2 -> v1.6.0-rc2 2025-12-04T09:33:42.0297928Z * [new tag] v1.6.0-rc3 -> v1.6.0-rc3 2025-12-04T09:33:42.0298940Z * [new tag] v1.6.0-rc4 -> v1.6.0-rc4 2025-12-04T09:33:42.0300014Z * [new tag] v1.6.0-rc5 -> v1.6.0-rc5 2025-12-04T09:33:42.0301012Z * [new tag] v1.6.0-rc6 -> v1.6.0-rc6 2025-12-04T09:33:42.0301727Z * [new tag] v1.6.0-rc7 -> v1.6.0-rc7 2025-12-04T09:33:42.0302957Z * [new tag] v1.7.0 -> v1.7.0 2025-12-04T09:33:42.0304028Z * [new tag] v1.7.0-rc1 -> v1.7.0-rc1 2025-12-04T09:33:42.0305217Z * [new tag] v1.7.0-rc2 -> v1.7.0-rc2 2025-12-04T09:33:42.0306357Z * [new tag] v1.7.0-rc3 -> v1.7.0-rc3 2025-12-04T09:33:42.0307056Z * [new tag] v1.7.0-rc4 -> v1.7.0-rc4 2025-12-04T09:33:42.0308222Z * [new tag] v1.7.1 -> v1.7.1 2025-12-04T09:33:42.0309500Z * [new tag] v1.7.1-rc1 -> v1.7.1-rc1 2025-12-04T09:33:42.0311232Z * [new tag] v1.7.1-rc2 -> v1.7.1-rc2 2025-12-04T09:33:42.0311926Z * [new tag] v1.7.1-rc3 -> v1.7.1-rc3 2025-12-04T09:33:42.0313261Z * [new tag] v1.8.0 -> v1.8.0 2025-12-04T09:33:42.0313960Z * [new tag] v1.8.0-rc1 -> v1.8.0-rc1 2025-12-04T09:33:42.0315211Z * [new tag] v1.8.0-rc2 -> v1.8.0-rc2 2025-12-04T09:33:42.0316244Z * [new tag] v1.8.0-rc3 -> v1.8.0-rc3 2025-12-04T09:33:42.0317240Z * [new tag] v1.8.0-rc4 -> 
v1.8.0-rc4 2025-12-04T09:33:42.0317987Z * [new tag] v1.8.0-rc5 -> v1.8.0-rc5 2025-12-04T09:33:42.0318807Z * [new tag] v1.8.1 -> v1.8.1 2025-12-04T09:33:42.0320050Z * [new tag] v1.8.1-rc1 -> v1.8.1-rc1 2025-12-04T09:33:42.0320736Z * [new tag] v1.8.1-rc2 -> v1.8.1-rc2 2025-12-04T09:33:42.0321548Z * [new tag] v1.8.1-rc3 -> v1.8.1-rc3 2025-12-04T09:33:42.0323337Z * [new tag] v1.8.2 -> v1.8.2 2025-12-04T09:33:42.0324036Z * [new tag] v1.8.2-rc1 -> v1.8.2-rc1 2025-12-04T09:33:42.0325270Z * [new tag] v1.9.0 -> v1.9.0 2025-12-04T09:33:42.0326349Z * [new tag] v1.9.0-rc1 -> v1.9.0-rc1 2025-12-04T09:33:42.0327507Z * [new tag] v1.9.0-rc2 -> v1.9.0-rc2 2025-12-04T09:33:42.0328539Z * [new tag] v1.9.0-rc3 -> v1.9.0-rc3 2025-12-04T09:33:42.0329282Z * [new tag] v1.9.0-rc4 -> v1.9.0-rc4 2025-12-04T09:33:42.0330506Z * [new tag] v1.9.1 -> v1.9.1 2025-12-04T09:33:42.0331744Z * [new tag] v1.9.1-rc1 -> v1.9.1-rc1 2025-12-04T09:33:42.0332584Z * [new tag] v1.9.1-rc2 -> v1.9.1-rc2 2025-12-04T09:33:42.0333662Z * [new tag] v2.0.0 -> v2.0.0 2025-12-04T09:33:42.0334681Z * [new tag] v2.0.0-rc1 -> v2.0.0-rc1 2025-12-04T09:33:42.0335764Z * [new tag] v2.0.0-rc2 -> v2.0.0-rc2 2025-12-04T09:33:42.0336951Z * [new tag] v2.0.0-rc3 -> v2.0.0-rc3 2025-12-04T09:33:42.0338017Z * [new tag] v2.0.0-rc4 -> v2.0.0-rc4 2025-12-04T09:33:42.0339097Z * [new tag] v2.0.0-rc5 -> v2.0.0-rc5 2025-12-04T09:33:42.0340010Z * [new tag] v2.0.0-rc6 -> v2.0.0-rc6 2025-12-04T09:33:42.0341053Z * [new tag] v2.0.1 -> v2.0.1 2025-12-04T09:33:42.0342169Z * [new tag] v2.0.1-rc1 -> v2.0.1-rc1 2025-12-04T09:33:42.0342915Z * [new tag] v2.0.1-rc2 -> v2.0.1-rc2 2025-12-04T09:33:42.0344085Z * [new tag] v2.0.1-rc3 -> v2.0.1-rc3 2025-12-04T09:33:42.0344723Z * [new tag] v2.0.1-rc4 -> v2.0.1-rc4 2025-12-04T09:33:42.0346596Z * [new tag] v2.1.0 -> v2.1.0 2025-12-04T09:33:42.0347648Z * [new tag] v2.1.0-rc1 -> v2.1.0-rc1 2025-12-04T09:33:42.0348812Z * [new tag] v2.1.0-rc2 -> v2.1.0-rc2 2025-12-04T09:33:42.0349911Z * [new tag] v2.1.0-rc3 -> 
v2.1.0-rc3 2025-12-04T09:33:42.0350987Z * [new tag] v2.1.0-rc4 -> v2.1.0-rc4 2025-12-04T09:33:42.0352089Z * [new tag] v2.1.0-rc5 -> v2.1.0-rc5 2025-12-04T09:33:42.0352870Z * [new tag] v2.1.0-rc6 -> v2.1.0-rc6 2025-12-04T09:33:42.0354155Z * [new tag] v2.1.1 -> v2.1.1 2025-12-04T09:33:42.0355345Z * [new tag] v2.1.1-rc1 -> v2.1.1-rc1 2025-12-04T09:33:42.0356368Z * [new tag] v2.1.1-rc2 -> v2.1.1-rc2 2025-12-04T09:33:42.0357628Z * [new tag] v2.1.1-rc3 -> v2.1.1-rc3 2025-12-04T09:33:42.0358655Z * [new tag] v2.1.1-rc4 -> v2.1.1-rc4 2025-12-04T09:33:42.0359751Z * [new tag] v2.1.1-rc5 -> v2.1.1-rc5 2025-12-04T09:33:42.0360552Z * [new tag] v2.1.1-rc6 -> v2.1.1-rc6 2025-12-04T09:33:42.0361624Z * [new tag] v2.1.2 -> v2.1.2 2025-12-04T09:33:42.0362775Z * [new tag] v2.1.2-rc1 -> v2.1.2-rc1 2025-12-04T09:33:42.0363862Z * [new tag] v2.1.2-rc2 -> v2.1.2-rc2 2025-12-04T09:33:42.0364604Z * [new tag] v2.1.2-rc3 -> v2.1.2-rc3 2025-12-04T09:33:42.0365849Z * [new tag] v2.2.0 -> v2.2.0 2025-12-04T09:33:42.0366827Z * [new tag] v2.2.0-rc1 -> v2.2.0-rc1 2025-12-04T09:33:42.0367923Z * [new tag] v2.2.0-rc2 -> v2.2.0-rc2 2025-12-04T09:33:42.0369409Z * [new tag] v2.2.0-rc3 -> v2.2.0-rc3 2025-12-04T09:33:42.0370500Z * [new tag] v2.2.0-rc4 -> v2.2.0-rc4 2025-12-04T09:33:42.0371395Z * [new tag] v2.2.0-rc5 -> v2.2.0-rc5 2025-12-04T09:33:42.0372663Z * [new tag] v2.2.0-rc6 -> v2.2.0-rc6 2025-12-04T09:33:42.0373444Z * [new tag] v2.2.0-rc7 -> v2.2.0-rc7 2025-12-04T09:33:42.0374291Z * [new tag] v2.2.0-rc8 -> v2.2.0-rc8 2025-12-04T09:33:42.0375503Z * [new tag] v2.2.1 -> v2.2.1 2025-12-04T09:33:42.0376742Z * [new tag] v2.2.1-rc1 -> v2.2.1-rc1 2025-12-04T09:33:42.0377520Z * [new tag] v2.2.1-rc2 -> v2.2.1-rc2 2025-12-04T09:33:42.0379024Z * [new tag] v2.2.1-rc3 -> v2.2.1-rc3 2025-12-04T09:33:42.0379755Z * [new tag] v2.2.2 -> v2.2.2 2025-12-04T09:33:42.0381220Z * [new tag] v2.2.2-rc1 -> v2.2.2-rc1 2025-12-04T09:33:42.0381956Z * [new tag] v2.2.2-rc2 -> v2.2.2-rc2 2025-12-04T09:33:42.0382982Z * [new tag] 
v2.2.2-rc3 -> v2.2.2-rc3 2025-12-04T09:33:42.0384188Z * [new tag] v2.3.0 -> v2.3.0 2025-12-04T09:33:42.0385003Z * [new tag] v2.3.0-rc1 -> v2.3.0-rc1 2025-12-04T09:33:42.0386267Z * [new tag] v2.3.0-rc10 -> v2.3.0-rc10 2025-12-04T09:33:42.0387417Z * [new tag] v2.3.0-rc11 -> v2.3.0-rc11 2025-12-04T09:33:42.0388158Z * [new tag] v2.3.0-rc12 -> v2.3.0-rc12 2025-12-04T09:33:42.0389308Z * [new tag] v2.3.0-rc2 -> v2.3.0-rc2 2025-12-04T09:33:42.0390411Z * [new tag] v2.3.0-rc3 -> v2.3.0-rc3 2025-12-04T09:33:42.0391506Z * [new tag] v2.3.0-rc4 -> v2.3.0-rc4 2025-12-04T09:33:42.0392493Z * [new tag] v2.3.0-rc5 -> v2.3.0-rc5 2025-12-04T09:33:42.0393306Z * [new tag] v2.3.0-rc6 -> v2.3.0-rc6 2025-12-04T09:33:42.0394470Z * [new tag] v2.3.0-rc7 -> v2.3.0-rc7 2025-12-04T09:33:42.0395552Z * [new tag] v2.3.0-rc8 -> v2.3.0-rc8 2025-12-04T09:33:42.0396286Z * [new tag] v2.3.0-rc9 -> v2.3.0-rc9 2025-12-04T09:33:42.0397076Z * [new tag] v2.3.1 -> v2.3.1 2025-12-04T09:33:42.0398330Z * [new tag] v2.3.1-rc1 -> v2.3.1-rc1 2025-12-04T09:33:42.0399379Z * [new tag] v2.3.1-rc2 -> v2.3.1-rc2 2025-12-04T09:33:42.0400470Z * [new tag] v2.3.1-rc3 -> v2.3.1-rc3 2025-12-04T09:33:42.0401571Z * [new tag] v2.4.0 -> v2.4.0 2025-12-04T09:33:42.0402681Z * [new tag] v2.4.0-rc1 -> v2.4.0-rc1 2025-12-04T09:33:42.0403739Z * [new tag] v2.4.0-rc2 -> v2.4.0-rc2 2025-12-04T09:33:42.0404742Z * [new tag] v2.4.0-rc3 -> v2.4.0-rc3 2025-12-04T09:33:42.0405789Z * [new tag] v2.4.0-rc4 -> v2.4.0-rc4 2025-12-04T09:33:42.0406878Z * [new tag] v2.4.0-rc5 -> v2.4.0-rc5 2025-12-04T09:33:42.0408032Z * [new tag] v2.4.0-rc6 -> v2.4.0-rc6 2025-12-04T09:33:42.0409073Z * [new tag] v2.4.0-rc7 -> v2.4.0-rc7 2025-12-04T09:33:42.0410090Z * [new tag] v2.4.0-rc8 -> v2.4.0-rc8 2025-12-04T09:33:42.0411237Z * [new tag] v2.4.0-rc9 -> v2.4.0-rc9 2025-12-04T09:33:42.0412201Z * [new tag] v2.4.1 -> v2.4.1 2025-12-04T09:33:42.0413366Z * [new tag] v2.4.1-rc1 -> v2.4.1-rc1 2025-12-04T09:33:42.0414559Z * [new tag] v2.4.1-rc2 -> v2.4.1-rc2 
2025-12-04T09:33:42.0415660Z * [new tag] v2.4.1-rc3 -> v2.4.1-rc3 2025-12-04T09:33:42.0416856Z * [new tag] v2.5.0 -> v2.5.0 2025-12-04T09:33:42.0417904Z * [new tag] v2.5.0-rc1 -> v2.5.0-rc1 2025-12-04T09:33:42.0418678Z * [new tag] v2.5.0-rc10 -> v2.5.0-rc10 2025-12-04T09:33:42.0419776Z * [new tag] v2.5.0-rc2 -> v2.5.0-rc2 2025-12-04T09:33:42.0420862Z * [new tag] v2.5.0-rc3 -> v2.5.0-rc3 2025-12-04T09:33:42.0421926Z * [new tag] v2.5.0-rc4 -> v2.5.0-rc4 2025-12-04T09:33:42.0422994Z * [new tag] v2.5.0-rc5 -> v2.5.0-rc5 2025-12-04T09:33:42.0424182Z * [new tag] v2.5.0-rc6 -> v2.5.0-rc6 2025-12-04T09:33:42.0425249Z * [new tag] v2.5.0-rc7 -> v2.5.0-rc7 2025-12-04T09:33:42.0426348Z * [new tag] v2.5.0-rc8 -> v2.5.0-rc8 2025-12-04T09:33:42.0427520Z * [new tag] v2.5.0-rc9 -> v2.5.0-rc9 2025-12-04T09:33:42.0428175Z * [new tag] v2.5.1 -> v2.5.1 2025-12-04T09:33:42.0429005Z * [new tag] v2.5.1-rc1 -> v2.5.1-rc1 2025-12-04T09:33:42.0429812Z * [new tag] v2.6.0 -> v2.6.0 2025-12-04T09:33:42.0431029Z * [new tag] v2.6.0-rc1 -> v2.6.0-rc1 2025-12-04T09:33:42.0432239Z * [new tag] v2.6.0-rc2 -> v2.6.0-rc2 2025-12-04T09:33:42.0433328Z * [new tag] v2.6.0-rc3 -> v2.6.0-rc3 2025-12-04T09:33:42.0434337Z * [new tag] v2.6.0-rc4 -> v2.6.0-rc4 2025-12-04T09:33:42.0435651Z * [new tag] v2.6.0-rc5 -> v2.6.0-rc5 2025-12-04T09:33:42.0436857Z * [new tag] v2.6.0-rc6 -> v2.6.0-rc6 2025-12-04T09:33:42.0437979Z * [new tag] v2.6.0-rc7 -> v2.6.0-rc7 2025-12-04T09:33:42.0439168Z * [new tag] v2.6.0-rc8 -> v2.6.0-rc8 2025-12-04T09:33:42.0440275Z * [new tag] v2.6.0-rc9 -> v2.6.0-rc9 2025-12-04T09:33:42.0441580Z * [new tag] v2.7.0 -> v2.7.0 2025-12-04T09:33:42.0442659Z * [new tag] v2.7.0-rc1 -> v2.7.0-rc1 2025-12-04T09:33:42.0443419Z * [new tag] v2.7.0-rc10 -> v2.7.0-rc10 2025-12-04T09:33:42.0444705Z * [new tag] v2.7.0-rc2 -> v2.7.0-rc2 2025-12-04T09:33:42.0445879Z * [new tag] v2.7.0-rc3 -> v2.7.0-rc3 2025-12-04T09:33:42.0447575Z * [new tag] v2.7.0-rc4 -> v2.7.0-rc4 2025-12-04T09:33:42.0448613Z * [new tag] 
v2.7.0-rc5 -> v2.7.0-rc5 2025-12-04T09:33:42.0449764Z * [new tag] v2.7.0-rc6 -> v2.7.0-rc6 2025-12-04T09:33:42.0450899Z * [new tag] v2.7.0-rc7 -> v2.7.0-rc7 2025-12-04T09:33:42.0452037Z * [new tag] v2.7.0-rc8 -> v2.7.0-rc8 2025-12-04T09:33:42.0453235Z * [new tag] v2.7.0-rc9 -> v2.7.0-rc9 2025-12-04T09:33:42.0453993Z * [new tag] v2.7.1 -> v2.7.1 2025-12-04T09:33:42.0455190Z * [new tag] v2.7.1-rc1 -> v2.7.1-rc1 2025-12-04T09:33:42.0456419Z * [new tag] v2.7.1-rc2 -> v2.7.1-rc2 2025-12-04T09:33:42.0457648Z * [new tag] v2.7.1-rc3 -> v2.7.1-rc3 2025-12-04T09:33:42.0458780Z * [new tag] v2.7.1-rc4 -> v2.7.1-rc4 2025-12-04T09:33:42.0459892Z * [new tag] v2.7.1-rc5 -> v2.7.1-rc5 2025-12-04T09:33:42.0460725Z * [new tag] v2.8.0 -> v2.8.0 2025-12-04T09:33:42.0461956Z * [new tag] v2.8.0-rc1 -> v2.8.0-rc1 2025-12-04T09:33:42.0462995Z * [new tag] v2.8.0-rc2 -> v2.8.0-rc2 2025-12-04T09:33:42.0464337Z * [new tag] v2.8.0-rc3 -> v2.8.0-rc3 2025-12-04T09:33:42.0465459Z * [new tag] v2.8.0-rc4 -> v2.8.0-rc4 2025-12-04T09:33:42.0466564Z * [new tag] v2.8.0-rc5 -> v2.8.0-rc5 2025-12-04T09:33:42.0467730Z * [new tag] v2.8.0-rc6 -> v2.8.0-rc6 2025-12-04T09:33:42.0468834Z * [new tag] v2.8.0-rc7 -> v2.8.0-rc7 2025-12-04T09:33:42.0469907Z * [new tag] v2.8.0-rc8 -> v2.8.0-rc8 2025-12-04T09:33:42.0471172Z * [new tag] v2.9.0 -> v2.9.0 2025-12-04T09:33:42.0472338Z * [new tag] v2.9.0-rc1 -> v2.9.0-rc1 2025-12-04T09:33:42.0473576Z * [new tag] v2.9.0-rc10 -> v2.9.0-rc10 2025-12-04T09:33:42.0474591Z * [new tag] v2.9.0-rc11 -> v2.9.0-rc11 2025-12-04T09:33:42.0475920Z * [new tag] v2.9.0-rc2 -> v2.9.0-rc2 2025-12-04T09:33:42.0477048Z * [new tag] v2.9.0-rc3 -> v2.9.0-rc3 2025-12-04T09:33:42.0478183Z * [new tag] v2.9.0-rc4 -> v2.9.0-rc4 2025-12-04T09:33:42.0479333Z * [new tag] v2.9.0-rc5 -> v2.9.0-rc5 2025-12-04T09:33:42.0480659Z * [new tag] v2.9.0-rc6 -> v2.9.0-rc6 2025-12-04T09:33:42.0481769Z * [new tag] v2.9.0-rc7 -> v2.9.0-rc7 2025-12-04T09:33:42.0483081Z * [new tag] v2.9.0-rc8 -> v2.9.0-rc8 
2025-12-04T09:33:42.0483909Z * [new tag] v2.9.0-rc9 -> v2.9.0-rc9 2025-12-04T09:33:42.0484829Z * [new tag] v2.9.1 -> v2.9.1 2025-12-04T09:33:42.0486068Z * [new tag] v2.9.1-rc1 -> v2.9.1-rc1 2025-12-04T09:33:42.0487229Z * [new tag] v2.9.1-rc2 -> v2.9.1-rc2 2025-12-04T09:33:42.0488844Z * [new tag] viable/strict/1759343184 -> viable/strict/1759343184 2025-12-04T09:33:42.0489818Z * [new tag] viable/strict/1759346540 -> viable/strict/1759346540 2025-12-04T09:33:42.0490902Z * [new tag] viable/strict/1759348181 -> viable/strict/1759348181 2025-12-04T09:33:42.0491937Z * [new tag] viable/strict/1759350324 -> viable/strict/1759350324 2025-12-04T09:33:42.0492950Z * [new tag] viable/strict/1759351793 -> viable/strict/1759351793 2025-12-04T09:33:42.0493954Z * [new tag] viable/strict/1759353844 -> viable/strict/1759353844 2025-12-04T09:33:42.0494952Z * [new tag] viable/strict/1759355374 -> viable/strict/1759355374 2025-12-04T09:33:42.0495940Z * [new tag] viable/strict/1759357472 -> viable/strict/1759357472 2025-12-04T09:33:42.0497042Z * [new tag] viable/strict/1759361002 -> viable/strict/1759361002 2025-12-04T09:33:42.0498397Z * [new tag] viable/strict/1759362585 -> viable/strict/1759362585 2025-12-04T09:33:42.0499712Z * [new tag] viable/strict/1759365359 -> viable/strict/1759365359 2025-12-04T09:33:42.0500790Z * [new tag] viable/strict/1759370089 -> viable/strict/1759370089 2025-12-04T09:33:42.0501868Z * [new tag] viable/strict/1759377554 -> viable/strict/1759377554 2025-12-04T09:33:42.0502941Z * [new tag] viable/strict/1759379133 -> viable/strict/1759379133 2025-12-04T09:33:42.0503999Z * [new tag] viable/strict/1759389871 -> viable/strict/1759389871 2025-12-04T09:33:42.0505061Z * [new tag] viable/strict/1759393562 -> viable/strict/1759393562 2025-12-04T09:33:42.0506175Z * [new tag] viable/strict/1759395076 -> viable/strict/1759395076 2025-12-04T09:33:42.0507304Z * [new tag] viable/strict/1759398579 -> viable/strict/1759398579 2025-12-04T09:33:42.0508343Z * [new tag] 
viable/strict/1759404142 -> viable/strict/1759404142 2025-12-04T09:33:42.0509377Z * [new tag] viable/strict/1759405773 -> viable/strict/1759405773 2025-12-04T09:33:42.0510403Z * [new tag] viable/strict/1759408041 -> viable/strict/1759408041 2025-12-04T09:33:42.0511439Z * [new tag] viable/strict/1759411593 -> viable/strict/1759411593 2025-12-04T09:33:42.0512482Z * [new tag] viable/strict/1759427395 -> viable/strict/1759427395 2025-12-04T09:33:42.0513577Z * [new tag] viable/strict/1759434582 -> viable/strict/1759434582 2025-12-04T09:33:42.0514641Z * [new tag] viable/strict/1759436720 -> viable/strict/1759436720 2025-12-04T09:33:42.0515761Z * [new tag] viable/strict/1759440219 -> viable/strict/1759440219 2025-12-04T09:33:42.0516736Z * [new tag] viable/strict/1759441948 -> viable/strict/1759441948 2025-12-04T09:33:42.0517729Z * [new tag] viable/strict/1759443860 -> viable/strict/1759443860 2025-12-04T09:33:42.0518842Z * [new tag] viable/strict/1759445377 -> viable/strict/1759445377 2025-12-04T09:33:42.0519982Z * [new tag] viable/strict/1759447415 -> viable/strict/1759447415 2025-12-04T09:33:42.0520999Z * [new tag] viable/strict/1759451750 -> viable/strict/1759451750 2025-12-04T09:33:42.0522049Z * [new tag] viable/strict/1759453910 -> viable/strict/1759453910 2025-12-04T09:33:42.0523144Z * [new tag] viable/strict/1759456483 -> viable/strict/1759456483 2025-12-04T09:33:42.0524221Z * [new tag] viable/strict/1759459279 -> viable/strict/1759459279 2025-12-04T09:33:42.0525284Z * [new tag] viable/strict/1759460742 -> viable/strict/1759460742 2025-12-04T09:33:42.0526316Z * [new tag] viable/strict/1759462025 -> viable/strict/1759462025 2025-12-04T09:33:42.0527428Z * [new tag] viable/strict/1759469086 -> viable/strict/1759469086 2025-12-04T09:33:42.0528472Z * [new tag] viable/strict/1759470581 -> viable/strict/1759470581 2025-12-04T09:33:42.0529504Z * [new tag] viable/strict/1759472786 -> viable/strict/1759472786 2025-12-04T09:33:42.0530610Z * [new tag] viable/strict/1759476294 
-> viable/strict/1759476294 2025-12-04T09:33:42.0531626Z * [new tag] viable/strict/1759479963 -> viable/strict/1759479963 2025-12-04T09:33:42.0532660Z * [new tag] viable/strict/1759492177 -> viable/strict/1759492177 2025-12-04T09:33:42.0533693Z * [new tag] viable/strict/1759519278 -> viable/strict/1759519278 2025-12-04T09:33:42.0534733Z * [new tag] viable/strict/1759524580 -> viable/strict/1759524580 2025-12-04T09:33:42.0535726Z * [new tag] viable/strict/1759528193 -> viable/strict/1759528193 2025-12-04T09:33:42.0537119Z * [new tag] viable/strict/1759533797 -> viable/strict/1759533797 2025-12-04T09:33:42.0538180Z * [new tag] viable/strict/1759542780 -> viable/strict/1759542780 2025-12-04T09:33:42.0539277Z * [new tag] viable/strict/1759549779 -> viable/strict/1759549779 2025-12-04T09:33:42.0540343Z * [new tag] viable/strict/1759555455 -> viable/strict/1759555455 2025-12-04T09:33:42.0541371Z * [new tag] viable/strict/1759559176 -> viable/strict/1759559176 2025-12-04T09:33:42.0542948Z * [new tag] viable/strict/1759560629 -> viable/strict/1759560629 2025-12-04T09:33:42.0543967Z * [new tag] viable/strict/1759569848 -> viable/strict/1759569848 2025-12-04T09:33:42.0545262Z * [new tag] viable/strict/1759571382 -> viable/strict/1759571382 2025-12-04T09:33:42.0546259Z * [new tag] viable/strict/1759573474 -> viable/strict/1759573474 2025-12-04T09:33:42.0547315Z * [new tag] viable/strict/1759618187 -> viable/strict/1759618187 2025-12-04T09:33:42.0548378Z * [new tag] viable/strict/1759626742 -> viable/strict/1759626742 2025-12-04T09:33:42.0549479Z * [new tag] viable/strict/1759632427 -> viable/strict/1759632427 2025-12-04T09:33:42.0550554Z * [new tag] viable/strict/1759634971 -> viable/strict/1759634971 2025-12-04T09:33:42.0551645Z * [new tag] viable/strict/1759661382 -> viable/strict/1759661382 2025-12-04T09:33:42.0552735Z * [new tag] viable/strict/1759663294 -> viable/strict/1759663294 2025-12-04T09:33:42.0553749Z * [new tag] viable/strict/1759708178 -> 
viable/strict/1759708178 2025-12-04T09:33:42.0554943Z * [new tag] viable/strict/1759715695 -> viable/strict/1759715695 2025-12-04T09:33:42.0555940Z * [new tag] viable/strict/1759728293 -> viable/strict/1759728293 2025-12-04T09:33:42.0556988Z * [new tag] viable/strict/1759735513 -> viable/strict/1759735513 2025-12-04T09:33:42.0558089Z * [new tag] viable/strict/1759739177 -> viable/strict/1759739177 2025-12-04T09:33:42.0559113Z * [new tag] viable/strict/1759758635 -> viable/strict/1759758635 2025-12-04T09:33:42.0560222Z * [new tag] viable/strict/1759765784 -> viable/strict/1759765784 2025-12-04T09:33:42.0561229Z * [new tag] viable/strict/1759767948 -> viable/strict/1759767948 2025-12-04T09:33:42.0562319Z * [new tag] viable/strict/1759771461 -> viable/strict/1759771461 2025-12-04T09:33:42.0563140Z * [new tag] viable/strict/1759776706 -> viable/strict/1759776706 2025-12-04T09:33:42.0564304Z * [new tag] viable/strict/1759782317 -> viable/strict/1759782317 2025-12-04T09:33:42.0565468Z * [new tag] viable/strict/1759783777 -> viable/strict/1759783777 2025-12-04T09:33:42.0566651Z * [new tag] viable/strict/1759785815 -> viable/strict/1759785815 2025-12-04T09:33:42.0567768Z * [new tag] viable/strict/1759789459 -> viable/strict/1759789459 2025-12-04T09:33:42.0568804Z * [new tag] viable/strict/1759790974 -> viable/strict/1759790974 2025-12-04T09:33:42.0569634Z * [new tag] viable/strict/1759794583 -> viable/strict/1759794583 2025-12-04T09:33:42.0570847Z * [new tag] viable/strict/1759797408 -> viable/strict/1759797408 2025-12-04T09:33:42.0572232Z * [new tag] viable/strict/1759799518 -> viable/strict/1759799518 2025-12-04T09:33:42.0573283Z * [new tag] viable/strict/1759804909 -> viable/strict/1759804909 2025-12-04T09:33:42.0574405Z * [new tag] viable/strict/1759807643 -> viable/strict/1759807643 2025-12-04T09:33:42.0575455Z * [new tag] viable/strict/1759809089 -> viable/strict/1759809089 2025-12-04T09:33:42.0576555Z * [new tag] viable/strict/1759811145 -> viable/strict/1759811145 
2025-12-04T09:33:42.0577693Z * [new tag] viable/strict/1759812581 -> viable/strict/1759812581 2025-12-04T09:33:42.0578744Z * [new tag] viable/strict/1759814683 -> viable/strict/1759814683 2025-12-04T09:33:42.0579848Z * [new tag] viable/strict/1759821889 -> viable/strict/1759821889 2025-12-04T09:33:42.0580971Z * [new tag] viable/strict/1759823376 -> viable/strict/1759823376 2025-12-04T09:33:42.0581994Z * [new tag] viable/strict/1759827107 -> viable/strict/1759827107 2025-12-04T09:33:42.0583042Z * [new tag] viable/strict/1759830577 -> viable/strict/1759830577 2025-12-04T09:33:42.0584199Z * [new tag] viable/strict/1759832720 -> viable/strict/1759832720 2025-12-04T09:33:42.0585218Z * [new tag] viable/strict/1759842063 -> viable/strict/1759842063 2025-12-04T09:33:42.0586279Z * [new tag] viable/strict/1759847121 -> viable/strict/1759847121 2025-12-04T09:33:42.0587631Z * [new tag] viable/strict/1759850721 -> viable/strict/1759850721 2025-12-04T09:33:42.0588770Z * [new tag] viable/strict/1759857870 -> viable/strict/1759857870 2025-12-04T09:33:42.0589847Z * [new tag] viable/strict/1759863143 -> viable/strict/1759863143 2025-12-04T09:33:42.0590880Z * [new tag] viable/strict/1759875874 -> viable/strict/1759875874 2025-12-04T09:33:42.0591674Z * [new tag] viable/strict/1759877385 -> viable/strict/1759877385 2025-12-04T09:33:42.0592854Z * [new tag] viable/strict/1759883801 -> viable/strict/1759883801 2025-12-04T09:33:42.0594096Z * [new tag] viable/strict/1759885922 -> viable/strict/1759885922 2025-12-04T09:33:42.0595060Z * [new tag] viable/strict/1759888488 -> viable/strict/1759888488 2025-12-04T09:33:42.0596063Z * [new tag] viable/strict/1759895471 -> viable/strict/1759895471 2025-12-04T09:33:42.0597073Z * [new tag] viable/strict/1759904803 -> viable/strict/1759904803 2025-12-04T09:33:42.0598340Z * [new tag] viable/strict/1759908300 -> viable/strict/1759908300 2025-12-04T09:33:42.0599490Z * [new tag] viable/strict/1759915520 -> viable/strict/1759915520 
2025-12-04T09:33:42.0600531Z * [new tag] viable/strict/1759916978 -> viable/strict/1759916978 2025-12-04T09:33:42.0601360Z * [new tag] viable/strict/1759930024 -> viable/strict/1759930024 2025-12-04T09:33:42.0602503Z * [new tag] viable/strict/1759948122 -> viable/strict/1759948122 2025-12-04T09:33:42.0603616Z * [new tag] viable/strict/1759952983 -> viable/strict/1759952983 2025-12-04T09:33:42.0604700Z * [new tag] viable/strict/1759955121 -> viable/strict/1759955121 2025-12-04T09:33:42.0605748Z * [new tag] viable/strict/1759962298 -> viable/strict/1759962298 2025-12-04T09:33:42.0606867Z * [new tag] viable/strict/1759965837 -> viable/strict/1759965837 2025-12-04T09:33:42.0607995Z * [new tag] viable/strict/1759970213 -> viable/strict/1759970213 2025-12-04T09:33:42.0609001Z * [new tag] viable/strict/1759974894 -> viable/strict/1759974894 2025-12-04T09:33:42.0610021Z * [new tag] viable/strict/1759977763 -> viable/strict/1759977763 2025-12-04T09:33:42.0611160Z * [new tag] viable/strict/1759979241 -> viable/strict/1759979241 2025-12-04T09:33:42.0612726Z * [new tag] viable/strict/1759985417 -> viable/strict/1759985417 2025-12-04T09:33:42.0613809Z * [new tag] viable/strict/1759987490 -> viable/strict/1759987490 2025-12-04T09:33:42.0614864Z * [new tag] viable/strict/1759996180 -> viable/strict/1759996180 2025-12-04T09:33:42.0615871Z * [new tag] viable/strict/1760065682 -> viable/strict/1760065682 2025-12-04T09:33:42.0617093Z * [new tag] viable/strict/1760066894 -> viable/strict/1760066894 2025-12-04T09:33:42.0618200Z * [new tag] viable/strict/1760070345 -> viable/strict/1760070345 2025-12-04T09:33:42.0619416Z * [new tag] viable/strict/1760089782 -> viable/strict/1760089782 2025-12-04T09:33:42.0620499Z * [new tag] viable/strict/1760091921 -> viable/strict/1760091921 2025-12-04T09:33:42.0621543Z * [new tag] viable/strict/1760127924 -> viable/strict/1760127924 2025-12-04T09:33:42.0622612Z * [new tag] viable/strict/1760129489 -> viable/strict/1760129489 
2025-12-04T09:33:42.0623839Z * [new tag] viable/strict/1760132980 -> viable/strict/1760132980 2025-12-04T09:33:42.0624982Z * [new tag] viable/strict/1760135060 -> viable/strict/1760135060 2025-12-04T09:33:42.0626023Z * [new tag] viable/strict/1760215782 -> viable/strict/1760215782 2025-12-04T09:33:42.0627104Z * [new tag] viable/strict/1760273849 -> viable/strict/1760273849 2025-12-04T09:33:42.0628139Z * [new tag] viable/strict/1760275517 -> viable/strict/1760275517 2025-12-04T09:33:42.0629216Z * [new tag] viable/strict/1760276979 -> viable/strict/1760276979 2025-12-04T09:33:42.0630306Z * [new tag] viable/strict/1760279007 -> viable/strict/1760279007 2025-12-04T09:33:42.0631174Z * [new tag] viable/strict/1760286328 -> viable/strict/1760286328 2025-12-04T09:33:42.0632058Z * [new tag] viable/strict/1760493304 -> viable/strict/1760493304 2025-12-04T09:33:42.0633223Z * [new tag] viable/strict/1760496298 -> viable/strict/1760496298 2025-12-04T09:33:42.0634225Z * [new tag] viable/strict/1760518396 -> viable/strict/1760518396 2025-12-04T09:33:42.0635273Z * [new tag] viable/strict/1760534864 -> viable/strict/1760534864 2025-12-04T09:33:42.0636302Z * [new tag] viable/strict/1760549062 -> viable/strict/1760549062 2025-12-04T09:33:42.0637491Z * [new tag] viable/strict/1760552799 -> viable/strict/1760552799 2025-12-04T09:33:42.0638555Z * [new tag] viable/strict/1760554355 -> viable/strict/1760554355 2025-12-04T09:33:42.0639662Z * [new tag] viable/strict/1760556275 -> viable/strict/1760556275 2025-12-04T09:33:42.0640746Z * [new tag] viable/strict/1760564979 -> viable/strict/1760564979 2025-12-04T09:33:42.0641932Z * [new tag] viable/strict/1760567049 -> viable/strict/1760567049 2025-12-04T09:33:42.0643421Z * [new tag] viable/strict/1760568585 -> viable/strict/1760568585 2025-12-04T09:33:42.0644424Z * [new tag] viable/strict/1760570630 -> viable/strict/1760570630 2025-12-04T09:33:42.0645496Z * [new tag] viable/strict/1760572180 -> viable/strict/1760572180 
2025-12-04T09:33:42.0646542Z * [new tag] viable/strict/1760575094 -> viable/strict/1760575094 2025-12-04T09:33:42.0647724Z * [new tag] viable/strict/1760579709 -> viable/strict/1760579709 2025-12-04T09:33:42.0649287Z * [new tag] viable/strict/1760582614 -> viable/strict/1760582614 2025-12-04T09:33:42.0650390Z * [new tag] viable/strict/1760586815 -> viable/strict/1760586815 2025-12-04T09:33:42.0651239Z * [new tag] viable/strict/1760588829 -> viable/strict/1760588829 2025-12-04T09:33:42.0652344Z * [new tag] viable/strict/1760590200 -> viable/strict/1760590200 2025-12-04T09:33:42.0653492Z * [new tag] viable/strict/1760592311 -> viable/strict/1760592311 2025-12-04T09:33:42.0654534Z * [new tag] viable/strict/1760619733 -> viable/strict/1760619733 2025-12-04T09:33:42.0655390Z * [new tag] viable/strict/1760628335 -> viable/strict/1760628335 2025-12-04T09:33:42.0656524Z * [new tag] viable/strict/1760635490 -> viable/strict/1760635490 2025-12-04T09:33:42.0657702Z * [new tag] viable/strict/1760640743 -> viable/strict/1760640743 2025-12-04T09:33:42.0658855Z * [new tag] viable/strict/1760642528 -> viable/strict/1760642528 2025-12-04T09:33:42.0659993Z * [new tag] viable/strict/1760646330 -> viable/strict/1760646330 2025-12-04T09:33:42.0660857Z * [new tag] viable/strict/1760666101 -> viable/strict/1760666101 2025-12-04T09:33:42.0662010Z * [new tag] viable/strict/1760668990 -> viable/strict/1760668990 2025-12-04T09:33:42.0663057Z * [new tag] viable/strict/1760670600 -> viable/strict/1760670600 2025-12-04T09:33:42.0664145Z * [new tag] viable/strict/1760671704 -> viable/strict/1760671704 2025-12-04T09:33:42.0665205Z * [new tag] viable/strict/1760673121 -> viable/strict/1760673121 2025-12-04T09:33:42.0666253Z * [new tag] viable/strict/1760675352 -> viable/strict/1760675352 2025-12-04T09:33:42.0667347Z * [new tag] viable/strict/1760696731 -> viable/strict/1760696731 2025-12-04T09:33:42.0670082Z * [new tag] viable/strict/1760723515 -> viable/strict/1760723515 
2025-12-04T09:33:42.0671292Z * [new tag] viable/strict/1760727234 -> viable/strict/1760727234 2025-12-04T09:33:42.0672472Z * [new tag] viable/strict/1760730578 -> viable/strict/1760730578 2025-12-04T09:33:42.0673570Z * [new tag] viable/strict/1760732726 -> viable/strict/1760732726 2025-12-04T09:33:42.0674814Z * [new tag] viable/strict/1760734180 -> viable/strict/1760734180 2025-12-04T09:33:42.0675653Z * [new tag] viable/strict/1760736251 -> viable/strict/1760736251 2025-12-04T09:33:42.0676807Z * [new tag] viable/strict/1760737772 -> viable/strict/1760737772 2025-12-04T09:33:42.0677930Z * [new tag] viable/strict/1760758005 -> viable/strict/1760758005 2025-12-04T09:33:42.0678948Z * [new tag] viable/strict/1760761532 -> viable/strict/1760761532 2025-12-04T09:33:42.0680043Z * [new tag] viable/strict/1760802581 -> viable/strict/1760802581 2025-12-04T09:33:42.0681103Z * [new tag] viable/strict/1760827772 -> viable/strict/1760827772 2025-12-04T09:33:42.0682143Z * [new tag] viable/strict/1760834524 -> viable/strict/1760834524 2025-12-04T09:33:42.0683275Z * [new tag] viable/strict/1760845009 -> viable/strict/1760845009 2025-12-04T09:33:42.0684852Z * [new tag] viable/strict/1760876836 -> viable/strict/1760876836 2025-12-04T09:33:42.0685978Z * [new tag] viable/strict/1760880329 -> viable/strict/1760880329 2025-12-04T09:33:42.0687045Z * [new tag] viable/strict/1760888987 -> viable/strict/1760888987 2025-12-04T09:33:42.0687974Z * [new tag] viable/strict/1760912664 -> viable/strict/1760912664 2025-12-04T09:33:42.0689104Z * [new tag] viable/strict/1760925321 -> viable/strict/1760925321 2025-12-04T09:33:42.0690147Z * [new tag] viable/strict/1760931488 -> viable/strict/1760931488 2025-12-04T09:33:42.0691261Z * [new tag] viable/strict/1760932693 -> viable/strict/1760932693 2025-12-04T09:33:42.0692313Z * [new tag] viable/strict/1761004184 -> viable/strict/1761004184 2025-12-04T09:33:42.0693374Z * [new tag] viable/strict/1761014748 -> viable/strict/1761014748 
2025-12-04T09:33:42.0694472Z * [new tag] viable/strict/1761017491 -> viable/strict/1761017491 2025-12-04T09:33:42.0695617Z * [new tag] viable/strict/1761018806 -> viable/strict/1761018806 2025-12-04T09:33:42.0696862Z * [new tag] viable/strict/1761020754 -> viable/strict/1761020754 2025-12-04T09:33:42.0697926Z * [new tag] viable/strict/1761024303 -> viable/strict/1761024303 2025-12-04T09:33:42.0698956Z * [new tag] viable/strict/1761029582 -> viable/strict/1761029582 2025-12-04T09:33:42.0699984Z * [new tag] viable/strict/1761031535 -> viable/strict/1761031535 2025-12-04T09:33:42.0701333Z * [new tag] viable/strict/1761035196 -> viable/strict/1761035196 2025-12-04T09:33:42.0702574Z * [new tag] viable/strict/1761045825 -> viable/strict/1761045825 2025-12-04T09:33:42.0703680Z * [new tag] viable/strict/1761054796 -> viable/strict/1761054796 2025-12-04T09:33:42.0704787Z * [new tag] viable/strict/1761060314 -> viable/strict/1761060314 2025-12-04T09:33:42.0705895Z * [new tag] viable/strict/1761071198 -> viable/strict/1761071198 2025-12-04T09:33:42.0707014Z * [new tag] viable/strict/1761074628 -> viable/strict/1761074628 2025-12-04T09:33:42.0708060Z * [new tag] viable/strict/1761078351 -> viable/strict/1761078351 2025-12-04T09:33:42.0709116Z * [new tag] viable/strict/1761079822 -> viable/strict/1761079822 2025-12-04T09:33:42.0710179Z * [new tag] viable/strict/1761081873 -> viable/strict/1761081873 2025-12-04T09:33:42.0711281Z * [new tag] viable/strict/1761083392 -> viable/strict/1761083392 2025-12-04T09:33:42.0712387Z * [new tag] viable/strict/1761085465 -> viable/strict/1761085465 2025-12-04T09:33:42.0713531Z * [new tag] viable/strict/1761089099 -> viable/strict/1761089099 2025-12-04T09:33:42.0714718Z * [new tag] viable/strict/1761095535 -> viable/strict/1761095535 2025-12-04T09:33:42.0715543Z * [new tag] viable/strict/1761098119 -> viable/strict/1761098119 2025-12-04T09:33:42.0717211Z * [new tag] viable/strict/1761101330 -> viable/strict/1761101330 
2025-12-04T09:33:42.0718315Z * [new tag] viable/strict/1761114425 -> viable/strict/1761114425 2025-12-04T09:33:42.0719369Z * [new tag] viable/strict/1761116036 -> viable/strict/1761116036 2025-12-04T09:33:42.0720419Z * [new tag] viable/strict/1761119379 -> viable/strict/1761119379 2025-12-04T09:33:42.0721458Z * [new tag] viable/strict/1761121601 -> viable/strict/1761121601 2025-12-04T09:33:42.0722472Z * [new tag] viable/strict/1761123234 -> viable/strict/1761123234 2025-12-04T09:33:42.0723482Z * [new tag] viable/strict/1761126621 -> viable/strict/1761126621 2025-12-04T09:33:42.0724555Z * [new tag] viable/strict/1761132259 -> viable/strict/1761132259 2025-12-04T09:33:42.0725776Z * [new tag] viable/strict/1761146746 -> viable/strict/1761146746 2025-12-04T09:33:42.0726848Z * [new tag] viable/strict/1761164752 -> viable/strict/1761164752 2025-12-04T09:33:42.0727784Z * [new tag] viable/strict/1761166198 -> viable/strict/1761166198 2025-12-04T09:33:42.0728945Z * [new tag] viable/strict/1761175424 -> viable/strict/1761175424 2025-12-04T09:33:42.0729996Z * [new tag] viable/strict/1761176983 -> viable/strict/1761176983 2025-12-04T09:33:42.0731412Z * [new tag] viable/strict/1761179891 -> viable/strict/1761179891 2025-12-04T09:33:42.0732456Z * [new tag] viable/strict/1761181930 -> viable/strict/1761181930 2025-12-04T09:33:42.0733510Z * [new tag] viable/strict/1761184516 -> viable/strict/1761184516 2025-12-04T09:33:42.0734597Z * [new tag] viable/strict/1761190179 -> viable/strict/1761190179 2025-12-04T09:33:42.0735700Z * [new tag] viable/strict/1761193558 -> viable/strict/1761193558 2025-12-04T09:33:42.0736789Z * [new tag] viable/strict/1761207990 -> viable/strict/1761207990 2025-12-04T09:33:42.0737907Z * [new tag] viable/strict/1761229539 -> viable/strict/1761229539 2025-12-04T09:33:42.0739223Z * [new tag] viable/strict/1761244031 -> viable/strict/1761244031 2025-12-04T09:33:42.0740300Z * [new tag] viable/strict/1761248986 -> viable/strict/1761248986 
2025-12-04T09:33:42.0741333Z * [new tag] viable/strict/1761259791 -> viable/strict/1761259791 2025-12-04T09:33:42.0742344Z * [new tag] viable/strict/1761266139 -> viable/strict/1761266139 2025-12-04T09:33:42.0743528Z * [new tag] viable/strict/1761268316 -> viable/strict/1761268316 2025-12-04T09:33:42.0744591Z * [new tag] viable/strict/1761273805 -> viable/strict/1761273805 2025-12-04T09:33:42.0745570Z * [new tag] viable/strict/1761275261 -> viable/strict/1761275261 2025-12-04T09:33:42.0746740Z * [new tag] viable/strict/1761277913 -> viable/strict/1761277913 2025-12-04T09:33:42.0747824Z * [new tag] viable/strict/1761290701 -> viable/strict/1761290701 2025-12-04T09:33:42.0749005Z * [new tag] viable/strict/1761294396 -> viable/strict/1761294396 2025-12-04T09:33:42.0750000Z * [new tag] viable/strict/1761303047 -> viable/strict/1761303047 2025-12-04T09:33:42.0751060Z * [new tag] viable/strict/1761335388 -> viable/strict/1761335388 2025-12-04T09:33:42.0752121Z * [new tag] viable/strict/1761337551 -> viable/strict/1761337551 2025-12-04T09:33:42.0753251Z * [new tag] viable/strict/1761339007 -> viable/strict/1761339007 2025-12-04T09:33:42.0754390Z * [new tag] viable/strict/1761341050 -> viable/strict/1761341050 2025-12-04T09:33:42.0755915Z * [new tag] viable/strict/1761346188 -> viable/strict/1761346188 2025-12-04T09:33:42.0757131Z * [new tag] viable/strict/1761349792 -> viable/strict/1761349792 2025-12-04T09:33:42.0758173Z * [new tag] viable/strict/1761352620 -> viable/strict/1761352620 2025-12-04T09:33:42.0759260Z * [new tag] viable/strict/1761354730 -> viable/strict/1761354730 2025-12-04T09:33:42.0760330Z * [new tag] viable/strict/1761357298 -> viable/strict/1761357298 2025-12-04T09:33:42.0761396Z * [new tag] viable/strict/1761360201 -> viable/strict/1761360201 2025-12-04T09:33:42.0762484Z * [new tag] viable/strict/1761361753 -> viable/strict/1761361753 2025-12-04T09:33:42.0763553Z * [new tag] viable/strict/1761364351 -> viable/strict/1761364351 
2025-12-04T09:33:42.0764611Z * [new tag] viable/strict/1761366338 -> viable/strict/1761366338 2025-12-04T09:33:42.0765832Z * [new tag] viable/strict/1761367802 -> viable/strict/1761367802 2025-12-04T09:33:42.0767038Z * [new tag] viable/strict/1761369889 -> viable/strict/1761369889 2025-12-04T09:33:42.0768173Z * [new tag] viable/strict/1761371385 -> viable/strict/1761371385 2025-12-04T09:33:42.0769247Z * [new tag] viable/strict/1761373581 -> viable/strict/1761373581 2025-12-04T09:33:42.0770486Z * [new tag] viable/strict/1761375054 -> viable/strict/1761375054 2025-12-04T09:33:42.0771818Z * [new tag] viable/strict/1761421785 -> viable/strict/1761421785 2025-12-04T09:33:42.0773022Z * [new tag] viable/strict/1761434614 -> viable/strict/1761434614 2025-12-04T09:33:42.0774522Z * [new tag] viable/strict/1761439254 -> viable/strict/1761439254 2025-12-04T09:33:42.0775577Z * [new tag] viable/strict/1761454187 -> viable/strict/1761454187 2025-12-04T09:33:42.0776786Z * [new tag] viable/strict/1761459991 -> viable/strict/1761459991 2025-12-04T09:33:42.0778122Z * [new tag] viable/strict/1761470668 -> viable/strict/1761470668 2025-12-04T09:33:42.0779721Z * [new tag] viable/strict/1761472188 -> viable/strict/1761472188 2025-12-04T09:33:42.0780797Z * [new tag] viable/strict/1761503178 -> viable/strict/1761503178 2025-12-04T09:33:42.0781905Z * [new tag] viable/strict/1761517492 -> viable/strict/1761517492 2025-12-04T09:33:42.0782992Z * [new tag] viable/strict/1761518981 -> viable/strict/1761518981 2025-12-04T09:33:42.0784115Z * [new tag] viable/strict/1761533609 -> viable/strict/1761533609 2025-12-04T09:33:42.0784984Z * [new tag] viable/strict/1761546438 -> viable/strict/1761546438 2025-12-04T09:33:42.0786255Z * [new tag] viable/strict/1761548133 -> viable/strict/1761548133 2025-12-04T09:33:42.0787620Z * [new tag] viable/strict/1761555186 -> viable/strict/1761555186 2025-12-04T09:33:42.0788772Z * [new tag] viable/strict/1761557178 -> viable/strict/1761557178 
2025-12-04T09:33:42.0789861Z * [new tag] viable/strict/1761560772 -> viable/strict/1761560772 2025-12-04T09:33:42.0790942Z * [new tag] viable/strict/1761562266 -> viable/strict/1761562266 2025-12-04T09:33:42.0792088Z * [new tag] viable/strict/1761564260 -> viable/strict/1761564260 2025-12-04T09:33:42.0793152Z * [new tag] viable/strict/1761568072 -> viable/strict/1761568072 2025-12-04T09:33:42.0794209Z * [new tag] viable/strict/1761571683 -> viable/strict/1761571683 2025-12-04T09:33:42.0795205Z * [new tag] viable/strict/1761580199 -> viable/strict/1761580199 2025-12-04T09:33:42.0796248Z * [new tag] viable/strict/1761587383 -> viable/strict/1761587383 2025-12-04T09:33:42.0797371Z * [new tag] viable/strict/1761591165 -> viable/strict/1761591165 2025-12-04T09:33:42.0798451Z * [new tag] viable/strict/1761594575 -> viable/strict/1761594575 2025-12-04T09:33:42.0799494Z * [new tag] viable/strict/1761596710 -> viable/strict/1761596710 2025-12-04T09:33:42.0800700Z * [new tag] viable/strict/1761598189 -> viable/strict/1761598189 2025-12-04T09:33:42.0801783Z * [new tag] viable/strict/1761600254 -> viable/strict/1761600254 2025-12-04T09:33:42.0802899Z * [new tag] viable/strict/1761603879 -> viable/strict/1761603879 2025-12-04T09:33:42.0804044Z * [new tag] viable/strict/1761605429 -> viable/strict/1761605429 2025-12-04T09:33:42.0805269Z * [new tag] viable/strict/1761607468 -> viable/strict/1761607468 2025-12-04T09:33:42.0806358Z * [new tag] viable/strict/1761608983 -> viable/strict/1761608983 2025-12-04T09:33:42.0807442Z * [new tag] viable/strict/1761611846 -> viable/strict/1761611846 2025-12-04T09:33:42.0808592Z * [new tag] viable/strict/1761613922 -> viable/strict/1761613922 2025-12-04T09:33:42.0809443Z * [new tag] viable/strict/1761616504 -> viable/strict/1761616504 2025-12-04T09:33:42.0810452Z * [new tag] viable/strict/1761619599 -> viable/strict/1761619599 2025-12-04T09:33:42.0811518Z * [new tag] viable/strict/1761686693 -> viable/strict/1761686693 
2025-12-04T09:33:42.0812572Z * [new tag] viable/strict/1761688179 -> viable/strict/1761688179 2025-12-04T09:33:42.0813656Z * [new tag] viable/strict/1761691973 -> viable/strict/1761691973 2025-12-04T09:33:42.0814903Z * [new tag] viable/strict/1761693884 -> viable/strict/1761693884 2025-12-04T09:33:42.0816013Z * [new tag] viable/strict/1761695389 -> viable/strict/1761695389 2025-12-04T09:33:42.0817277Z * [new tag] viable/strict/1761698408 -> viable/strict/1761698408 2025-12-04T09:33:42.0818334Z * [new tag] viable/strict/1761702931 -> viable/strict/1761702931 2025-12-04T09:33:42.0819457Z * [new tag] viable/strict/1761706307 -> viable/strict/1761706307 2025-12-04T09:33:42.0820554Z * [new tag] viable/strict/1761709065 -> viable/strict/1761709065 2025-12-04T09:33:42.0821779Z * [new tag] viable/strict/1761710285 -> viable/strict/1761710285 2025-12-04T09:33:42.0822889Z * [new tag] viable/strict/1761711983 -> viable/strict/1761711983 2025-12-04T09:33:42.0824036Z * [new tag] viable/strict/1761713514 -> viable/strict/1761713514 2025-12-04T09:33:42.0825274Z * [new tag] viable/strict/1761715523 -> viable/strict/1761715523 2025-12-04T09:33:42.0826411Z * [new tag] viable/strict/1761727973 -> viable/strict/1761727973 2025-12-04T09:33:42.0827569Z * [new tag] viable/strict/1761751558 -> viable/strict/1761751558 2025-12-04T09:33:42.0829143Z * [new tag] viable/strict/1761755187 -> viable/strict/1761755187 2025-12-04T09:33:42.0830313Z * [new tag] viable/strict/1761756826 -> viable/strict/1761756826 2025-12-04T09:33:42.0831488Z * [new tag] viable/strict/1761769551 -> viable/strict/1761769551 2025-12-04T09:33:42.0832707Z * [new tag] viable/strict/1761771032 -> viable/strict/1761771032 2025-12-04T09:33:42.0833589Z * [new tag] viable/strict/1761773101 -> viable/strict/1761773101 2025-12-04T09:33:42.0834784Z * [new tag] viable/strict/1761781792 -> viable/strict/1761781792 2025-12-04T09:33:42.0836015Z * [new tag] viable/strict/1761784788 -> viable/strict/1761784788 
2025-12-04T09:33:42.0837178Z * [new tag] viable/strict/1761786740 -> viable/strict/1761786740 2025-12-04T09:33:42.0838329Z * [new tag] viable/strict/1761789332 -> viable/strict/1761789332 2025-12-04T09:33:42.0839966Z * [new tag] viable/strict/1761792569 -> viable/strict/1761792569 2025-12-04T09:33:42.0841218Z * [new tag] viable/strict/1761795289 -> viable/strict/1761795289 2025-12-04T09:33:42.0842366Z * [new tag] viable/strict/1761798345 -> viable/strict/1761798345 2025-12-04T09:33:42.0843443Z * [new tag] viable/strict/1761799827 -> viable/strict/1761799827 2025-12-04T09:33:42.0844610Z * [new tag] viable/strict/1761805604 -> viable/strict/1761805604 2025-12-04T09:33:42.0845710Z * [new tag] viable/strict/1761807202 -> viable/strict/1761807202 2025-12-04T09:33:42.0846863Z * [new tag] viable/strict/1761809094 -> viable/strict/1761809094 2025-12-04T09:33:42.0847980Z * [new tag] viable/strict/1761810576 -> viable/strict/1761810576 2025-12-04T09:33:42.0849139Z * [new tag] viable/strict/1761812771 -> viable/strict/1761812771 2025-12-04T09:33:42.0850498Z * [new tag] viable/strict/1761814363 -> viable/strict/1761814363 2025-12-04T09:33:42.0851640Z * [new tag] viable/strict/1761857410 -> viable/strict/1761857410 2025-12-04T09:33:42.0852825Z * [new tag] viable/strict/1761860985 -> viable/strict/1761860985 2025-12-04T09:33:42.0853913Z * [new tag] viable/strict/1761863094 -> viable/strict/1761863094 2025-12-04T09:33:42.0854997Z * [new tag] viable/strict/1761864590 -> viable/strict/1761864590 2025-12-04T09:33:42.0856094Z * [new tag] viable/strict/1761866675 -> viable/strict/1761866675 2025-12-04T09:33:42.0857620Z * [new tag] viable/strict/1761868178 -> viable/strict/1761868178 2025-12-04T09:33:42.0858723Z * [new tag] viable/strict/1761871111 -> viable/strict/1761871111 2025-12-04T09:33:42.0859854Z * [new tag] viable/strict/1761873126 -> viable/strict/1761873126 2025-12-04T09:33:42.0861051Z * [new tag] viable/strict/1761875714 -> viable/strict/1761875714 
2025-12-04T09:33:42.0862203Z * [new tag] viable/strict/1761878924 -> viable/strict/1761878924 2025-12-04T09:33:42.0863351Z * [new tag] viable/strict/1761881727 -> viable/strict/1761881727 2025-12-04T09:33:42.0864437Z * [new tag] viable/strict/1761882959 -> viable/strict/1761882959 2025-12-04T09:33:42.0865589Z * [new tag] viable/strict/1761886268 -> viable/strict/1761886268 2025-12-04T09:33:42.0866721Z * [new tag] viable/strict/1761893641 -> viable/strict/1761893641 2025-12-04T09:33:42.0867820Z * [new tag] viable/strict/1761931517 -> viable/strict/1761931517 2025-12-04T09:33:42.0868983Z * [new tag] viable/strict/1761933080 -> viable/strict/1761933080 2025-12-04T09:33:42.0870076Z * [new tag] viable/strict/1761935217 -> viable/strict/1761935217 2025-12-04T09:33:42.0871377Z * [new tag] viable/strict/1761938533 -> viable/strict/1761938533 2025-12-04T09:33:42.0872548Z * [new tag] viable/strict/1761940184 -> viable/strict/1761940184 2025-12-04T09:33:42.0873660Z * [new tag] viable/strict/1761942338 -> viable/strict/1761942338 2025-12-04T09:33:42.0874783Z * [new tag] viable/strict/1761946100 -> viable/strict/1761946100 2025-12-04T09:33:42.0875909Z * [new tag] viable/strict/1761947374 -> viable/strict/1761947374 2025-12-04T09:33:42.0878891Z * [new tag] viable/strict/1761950978 -> viable/strict/1761950978 2025-12-04T09:33:42.0879284Z * [new tag] viable/strict/1761957727 -> viable/strict/1761957727 2025-12-04T09:33:42.0879978Z * [new tag] viable/strict/1761959532 -> viable/strict/1761959532 2025-12-04T09:33:42.0880225Z * [new tag] viable/strict/1761965366 -> viable/strict/1761965366 2025-12-04T09:33:42.0881630Z * [new tag] viable/strict/1761968066 -> viable/strict/1761968066 2025-12-04T09:33:42.0882717Z * [new tag] viable/strict/1761969322 -> viable/strict/1761969322 2025-12-04T09:33:42.0883800Z * [new tag] viable/strict/1761974723 -> viable/strict/1761974723 2025-12-04T09:33:42.0884909Z * [new tag] viable/strict/1761981837 -> viable/strict/1761981837 
2025-12-04T09:33:42.0886141Z * [new tag] viable/strict/1761985546 -> viable/strict/1761985546 2025-12-04T09:33:42.0887238Z * [new tag] viable/strict/1761987030 -> viable/strict/1761987030 2025-12-04T09:33:42.0888406Z * [new tag] viable/strict/1762003554 -> viable/strict/1762003554 2025-12-04T09:33:42.0889508Z * [new tag] viable/strict/1762021560 -> viable/strict/1762021560 2025-12-04T09:33:42.0890620Z * [new tag] viable/strict/1762032190 -> viable/strict/1762032190 2025-12-04T09:33:42.0891760Z * [new tag] viable/strict/1762040981 -> viable/strict/1762040981 2025-12-04T09:33:42.0892900Z * [new tag] viable/strict/1762048525 -> viable/strict/1762048525 2025-12-04T09:33:42.0894098Z * [new tag] viable/strict/1762104223 -> viable/strict/1762104223 2025-12-04T09:33:42.0895140Z * [new tag] viable/strict/1762105778 -> viable/strict/1762105778 2025-12-04T09:33:42.0896354Z * [new tag] viable/strict/1762115109 -> viable/strict/1762115109 2025-12-04T09:33:42.0897535Z * [new tag] viable/strict/1762125840 -> viable/strict/1762125840 2025-12-04T09:33:42.0898416Z * [new tag] viable/strict/1762127377 -> viable/strict/1762127377 2025-12-04T09:33:42.0900007Z * [new tag] viable/strict/1762134925 -> viable/strict/1762134925 2025-12-04T09:33:42.0901024Z * [new tag] viable/strict/1762138338 -> viable/strict/1762138338 2025-12-04T09:33:42.0902170Z * [new tag] viable/strict/1762148993 -> viable/strict/1762148993 2025-12-04T09:33:42.0903785Z * [new tag] viable/strict/1762152871 -> viable/strict/1762152871 2025-12-04T09:33:42.0904933Z * [new tag] viable/strict/1762156183 -> viable/strict/1762156183 2025-12-04T09:33:42.0906023Z * [new tag] viable/strict/1762163457 -> viable/strict/1762163457 2025-12-04T09:33:42.0907165Z * [new tag] viable/strict/1762165569 -> viable/strict/1762165569 2025-12-04T09:33:42.0908428Z * [new tag] viable/strict/1762169035 -> viable/strict/1762169035 2025-12-04T09:33:42.0909602Z * [new tag] viable/strict/1762174936 -> viable/strict/1762174936 
2025-12-04T09:33:42.0910739Z * [new tag] viable/strict/1762194412 -> viable/strict/1762194412 2025-12-04T09:33:42.0911805Z * [new tag] viable/strict/1762195876 -> viable/strict/1762195876 2025-12-04T09:33:42.0912930Z * [new tag] viable/strict/1762197788 -> viable/strict/1762197788 2025-12-04T09:33:42.0914099Z * [new tag] viable/strict/1762199389 -> viable/strict/1762199389 2025-12-04T09:33:42.0915619Z * [new tag] viable/strict/1762206585 -> viable/strict/1762206585 2025-12-04T09:33:42.0916869Z * [new tag] viable/strict/1762210184 -> viable/strict/1762210184 2025-12-04T09:33:42.0917884Z * [new tag] viable/strict/1762218736 -> viable/strict/1762218736 2025-12-04T09:33:42.0919039Z * [new tag] viable/strict/1762224529 -> viable/strict/1762224529 2025-12-04T09:33:42.0920289Z * [new tag] viable/strict/1762227253 -> viable/strict/1762227253 2025-12-04T09:33:42.0921125Z * [new tag] viable/strict/1762228515 -> viable/strict/1762228515 2025-12-04T09:33:42.0922447Z * [new tag] viable/strict/1762230349 -> viable/strict/1762230349 2025-12-04T09:33:42.0923550Z * [new tag] viable/strict/1762231859 -> viable/strict/1762231859 2025-12-04T09:33:42.0924707Z * [new tag] viable/strict/1762233925 -> viable/strict/1762233925 2025-12-04T09:33:42.0925972Z * [new tag] viable/strict/1762237630 -> viable/strict/1762237630 2025-12-04T09:33:42.0926946Z * [new tag] viable/strict/1762253522 -> viable/strict/1762253522 2025-12-04T09:33:42.0928304Z * [new tag] viable/strict/1762278588 -> viable/strict/1762278588 2025-12-04T09:33:42.0929446Z * [new tag] viable/strict/1762284203 -> viable/strict/1762284203 2025-12-04T09:33:42.0930602Z * [new tag] viable/strict/1762289446 -> viable/strict/1762289446 2025-12-04T09:33:42.0931719Z * [new tag] viable/strict/1762291515 -> viable/strict/1762291515 2025-12-04T09:33:42.0932934Z * [new tag] viable/strict/1762295100 -> viable/strict/1762295100 2025-12-04T09:33:42.0933815Z * [new tag] viable/strict/1762296590 -> viable/strict/1762296590 
2025-12-04T09:33:42.0934871Z * [new tag] viable/strict/1762300179 -> viable/strict/1762300179 2025-12-04T09:33:42.0936110Z * [new tag] viable/strict/1762303207 -> viable/strict/1762303207 2025-12-04T09:33:42.0937053Z * [new tag] viable/strict/1762386584 -> viable/strict/1762386584 2025-12-04T09:33:42.0938276Z * [new tag] viable/strict/1762391537 -> viable/strict/1762391537 2025-12-04T09:33:42.0939157Z * [new tag] viable/strict/1762394119 -> viable/strict/1762394119 2025-12-04T09:33:42.0940783Z * [new tag] viable/strict/1762397437 -> viable/strict/1762397437 2025-12-04T09:33:42.0941895Z * [new tag] viable/strict/1762400256 -> viable/strict/1762400256 2025-12-04T09:33:42.0942987Z * [new tag] viable/strict/1762401469 -> viable/strict/1762401469 2025-12-04T09:33:42.0944204Z * [new tag] viable/strict/1762408195 -> viable/strict/1762408195 2025-12-04T09:33:42.0945414Z * [new tag] viable/strict/1762410411 -> viable/strict/1762410411 2025-12-04T09:33:42.0946503Z * [new tag] viable/strict/1762417613 -> viable/strict/1762417613 2025-12-04T09:33:42.0947638Z * [new tag] viable/strict/1762419198 -> viable/strict/1762419198 2025-12-04T09:33:42.0948773Z * [new tag] viable/strict/1762422656 -> viable/strict/1762422656 2025-12-04T09:33:42.0950364Z * [new tag] viable/strict/1762424746 -> viable/strict/1762424746 2025-12-04T09:33:42.0951576Z * [new tag] viable/strict/1762446386 -> viable/strict/1762446386 2025-12-04T09:33:42.0952853Z * [new tag] viable/strict/1762449912 -> viable/strict/1762449912 2025-12-04T09:33:42.0953987Z * [new tag] viable/strict/1762457031 -> viable/strict/1762457031 2025-12-04T09:33:42.0955142Z * [new tag] viable/strict/1762462441 -> viable/strict/1762462441 2025-12-04T09:33:42.0956258Z * [new tag] viable/strict/1762467909 -> viable/strict/1762467909 2025-12-04T09:33:42.0957420Z * [new tag] viable/strict/1762471493 -> viable/strict/1762471493 2025-12-04T09:33:42.0958613Z * [new tag] viable/strict/1762475990 -> viable/strict/1762475990 
2025-12-04T09:33:42.0959851Z * [new tag] viable/strict/1762477933 -> viable/strict/1762477933 2025-12-04T09:33:42.0960966Z * [new tag] viable/strict/1762491053 -> viable/strict/1762491053 2025-12-04T09:33:42.0962238Z * [new tag] viable/strict/1762493118 -> viable/strict/1762493118 2025-12-04T09:33:42.0963116Z * [new tag] viable/strict/1762498442 -> viable/strict/1762498442 2025-12-04T09:33:42.0964428Z * [new tag] viable/strict/1762501778 -> viable/strict/1762501778 2025-12-04T09:33:42.0965539Z * [new tag] viable/strict/1762504001 -> viable/strict/1762504001 2025-12-04T09:33:42.0966779Z * [new tag] viable/strict/1762505583 -> viable/strict/1762505583 2025-12-04T09:33:42.0968000Z * [new tag] viable/strict/1762507523 -> viable/strict/1762507523 2025-12-04T09:33:42.0969241Z * [new tag] viable/strict/1762511140 -> viable/strict/1762511140 2025-12-04T09:33:42.0970593Z * [new tag] viable/strict/1762512632 -> viable/strict/1762512632 2025-12-04T09:33:42.0972035Z * [new tag] viable/strict/1762520467 -> viable/strict/1762520467 2025-12-04T09:33:42.0973200Z * [new tag] viable/strict/1762522016 -> viable/strict/1762522016 2025-12-04T09:33:42.0974315Z * [new tag] viable/strict/1762530591 -> viable/strict/1762530591 2025-12-04T09:33:42.0975531Z * [new tag] viable/strict/1762543405 -> viable/strict/1762543405 2025-12-04T09:33:42.0976477Z * [new tag] viable/strict/1762544998 -> viable/strict/1762544998 2025-12-04T09:33:42.0977736Z * [new tag] viable/strict/1762552182 -> viable/strict/1762552182 2025-12-04T09:33:42.0979333Z * [new tag] viable/strict/1762554297 -> viable/strict/1762554297 2025-12-04T09:33:42.0980203Z * [new tag] viable/strict/1762559381 -> viable/strict/1762559381 2025-12-04T09:33:42.0981581Z * [new tag] viable/strict/1762562222 -> viable/strict/1762562222 2025-12-04T09:33:42.0982793Z * [new tag] viable/strict/1762564319 -> viable/strict/1762564319 2025-12-04T09:33:42.0983665Z * [new tag] viable/strict/1762566904 -> viable/strict/1762566904 
2025-12-04T09:33:42.0984898Z * [new tag] viable/strict/1762569781 -> viable/strict/1762569781 2025-12-04T09:33:42.0985893Z * [new tag] viable/strict/1762575940 -> viable/strict/1762575940 2025-12-04T09:33:42.0987177Z * [new tag] viable/strict/1762580974 -> viable/strict/1762580974 2025-12-04T09:33:42.0988304Z * [new tag] viable/strict/1762583185 -> viable/strict/1762583185 2025-12-04T09:33:42.0989429Z * [new tag] viable/strict/1762586647 -> viable/strict/1762586647 2025-12-04T09:33:42.0990670Z * [new tag] viable/strict/1762588183 -> viable/strict/1762588183 2025-12-04T09:33:42.0991837Z * [new tag] viable/strict/1762593886 -> viable/strict/1762593886 2025-12-04T09:33:42.0993003Z * [new tag] viable/strict/1762650743 -> viable/strict/1762650743 2025-12-04T09:33:42.0994201Z * [new tag] viable/strict/1762653328 -> viable/strict/1762653328 2025-12-04T09:33:42.0995861Z * [new tag] viable/strict/1762659342 -> viable/strict/1762659342 2025-12-04T09:33:42.0996917Z * [new tag] viable/strict/1762662360 -> viable/strict/1762662360 2025-12-04T09:33:42.0997779Z * [new tag] viable/strict/1762667377 -> viable/strict/1762667377 2025-12-04T09:33:42.0998923Z * [new tag] viable/strict/1762671090 -> viable/strict/1762671090 2025-12-04T09:33:42.1000049Z * [new tag] viable/strict/1762680284 -> viable/strict/1762680284 2025-12-04T09:33:42.1001188Z * [new tag] viable/strict/1762683900 -> viable/strict/1762683900 2025-12-04T09:33:42.1002331Z * [new tag] viable/strict/1762705541 -> viable/strict/1762705541 2025-12-04T09:33:42.1003443Z * [new tag] viable/strict/1762709004 -> viable/strict/1762709004 2025-12-04T09:33:42.1004761Z * [new tag] viable/strict/1762746004 -> viable/strict/1762746004 2025-12-04T09:33:42.1005928Z * [new tag] viable/strict/1762748799 -> viable/strict/1762748799 2025-12-04T09:33:42.1007046Z * [new tag] viable/strict/1762759504 -> viable/strict/1762759504 2025-12-04T09:33:42.1008442Z * [new tag] viable/strict/1762760973 -> viable/strict/1762760973 
2025-12-04T09:33:42.1009427Z * [new tag] viable/strict/1762775374 -> viable/strict/1762775374 2025-12-04T09:33:42.1010629Z * [new tag] viable/strict/1762777661 -> viable/strict/1762777661 2025-12-04T09:33:42.1011740Z * [new tag] viable/strict/1762779774 -> viable/strict/1762779774 2025-12-04T09:33:42.1013117Z * [new tag] viable/strict/1762781259 -> viable/strict/1762781259 2025-12-04T09:33:42.1014201Z * [new tag] viable/strict/1762793628 -> viable/strict/1762793628 2025-12-04T09:33:42.1015425Z * [new tag] viable/strict/1762800711 -> viable/strict/1762800711 2025-12-04T09:33:42.1016607Z * [new tag] viable/strict/1762809894 -> viable/strict/1762809894 2025-12-04T09:33:42.1017794Z * [new tag] viable/strict/1762811384 -> viable/strict/1762811384 2025-12-04T09:33:42.1018999Z * [new tag] viable/strict/1762813841 -> viable/strict/1762813841 2025-12-04T09:33:42.1020203Z * [new tag] viable/strict/1762815047 -> viable/strict/1762815047 2025-12-04T09:33:42.1022202Z * [new tag] viable/strict/1762817094 -> viable/strict/1762817094 2025-12-04T09:33:42.1023284Z * [new tag] viable/strict/1762818582 -> viable/strict/1762818582 2025-12-04T09:33:42.1024255Z * [new tag] viable/strict/1762821623 -> viable/strict/1762821623 2025-12-04T09:33:42.1025214Z * [new tag] viable/strict/1762823531 -> viable/strict/1762823531 2025-12-04T09:33:42.1026318Z * [new tag] viable/strict/1762849583 -> viable/strict/1762849583 2025-12-04T09:33:42.1027552Z * [new tag] viable/strict/1762851200 -> viable/strict/1762851200 2025-12-04T09:33:42.1028715Z * [new tag] viable/strict/1762854603 -> viable/strict/1762854603 2025-12-04T09:33:42.1029893Z * [new tag] viable/strict/1762858276 -> viable/strict/1762858276 2025-12-04T09:33:42.1031125Z * [new tag] viable/strict/1762860891 -> viable/strict/1762860891 2025-12-04T09:33:42.1032958Z * [new tag] viable/strict/1762866174 -> viable/strict/1762866174 2025-12-04T09:33:42.1034018Z * [new tag] viable/strict/1762867653 -> viable/strict/1762867653 
2025-12-04T09:33:42.1035196Z * [new tag] viable/strict/1762872669 -> viable/strict/1762872669 2025-12-04T09:33:42.1036185Z * [new tag] viable/strict/1762878380 -> viable/strict/1762878380 2025-12-04T09:33:42.1037270Z * [new tag] viable/strict/1762889003 -> viable/strict/1762889003 2025-12-04T09:33:42.1038515Z * [new tag] viable/strict/1762890589 -> viable/strict/1762890589 2025-12-04T09:33:42.1039636Z * [new tag] viable/strict/1762892743 -> viable/strict/1762892743 2025-12-04T09:33:42.1040759Z * [new tag] viable/strict/1762894271 -> viable/strict/1762894271 2025-12-04T09:33:42.1041817Z * [new tag] viable/strict/1762896287 -> viable/strict/1762896287 2025-12-04T09:33:42.1042810Z * [new tag] viable/strict/1762915871 -> viable/strict/1762915871 2025-12-04T09:33:42.1043881Z * [new tag] viable/strict/1762918569 -> viable/strict/1762918569 2025-12-04T09:33:42.1044821Z * [new tag] viable/strict/1762919776 -> viable/strict/1762919776 2025-12-04T09:33:42.1045960Z * [new tag] viable/strict/1762923072 -> viable/strict/1762923072 2025-12-04T09:33:42.1047272Z * [new tag] viable/strict/1762928826 -> viable/strict/1762928826 2025-12-04T09:33:42.1048456Z * [new tag] viable/strict/1762930451 -> viable/strict/1762930451 2025-12-04T09:33:42.1049535Z * [new tag] viable/strict/1762933780 -> viable/strict/1762933780 2025-12-04T09:33:42.1051333Z * [new tag] viable/strict/1762937638 -> viable/strict/1762937638 2025-12-04T09:33:42.1052735Z * [new tag] viable/strict/1762939545 -> viable/strict/1762939545 2025-12-04T09:33:42.1053788Z * [new tag] viable/strict/1762962692 -> viable/strict/1762962692 2025-12-04T09:33:42.1055567Z * [new tag] viable/strict/1762979143 -> viable/strict/1762979143 2025-12-04T09:33:42.1056695Z * [new tag] viable/strict/1762984188 -> viable/strict/1762984188 2025-12-04T09:33:42.1057746Z * [new tag] viable/strict/1762986306 -> viable/strict/1762986306 2025-12-04T09:33:42.1058908Z * [new tag] viable/strict/1762989903 -> viable/strict/1762989903 
2025-12-04T09:33:42.1060047Z * [new tag] viable/strict/1762991377 -> viable/strict/1762991377 2025-12-04T09:33:42.1061177Z * [new tag] viable/strict/1762998921 -> viable/strict/1762998921 2025-12-04T09:33:42.1062471Z * [new tag] viable/strict/1763002287 -> viable/strict/1763002287 2025-12-04T09:33:42.1063678Z * [new tag] viable/strict/1763016840 -> viable/strict/1763016840 2025-12-04T09:33:42.1064834Z * [new tag] viable/strict/1763020180 -> viable/strict/1763020180 2025-12-04T09:33:42.1066041Z * [new tag] viable/strict/1763027421 -> viable/strict/1763027421 2025-12-04T09:33:42.1067403Z * [new tag] viable/strict/1763031120 -> viable/strict/1763031120 2025-12-04T09:33:42.1068472Z * [new tag] viable/strict/1763036861 -> viable/strict/1763036861 2025-12-04T09:33:42.1069723Z * [new tag] viable/strict/1763038993 -> viable/strict/1763038993 2025-12-04T09:33:42.1071185Z * [new tag] viable/strict/1763054703 -> viable/strict/1763054703 2025-12-04T09:33:42.1074276Z * [new tag] viable/strict/1763067061 -> viable/strict/1763067061 2025-12-04T09:33:42.1075466Z * [new tag] viable/strict/1763070847 -> viable/strict/1763070847 2025-12-04T09:33:42.1076642Z * [new tag] viable/strict/1763072706 -> viable/strict/1763072706 2025-12-04T09:33:42.1077865Z * [new tag] viable/strict/1763076302 -> viable/strict/1763076302 2025-12-04T09:33:42.1079051Z * [new tag] viable/strict/1763080816 -> viable/strict/1763080816 2025-12-04T09:33:42.1080153Z * [new tag] viable/strict/1763082732 -> viable/strict/1763082732 2025-12-04T09:33:42.1081279Z * [new tag] viable/strict/1763085329 -> viable/strict/1763085329 2025-12-04T09:33:42.1082429Z * [new tag] viable/strict/1763088623 -> viable/strict/1763088623 2025-12-04T09:33:42.1083782Z * [new tag] viable/strict/1763091402 -> viable/strict/1763091402 2025-12-04T09:33:42.1084850Z * [new tag] viable/strict/1763092602 -> viable/strict/1763092602 2025-12-04T09:33:42.1085990Z * [new tag] viable/strict/1763094355 -> viable/strict/1763094355 
2025-12-04T09:33:42.1087152Z * [new tag] viable/strict/1763099390 -> viable/strict/1763099390 2025-12-04T09:33:42.1088294Z * [new tag] viable/strict/1763101608 -> viable/strict/1763101608 2025-12-04T09:33:42.1089513Z * [new tag] viable/strict/1763105102 -> viable/strict/1763105102 2025-12-04T09:33:42.1090729Z * [new tag] viable/strict/1763112347 -> viable/strict/1763112347 2025-12-04T09:33:42.1091877Z * [new tag] viable/strict/1763119471 -> viable/strict/1763119471 2025-12-04T09:33:42.1093006Z * [new tag] viable/strict/1763126835 -> viable/strict/1763126835 2025-12-04T09:33:42.1093833Z * [new tag] viable/strict/1763149779 -> viable/strict/1763149779 2025-12-04T09:33:42.1094980Z * [new tag] viable/strict/1763164178 -> viable/strict/1763164178 2025-12-04T09:33:42.1096146Z * [new tag] viable/strict/1763167104 -> viable/strict/1763167104 2025-12-04T09:33:42.1097390Z * [new tag] viable/strict/1763169132 -> viable/strict/1763169132 2025-12-04T09:33:42.1098495Z * [new tag] viable/strict/1763171708 -> viable/strict/1763171708 2025-12-04T09:33:42.1099620Z * [new tag] viable/strict/1763174759 -> viable/strict/1763174759 2025-12-04T09:33:42.1100806Z * [new tag] viable/strict/1763180744 -> viable/strict/1763180744 2025-12-04T09:33:42.1101977Z * [new tag] viable/strict/1763182227 -> viable/strict/1763182227 2025-12-04T09:33:42.1103118Z * [new tag] viable/strict/1763184309 -> viable/strict/1763184309 2025-12-04T09:33:42.1104936Z * [new tag] viable/strict/1763187991 -> viable/strict/1763187991 2025-12-04T09:33:42.1105940Z * [new tag] viable/strict/1763191445 -> viable/strict/1763191445 2025-12-04T09:33:42.1107649Z * [new tag] viable/strict/1763195152 -> viable/strict/1763195152 2025-12-04T09:33:42.1108513Z * [new tag] viable/strict/1763205769 -> viable/strict/1763205769 2025-12-04T09:33:42.1109689Z * [new tag] viable/strict/1763246990 -> viable/strict/1763246990 2025-12-04T09:33:42.1110882Z * [new tag] viable/strict/1763261578 -> viable/strict/1763261578 
2025-12-04T09:33:42.1111959Z * [new tag] viable/strict/1763286573 -> viable/strict/1763286573 2025-12-04T09:33:42.1112926Z * [new tag] viable/strict/1763292167 -> viable/strict/1763292167 2025-12-04T09:33:42.1114119Z * [new tag] viable/strict/1763333386 -> viable/strict/1763333386 2025-12-04T09:33:42.1115229Z * [new tag] viable/strict/1763340082 -> viable/strict/1763340082 2025-12-04T09:33:42.1117352Z * [new tag] viable/strict/1763364324 -> viable/strict/1763364324 2025-12-04T09:33:42.1118343Z * [new tag] viable/strict/1763371569 -> viable/strict/1763371569 2025-12-04T09:33:42.1119481Z * [new tag] viable/strict/1763373067 -> viable/strict/1763373067 2025-12-04T09:33:42.1120601Z * [new tag] viable/strict/1763375157 -> viable/strict/1763375157 2025-12-04T09:33:42.1121771Z * [new tag] viable/strict/1763382462 -> viable/strict/1763382462 2025-12-04T09:33:42.1122982Z * [new tag] viable/strict/1763394661 -> viable/strict/1763394661 2025-12-04T09:33:42.1124442Z * [new tag] viable/strict/1763396797 -> viable/strict/1763396797 2025-12-04T09:33:42.1125553Z * [new tag] viable/strict/1763398542 -> viable/strict/1763398542 2025-12-04T09:33:42.1126697Z * [new tag] viable/strict/1763401807 -> viable/strict/1763401807 2025-12-04T09:33:42.1127696Z * [new tag] viable/strict/1763414698 -> viable/strict/1763414698 2025-12-04T09:33:42.1128912Z * [new tag] viable/strict/1763419807 -> viable/strict/1763419807 2025-12-04T09:33:42.1130051Z * [new tag] viable/strict/1763426369 -> viable/strict/1763426369 2025-12-04T09:33:42.1131291Z * [new tag] viable/strict/1763428331 -> viable/strict/1763428331 2025-12-04T09:33:42.1132502Z * [new tag] viable/strict/1763430922 -> viable/strict/1763430922 2025-12-04T09:33:42.1134064Z * [new tag] viable/strict/1763434184 -> viable/strict/1763434184 2025-12-04T09:33:42.1135100Z * [new tag] viable/strict/1763439973 -> viable/strict/1763439973 2025-12-04T09:33:42.1136435Z * [new tag] viable/strict/1763444995 -> viable/strict/1763444995 
2025-12-04T09:33:42.1137593Z * [new tag] viable/strict/1763447206 -> viable/strict/1763447206 2025-12-04T09:33:42.1138825Z * [new tag] viable/strict/1763448826 -> viable/strict/1763448826 2025-12-04T09:33:42.1139974Z * [new tag] viable/strict/1763450717 -> viable/strict/1763450717 2025-12-04T09:33:42.1141152Z * [new tag] viable/strict/1763452183 -> viable/strict/1763452183 2025-12-04T09:33:42.1142539Z * [new tag] viable/strict/1763457945 -> viable/strict/1763457945 2025-12-04T09:33:42.1143541Z * [new tag] viable/strict/1763459439 -> viable/strict/1763459439 2025-12-04T09:33:42.1144532Z * [new tag] viable/strict/1763461556 -> viable/strict/1763461556 2025-12-04T09:33:42.1145952Z * [new tag] viable/strict/1763463103 -> viable/strict/1763463103 2025-12-04T09:33:42.1147052Z * [new tag] viable/strict/1763465100 -> viable/strict/1763465100 2025-12-04T09:33:42.1148073Z * [new tag] viable/strict/1763468866 -> viable/strict/1763468866 2025-12-04T09:33:42.1149021Z * [new tag] viable/strict/1763493823 -> viable/strict/1763493823 2025-12-04T09:33:42.1149995Z * [new tag] viable/strict/1763496249 -> viable/strict/1763496249 2025-12-04T09:33:42.1151162Z * [new tag] viable/strict/1763502620 -> viable/strict/1763502620 2025-12-04T09:33:42.1152338Z * [new tag] viable/strict/1763504715 -> viable/strict/1763504715 2025-12-04T09:33:42.1153534Z * [new tag] viable/strict/1763506208 -> viable/strict/1763506208 2025-12-04T09:33:42.1154671Z * [new tag] viable/strict/1763520590 -> viable/strict/1763520590 2025-12-04T09:33:42.1155890Z * [new tag] viable/strict/1763523357 -> viable/strict/1763523357 2025-12-04T09:33:42.1157082Z * [new tag] viable/strict/1763529922 -> viable/strict/1763529922 2025-12-04T09:33:42.1158326Z * [new tag] viable/strict/1763531408 -> viable/strict/1763531408 2025-12-04T09:33:42.1159455Z * [new tag] viable/strict/1763533622 -> viable/strict/1763533622 2025-12-04T09:33:42.1160618Z * [new tag] viable/strict/1763538576 -> viable/strict/1763538576 
2025-12-04T09:33:42.1161828Z * [new tag] viable/strict/1763545823 -> viable/strict/1763545823 2025-12-04T09:33:42.1162785Z * [new tag] viable/strict/1763547951 -> viable/strict/1763547951 2025-12-04T09:33:42.1164159Z * [new tag] viable/strict/1763551477 -> viable/strict/1763551477 2025-12-04T09:33:42.1165207Z * [new tag] viable/strict/1763552982 -> viable/strict/1763552982 2025-12-04T09:33:42.1166379Z * [new tag] viable/strict/1763594698 -> viable/strict/1763594698 2025-12-04T09:33:42.1167505Z * [new tag] viable/strict/1763596178 -> viable/strict/1763596178 2025-12-04T09:33:42.1168651Z * [new tag] viable/strict/1763599155 -> viable/strict/1763599155 2025-12-04T09:33:42.1169838Z * [new tag] viable/strict/1763603717 -> viable/strict/1763603717 2025-12-04T09:33:42.1171177Z * [new tag] viable/strict/1763606923 -> viable/strict/1763606923 2025-12-04T09:33:42.1172572Z * [new tag] viable/strict/1763609715 -> viable/strict/1763609715 2025-12-04T09:33:42.1173526Z * [new tag] viable/strict/1763612757 -> viable/strict/1763612757 2025-12-04T09:33:42.1174705Z * [new tag] viable/strict/1763616325 -> viable/strict/1763616325 2025-12-04T09:33:42.1175867Z * [new tag] viable/strict/1763623509 -> viable/strict/1763623509 2025-12-04T09:33:42.1177379Z * [new tag] viable/strict/1763624984 -> viable/strict/1763624984 2025-12-04T09:33:42.1178621Z * [new tag] viable/strict/1763628796 -> viable/strict/1763628796 2025-12-04T09:33:42.1179612Z * [new tag] viable/strict/1763634343 -> viable/strict/1763634343 2025-12-04T09:33:42.1180717Z * [new tag] viable/strict/1763635867 -> viable/strict/1763635867 2025-12-04T09:33:42.1182144Z * [new tag] viable/strict/1763639382 -> viable/strict/1763639382 2025-12-04T09:33:42.1183291Z * [new tag] viable/strict/1763646626 -> viable/strict/1763646626 2025-12-04T09:33:42.1184725Z * [new tag] viable/strict/1763655997 -> viable/strict/1763655997 2025-12-04T09:33:42.1185778Z * [new tag] viable/strict/1763659444 -> viable/strict/1763659444 
2025-12-04T09:33:42.1187009Z * [new tag] viable/strict/1763660992 -> viable/strict/1763660992 2025-12-04T09:33:42.1188116Z * [new tag] viable/strict/1763663201 -> viable/strict/1763663201 2025-12-04T09:33:42.1189335Z * [new tag] viable/strict/1763670362 -> viable/strict/1763670362 2025-12-04T09:33:42.1190305Z * [new tag] viable/strict/1763675378 -> viable/strict/1763675378 2025-12-04T09:33:42.1191437Z * [new tag] viable/strict/1763693343 -> viable/strict/1763693343 2025-12-04T09:33:42.1192558Z * [new tag] viable/strict/1763696088 -> viable/strict/1763696088 2025-12-04T09:33:42.1194059Z * [new tag] viable/strict/1763697343 -> viable/strict/1763697343 2025-12-04T09:33:42.1195089Z * [new tag] viable/strict/1763699165 -> viable/strict/1763699165 2025-12-04T09:33:42.1196290Z * [new tag] viable/strict/1763700660 -> viable/strict/1763700660 2025-12-04T09:33:42.1197358Z * [new tag] viable/strict/1763704209 -> viable/strict/1763704209 2025-12-04T09:33:42.1198732Z * [new tag] viable/strict/1763706411 -> viable/strict/1763706411 2025-12-04T09:33:42.1199711Z * [new tag] viable/strict/1763708082 -> viable/strict/1763708082 2025-12-04T09:33:42.1200766Z * [new tag] viable/strict/1763711381 -> viable/strict/1763711381 2025-12-04T09:33:42.1201808Z * [new tag] viable/strict/1763713593 -> viable/strict/1763713593 2025-12-04T09:33:42.1202931Z * [new tag] viable/strict/1763715201 -> viable/strict/1763715201 2025-12-04T09:33:42.1204064Z * [new tag] viable/strict/1763733017 -> viable/strict/1763733017 2025-12-04T09:33:42.1205240Z * [new tag] viable/strict/1763735108 -> viable/strict/1763735108 2025-12-04T09:33:42.1206385Z * [new tag] viable/strict/1763749579 -> viable/strict/1763749579 2025-12-04T09:33:42.1207503Z * [new tag] viable/strict/1763751113 -> viable/strict/1763751113 2025-12-04T09:33:42.1209278Z * [new tag] viable/strict/1763753035 -> viable/strict/1763753035 2025-12-04T09:33:42.1210386Z * [new tag] viable/strict/1763754578 -> viable/strict/1763754578 
2025-12-04T09:33:42.1211533Z * [new tag] viable/strict/1763756748 -> viable/strict/1763756748 2025-12-04T09:33:42.1212717Z * [new tag] viable/strict/1763758205 -> viable/strict/1763758205 2025-12-04T09:33:42.1213697Z * [new tag] viable/strict/1763764050 -> viable/strict/1763764050 2025-12-04T09:33:42.1214890Z * [new tag] viable/strict/1763771887 -> viable/strict/1763771887 2025-12-04T09:33:42.1216252Z * [new tag] viable/strict/1763773920 -> viable/strict/1763773920 2025-12-04T09:33:42.1217402Z * [new tag] viable/strict/1763776501 -> viable/strict/1763776501 2025-12-04T09:33:42.1218520Z * [new tag] viable/strict/1763779437 -> viable/strict/1763779437 2025-12-04T09:33:42.1219995Z * [new tag] viable/strict/1763781038 -> viable/strict/1763781038 2025-12-04T09:33:42.1221071Z * [new tag] viable/strict/1763782245 -> viable/strict/1763782245 2025-12-04T09:33:42.1222117Z * [new tag] viable/strict/1763785568 -> viable/strict/1763785568 2025-12-04T09:33:42.1223355Z * [new tag] viable/strict/1763787006 -> viable/strict/1763787006 2025-12-04T09:33:42.1224533Z * [new tag] viable/strict/1763789103 -> viable/strict/1763789103 2025-12-04T09:33:42.1225672Z * [new tag] viable/strict/1763790578 -> viable/strict/1763790578 2025-12-04T09:33:42.1226818Z * [new tag] viable/strict/1763796275 -> viable/strict/1763796275 2025-12-04T09:33:42.1228362Z * [new tag] viable/strict/1763801465 -> viable/strict/1763801465 2025-12-04T09:33:42.1229334Z * [new tag] viable/strict/1763803522 -> viable/strict/1763803522 2025-12-04T09:33:42.1230402Z * [new tag] viable/strict/1763808581 -> viable/strict/1763808581 2025-12-04T09:33:42.1231605Z * [new tag] viable/strict/1763840977 -> viable/strict/1763840977 2025-12-04T09:33:42.1232706Z * [new tag] viable/strict/1763846659 -> viable/strict/1763846659 2025-12-04T09:33:42.1233785Z * [new tag] viable/strict/1763872065 -> viable/strict/1763872065 2025-12-04T09:33:42.1234979Z * [new tag] viable/strict/1763873648 -> viable/strict/1763873648 
2025-12-04T09:33:42.1236175Z * [new tag] viable/strict/1763875506 -> viable/strict/1763875506 2025-12-04T09:33:42.1237163Z * [new tag] viable/strict/1763889904 -> viable/strict/1763889904 2025-12-04T09:33:42.1238272Z * [new tag] viable/strict/1763930999 -> viable/strict/1763930999 2025-12-04T09:33:42.1239440Z * [new tag] viable/strict/1763944964 -> viable/strict/1763944964 2025-12-04T09:33:42.1240459Z * [new tag] viable/strict/1763958474 -> viable/strict/1763958474 2025-12-04T09:33:42.1241600Z * [new tag] viable/strict/1763967263 -> viable/strict/1763967263 2025-12-04T09:33:42.1242709Z * [new tag] viable/strict/1763972803 -> viable/strict/1763972803 2025-12-04T09:33:42.1243800Z * [new tag] viable/strict/1763976376 -> viable/strict/1763976376 2025-12-04T09:33:42.1244994Z * [new tag] viable/strict/1763989404 -> viable/strict/1763989404 2025-12-04T09:33:42.1246138Z * [new tag] viable/strict/1763990887 -> viable/strict/1763990887 2025-12-04T09:33:42.1247278Z * [new tag] viable/strict/1764019919 -> viable/strict/1764019919 2025-12-04T09:33:42.1248481Z * [new tag] viable/strict/1764023134 -> viable/strict/1764023134 2025-12-04T09:33:42.1249451Z * [new tag] viable/strict/1764024593 -> viable/strict/1764024593 2025-12-04T09:33:42.1250549Z * [new tag] viable/strict/1764026706 -> viable/strict/1764026706 2025-12-04T09:33:42.1252172Z * [new tag] viable/strict/1764031139 -> viable/strict/1764031139 2025-12-04T09:33:42.1253180Z * [new tag] viable/strict/1764033131 -> viable/strict/1764033131 2025-12-04T09:33:42.1254152Z * [new tag] viable/strict/1764035725 -> viable/strict/1764035725 2025-12-04T09:33:42.1255150Z * [new tag] viable/strict/1764624265 -> viable/strict/1764624265 2025-12-04T09:33:42.1256066Z * [new tag] viable/strict/1764631514 -> viable/strict/1764631514 2025-12-04T09:33:42.1257291Z * [new tag] viable/strict/1764632987 -> viable/strict/1764632987 2025-12-04T09:33:42.1258262Z * [new tag] viable/strict/1764636063 -> viable/strict/1764636063 
2025-12-04T09:33:42.1259203Z * [new tag] viable/strict/1764643975 -> viable/strict/1764643975 2025-12-04T09:33:42.1260176Z * [new tag] viable/strict/1764646859 -> viable/strict/1764646859 2025-12-04T09:33:42.1261225Z * [new tag] viable/strict/1764653120 -> viable/strict/1764653120 2025-12-04T09:33:42.1262100Z * [new tag] viable/strict/1764654632 -> viable/strict/1764654632 2025-12-04T09:33:42.1263050Z * [new tag] viable/strict/1764656821 -> viable/strict/1764656821 2025-12-04T09:33:42.1264011Z * [new tag] viable/strict/1764658557 -> viable/strict/1764658557 2025-12-04T09:33:42.1264936Z * [new tag] viable/strict/1764660333 -> viable/strict/1764660333 2025-12-04T09:33:42.1265933Z * [new tag] viable/strict/1764661812 -> viable/strict/1764661812 2025-12-04T09:33:42.1266904Z * [new tag] viable/strict/1764664023 -> viable/strict/1764664023 2025-12-04T09:33:42.1267837Z * [new tag] viable/strict/1764669150 -> viable/strict/1764669150 2025-12-04T09:33:42.1268793Z * [new tag] viable/strict/1764680709 -> viable/strict/1764680709 2025-12-04T09:33:42.1269767Z * [new tag] viable/strict/1764687619 -> viable/strict/1764687619 2025-12-04T09:33:42.1270683Z * [new tag] viable/strict/1764696355 -> viable/strict/1764696355 2025-12-04T09:33:42.1271851Z * [new tag] viable/strict/1764701767 -> viable/strict/1764701767 2025-12-04T09:33:42.1272790Z * [new tag] viable/strict/1764710768 -> viable/strict/1764710768 2025-12-04T09:33:42.1273736Z * [new tag] viable/strict/1764716202 -> viable/strict/1764716202 2025-12-04T09:33:42.1274687Z * [new tag] viable/strict/1764793566 -> viable/strict/1764793566 2025-12-04T09:33:42.1275670Z * [new tag] viable/strict/1764797093 -> viable/strict/1764797093 2025-12-04T09:33:42.1276620Z * [new tag] viable/strict/1764800729 -> viable/strict/1764800729 2025-12-04T09:33:42.1278016Z * [new tag] whc_flight_1 -> whc_flight_1 2025-12-04T09:33:42.1279018Z * [new tag] whc_flight_2 -> whc_flight_2 2025-12-04T09:33:42.1280533Z * [new tag] whc_flight_4 -> whc_flight_4 
2025-12-04T09:33:42.2142141Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object} 2025-12-04T09:33:42.2169451Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:42.2173609Z ##[endgroup] 2025-12-04T09:33:42.2174082Z ##[group]Determining the checkout info 2025-12-04T09:33:42.2175087Z ##[endgroup] 2025-12-04T09:33:42.2179557Z [command]/usr/bin/git sparse-checkout disable 2025-12-04T09:33:42.2214376Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-12-04T09:33:42.2243368Z ##[group]Checking out the ref 2025-12-04T09:33:42.2247064Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:43.2786650Z Updating files: 75% (15192/20121) 2025-12-04T09:33:43.2947418Z Updating files: 76% (15292/20121) 2025-12-04T09:33:43.3092080Z Updating files: 77% (15494/20121) 2025-12-04T09:33:43.3321933Z Updating files: 78% (15695/20121) 2025-12-04T09:33:43.3617185Z Updating files: 79% (15896/20121) 2025-12-04T09:33:43.3978185Z Updating files: 80% (16097/20121) 2025-12-04T09:33:43.4300786Z Updating files: 81% (16299/20121) 2025-12-04T09:33:43.4539638Z Updating files: 82% (16500/20121) 2025-12-04T09:33:43.4709233Z Updating files: 83% (16701/20121) 2025-12-04T09:33:43.4864371Z Updating files: 84% (16902/20121) 2025-12-04T09:33:43.5044775Z Updating files: 85% (17103/20121) 2025-12-04T09:33:43.5217224Z Updating files: 86% (17305/20121) 2025-12-04T09:33:43.5371501Z Updating files: 87% (17506/20121) 2025-12-04T09:33:43.5497959Z Updating files: 88% (17707/20121) 2025-12-04T09:33:43.5648352Z Updating files: 89% (17908/20121) 2025-12-04T09:33:43.5841485Z Updating files: 90% (18109/20121) 2025-12-04T09:33:43.5969520Z Updating files: 91% (18311/20121) 2025-12-04T09:33:43.6142156Z Updating files: 92% (18512/20121) 2025-12-04T09:33:43.6347494Z Updating files: 93% (18713/20121) 2025-12-04T09:33:43.6576671Z Updating files: 94% (18914/20121) 2025-12-04T09:33:43.6771077Z 
Updating files: 95% (19115/20121) 2025-12-04T09:33:43.6946944Z Updating files: 96% (19317/20121) 2025-12-04T09:33:43.7130752Z Updating files: 97% (19518/20121) 2025-12-04T09:33:43.7450067Z Updating files: 98% (19719/20121) 2025-12-04T09:33:43.7647769Z Updating files: 99% (19920/20121) 2025-12-04T09:33:43.7648141Z Updating files: 100% (20121/20121) 2025-12-04T09:33:43.7648515Z Updating files: 100% (20121/20121), done. 2025-12-04T09:33:43.7966534Z Note: switching to 'ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32'. 2025-12-04T09:33:43.7966929Z 2025-12-04T09:33:43.7967188Z You are in 'detached HEAD' state. You can look around, make experimental 2025-12-04T09:33:43.7967841Z changes and commit them, and you can discard any commits you make in this 2025-12-04T09:33:43.7968495Z state without impacting any branches by switching back to a branch. 2025-12-04T09:33:43.7968885Z 2025-12-04T09:33:43.7969127Z If you want to create a new branch to retain commits you create, you may 2025-12-04T09:33:43.7969721Z do so (now or later) by using -c with the switch command. 
Example: 2025-12-04T09:33:43.7970074Z 2025-12-04T09:33:43.7970203Z git switch -c <new-branch-name> 2025-12-04T09:33:43.7970658Z 2025-12-04T09:33:43.7970800Z Or undo this operation with: 2025-12-04T09:33:43.7971164Z 2025-12-04T09:33:43.7971269Z git switch - 2025-12-04T09:33:43.7971436Z 2025-12-04T09:33:43.7971711Z Turn off this advice by setting config variable advice.detachedHead to false 2025-12-04T09:33:43.7972466Z 2025-12-04T09:33:43.7973095Z HEAD is now at ffd9b0fb435 Resolve collective autotuning test failure on arm (#168919) 2025-12-04T09:33:43.8061366Z ##[endgroup] 2025-12-04T09:33:43.8061864Z ##[group]Setting up auth for fetching submodules 2025-12-04T09:33:43.8068191Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T09:33:43.8121459Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-12-04T09:33:43.8152148Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-12-04T09:33:43.8182164Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-12-04T09:33:43.8208055Z ##[endgroup] 2025-12-04T09:33:43.8208914Z ##[group]Fetching submodules 2025-12-04T09:33:43.8212793Z [command]/usr/bin/git submodule sync --recursive 2025-12-04T09:33:43.8589320Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-12-04T09:33:43.8932149Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni' 2025-12-04T09:33:43.8933408Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16' 2025-12-04T09:33:43.8935990Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv' 2025-12-04T09:33:43.8942243Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git)
registered for path 'third_party/NNPACK' 2025-12-04T09:33:43.8943309Z Submodule 'third_party/NVTX' (https://github.com/NVIDIA/NVTX.git) registered for path 'third_party/NVTX' 2025-12-04T09:33:43.8945685Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator' 2025-12-04T09:33:43.8949225Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK' 2025-12-04T09:33:43.8952650Z Submodule 'third_party/aiter' (https://github.com/ROCm/aiter.git) registered for path 'third_party/aiter' 2025-12-04T09:33:43.8956356Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark' 2025-12-04T09:33:43.8960563Z Submodule 'third_party/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel' 2025-12-04T09:33:43.8964117Z Submodule 'third_party/cpp-httplib' (https://github.com/yhirose/cpp-httplib.git) registered for path 'third_party/cpp-httplib' 2025-12-04T09:33:43.8968044Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo' 2025-12-04T09:33:43.8972956Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend' 2025-12-04T09:33:43.8977238Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass' 2025-12-04T09:33:43.8981448Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm' 2025-12-04T09:33:43.8987191Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention' 2025-12-04T09:33:43.8993224Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) 
registered for path 'third_party/flatbuffers' 2025-12-04T09:33:43.8997828Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt' 2025-12-04T09:33:43.9002703Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:33:43.9007335Z Submodule 'third_party/gloo' (https://github.com/pytorch/gloo) registered for path 'third_party/gloo' 2025-12-04T09:33:43.9012374Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest' 2025-12-04T09:33:43.9017299Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep' 2025-12-04T09:33:43.9022519Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi' 2025-12-04T09:33:43.9027615Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto' 2025-12-04T09:33:43.9033487Z Submodule 'third_party/kleidiai' (https://github.com/ARM-software/kleidiai.git) registered for path 'third_party/kleidiai' 2025-12-04T09:33:43.9039465Z Submodule 'third_party/mimalloc' (https://github.com/microsoft/mimalloc.git) registered for path 'third_party/mimalloc' 2025-12-04T09:33:43.9044841Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann' 2025-12-04T09:33:43.9051423Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx' 2025-12-04T09:33:43.9057780Z Submodule 'third_party/opentelemetry-cpp' (https://github.com/open-telemetry/opentelemetry-cpp.git) registered for path 'third_party/opentelemetry-cpp' 2025-12-04T09:33:43.9063097Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft' 2025-12-04T09:33:43.9069196Z Submodule 'third_party/protobuf' 
(https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf' 2025-12-04T09:33:43.9075539Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd' 2025-12-04T09:33:43.9082290Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool' 2025-12-04T09:33:43.9089390Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11' 2025-12-04T09:33:43.9095917Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy' 2025-12-04T09:33:43.9102528Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef' 2025-12-04T09:33:43.9109575Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe' 2025-12-04T09:33:43.9147061Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'... 2025-12-04T09:33:44.1532485Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'... 2025-12-04T09:33:44.1533401Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'... 2025-12-04T09:33:44.1564841Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'... 2025-12-04T09:33:47.9757592Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'... 2025-12-04T09:33:47.9759721Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'... 2025-12-04T09:33:47.9761635Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NVTX'... 2025-12-04T09:33:47.9763414Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'... 
2025-12-04T09:33:47.9765421Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'... 2025-12-04T09:33:47.9816588Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'... 2025-12-04T09:33:47.9818662Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention'... 2025-12-04T09:33:47.9820823Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpp-httplib'... 2025-12-04T09:33:47.9822646Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'... 2025-12-04T09:33:47.9824369Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'... 2025-12-04T09:33:47.9826063Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kleidiai'... 2025-12-04T09:33:47.9827855Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'... 2025-12-04T09:33:47.9829782Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'... 2025-12-04T09:33:47.9831503Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'... 2025-12-04T09:33:47.9833258Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'... 2025-12-04T09:33:47.9835211Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'... 2025-12-04T09:33:47.9836974Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/mimalloc'... 2025-12-04T09:33:48.0100469Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'... 2025-12-04T09:33:48.1887646Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp'... 2025-12-04T09:34:08.3052849Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'... 
2025-12-04T09:34:08.3055358Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'... 2025-12-04T09:34:08.3058975Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'... 2025-12-04T09:34:08.3060700Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'... 2025-12-04T09:34:08.3062312Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'... 2025-12-04T09:34:08.3063928Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'... 2025-12-04T09:34:08.3065583Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'... 2025-12-04T09:34:08.3067183Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'... 2025-12-04T09:34:08.3068767Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'... 2025-12-04T09:34:08.3070486Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/composable_kernel'... 2025-12-04T09:34:08.3072774Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'... 2025-12-04T09:34:08.4054235Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'... 2025-12-04T09:34:13.0307069Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter'... 2025-12-04T09:34:13.0307997Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'... 
2025-12-04T09:34:13.0493852Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-12-04T09:34:13.0641141Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-12-04T09:34:13.0752508Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-12-04T09:34:13.1044538Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-12-04T09:34:13.2021568Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6' 2025-12-04T09:34:13.2606513Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-12-04T09:34:14.1198546Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-12-04T09:34:14.3405823Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-12-04T09:34:14.3428444Z Submodule '3rdparty/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:34:14.3459191Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter/3rdparty/composable_kernel'... 
2025-12-04T09:34:19.5728741Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-12-04T09:34:19.6010872Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-12-04T09:34:20.0146752Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T09:34:20.0718003Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-12-04T09:34:20.1855339Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc' 2025-12-04T09:34:20.2422923Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396' 2025-12-04T09:34:20.9940338Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588' 2025-12-04T09:34:21.1755571Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4' 2025-12-04T09:34:21.1780869Z Submodule 'external/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/external/asmjit' 2025-12-04T09:34:21.1784361Z Submodule 'external/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:34:21.1787172Z Submodule 'external/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:34:21.1790273Z Submodule 'external/cutlass' (https://github.com/jwfromm/cutlass) registered for path 'third_party/fbgemm/external/cutlass' 2025-12-04T09:34:21.1793616Z Submodule 'external/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/external/googletest' 2025-12-04T09:34:21.1797111Z Submodule 'external/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 
'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:34:21.1800399Z Submodule 'external/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/fbgemm/external/json' 2025-12-04T09:34:21.1834085Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/asmjit'... 2025-12-04T09:34:22.5489865Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/hipify_torch'... 2025-12-04T09:34:22.5491038Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cpuinfo'... 2025-12-04T09:34:22.5492075Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/googletest'... 2025-12-04T09:34:22.6491049Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/composable_kernel'... 2025-12-04T09:34:26.3528584Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cutlass'... 2025-12-04T09:34:26.4529459Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/json'... 
2025-12-04T09:34:29.6880160Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-12-04T09:34:30.1032375Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T09:34:30.2211509Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-12-04T09:34:30.9661096Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8' 2025-12-04T09:34:31.0219980Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T09:34:31.0364687Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-12-04T09:34:31.1630956Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-12-04T09:34:31.2450617Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-12-04T09:34:31.2474473Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:34:31.2477210Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:34:31.2509922Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/composable_kernel'... 2025-12-04T09:34:36.0998357Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/cutlass'... 
2025-12-04T09:34:36.3858536Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-12-04T09:34:37.0364638Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-12-04T09:34:37.1999067Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-12-04T09:34:37.2350077Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f' 2025-12-04T09:34:37.2828923Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-12-04T09:34:37.3134553Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341' 2025-12-04T09:34:37.3668468Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T09:34:37.3827035Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-12-04T09:34:37.3846730Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn' 2025-12-04T09:34:37.3875472Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'... 
2025-12-04T09:34:53.9708436Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-12-04T09:34:53.9954712Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-12-04T09:34:54.1006644Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943' 2025-12-04T09:34:54.1028926Z Submodule 'libkineto/third_party/dynolog' (https://github.com/facebookincubator/dynolog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:34:54.1031696Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:34:54.1034907Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:34:54.1067647Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog'... 2025-12-04T09:34:55.1857095Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'... 2025-12-04T09:34:55.7793603Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'... 
2025-12-04T09:34:55.8849306Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1' 2025-12-04T09:34:55.8871203Z Submodule 'third_party/DCGM' (https://github.com/NVIDIA/DCGM.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:34:55.8873956Z Submodule 'third_party/cpr' (https://github.com/libcpr/cpr.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:34:55.8876923Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:34:55.8880188Z Submodule 'third_party/gflags' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:34:55.8883485Z Submodule 'third_party/glog' (https://github.com/google/glog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:34:55.8887061Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:34:55.8890642Z Submodule 'third_party/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:34:55.8894839Z Submodule 'third_party/pfs' (https://github.com/dtrugman/pfs.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:34:55.8898927Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:34:55.8931866Z Cloning into 
'/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'... 2025-12-04T09:34:57.9006827Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'... 2025-12-04T09:34:57.9008298Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'... 2025-12-04T09:34:57.9009764Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'... 2025-12-04T09:34:57.9011116Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'... 2025-12-04T09:34:57.9012438Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/glog'... 2025-12-04T09:34:57.9013814Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'... 2025-12-04T09:34:57.9015401Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'... 2025-12-04T09:34:58.0007824Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json'... 
2025-12-04T09:35:04.6101574Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-12-04T09:35:04.6319827Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-12-04T09:35:04.6752441Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-12-04T09:35:04.6918097Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-12-04T09:35:04.6936190Z Submodule 'doc' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:35:04.6965567Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'... 
2025-12-04T09:35:04.9908510Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-12-04T09:35:05.0130472Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-12-04T09:35:05.0665296Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T09:35:05.1813525Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-12-04T09:35:05.2010253Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-12-04T09:35:05.2213005Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a' 2025-12-04T09:35:05.2233378Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:05.2236527Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:05.2269170Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-12-04T09:35:07.6168875Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'... 
2025-12-04T09:35:07.9080612Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159' 2025-12-04T09:35:07.9632498Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T09:35:08.0011510Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-12-04T09:35:08.0548578Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T09:35:08.1179655Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe' 2025-12-04T09:35:08.1634715Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-12-04T09:35:08.2814700Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-12-04T09:35:08.7565235Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-12-04T09:35:08.7608527Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11' 2025-12-04T09:35:08.7639370Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'... 
2025-12-04T09:35:09.7396557Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-12-04T09:35:09.8223424Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-12-04T09:35:09.8246289Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark) registered for path 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:35:09.8249038Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:35:09.8251980Z Submodule 'third_party/ms-gsl' (https://github.com/microsoft/GSL) registered for path 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:35:09.8255375Z Submodule 'third_party/nlohmann-json' (https://github.com/nlohmann/json) registered for path 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:35:09.8258849Z Submodule 'third_party/opentelemetry-proto' (https://github.com/open-telemetry/opentelemetry-proto) registered for path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:35:09.8262120Z Submodule 'third_party/opentracing-cpp' (https://github.com/opentracing/opentracing-cpp.git) registered for path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:35:09.8266615Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:35:09.8270180Z Submodule 'tools/vcpkg' (https://github.com/Microsoft/vcpkg) registered for path 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:35:09.8305530Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/benchmark'... 
2025-12-04T09:35:10.2775317Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp'... 2025-12-04T09:35:10.2776748Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto'... 2025-12-04T09:35:10.2778059Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp'... 2025-12-04T09:35:10.2779275Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl'... 2025-12-04T09:35:10.3776393Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/googletest'... 2025-12-04T09:35:11.0679408Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/nlohmann-json'... 2025-12-04T09:35:19.1967840Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/tools/vcpkg'... 
2025-12-04T09:35:19.9318253Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-12-04T09:35:19.9791216Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-12-04T09:35:19.9989273Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-12-04T09:35:20.1203125Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-12-04T09:35:20.1371116Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-12-04T09:35:20.1552293Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-12-04T09:35:20.1742002Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-12-04T09:35:20.1759788Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:20.1762824Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:20.1793558Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-12-04T09:35:22.6049454Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'... 
2025-12-04T09:35:22.8966280Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-12-04T09:35:22.9513399Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T09:35:23.5125817Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-12-04T09:35:23.5268396Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-12-04T09:35:23.8427896Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-12-04T09:35:23.8454575Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:35:23.8457745Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest' 2025-12-04T09:35:23.8491562Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'... 2025-12-04T09:35:24.3866227Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'... 
2025-12-04T09:35:24.8813874Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-12-04T09:35:24.9655601Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-12-04T09:35:24.9769859Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-12-04T09:35:24.9917185Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-12-04T09:35:25.0413858Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-12-04T09:35:25.0769664Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-12-04T09:35:25.1291680Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-12-04T09:35:25.1619253Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d' 2025-12-04T09:35:25.1640325Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:35:25.1643227Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:35:25.1646664Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:35:25.1649559Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:35:25.1681988Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'... 2025-12-04T09:35:26.5005563Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'... 
2025-12-04T09:35:26.5006733Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'... 2025-12-04T09:35:26.5007831Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'... 2025-12-04T09:35:26.5674068Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-12-04T09:35:26.5861766Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-12-04T09:35:26.6726144Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-12-04T09:35:26.7067808Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-12-04T09:35:26.7086080Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:35:26.7115615Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'... 
2025-12-04T09:35:26.9184401Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-12-04T09:35:26.9224208Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-12-04T09:35:26.9570467Z Entering 'android/libs/fbjni' 2025-12-04T09:35:26.9621879Z Entering 'third_party/FP16' 2025-12-04T09:35:26.9671153Z Entering 'third_party/FXdiv' 2025-12-04T09:35:26.9721959Z Entering 'third_party/NNPACK' 2025-12-04T09:35:26.9769808Z Entering 'third_party/NVTX' 2025-12-04T09:35:26.9818990Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:35:26.9867130Z Entering 'third_party/XNNPACK' 2025-12-04T09:35:26.9932414Z Entering 'third_party/aiter' 2025-12-04T09:35:26.9982851Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:35:27.0039945Z Entering 'third_party/benchmark' 2025-12-04T09:35:27.0088300Z Entering 'third_party/composable_kernel' 2025-12-04T09:35:27.0147410Z Entering 'third_party/cpp-httplib' 2025-12-04T09:35:27.0195187Z Entering 'third_party/cpuinfo' 2025-12-04T09:35:27.0244244Z Entering 'third_party/cudnn_frontend' 2025-12-04T09:35:27.0292617Z Entering 'third_party/cutlass' 2025-12-04T09:35:27.0351043Z Entering 'third_party/fbgemm' 2025-12-04T09:35:27.0402024Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:35:27.0449060Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:35:27.0508811Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:35:27.0558010Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:35:27.0616068Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:35:27.0663102Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:35:27.0709565Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:35:27.0760542Z Entering 'third_party/flash-attention' 2025-12-04T09:35:27.0810682Z Entering 'third_party/flash-attention/csrc/composable_kernel' 
2025-12-04T09:35:27.0864809Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:35:27.0923625Z Entering 'third_party/flatbuffers' 2025-12-04T09:35:27.0975582Z Entering 'third_party/fmt' 2025-12-04T09:35:27.1026371Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:35:27.1076445Z Entering 'third_party/gloo' 2025-12-04T09:35:27.1125081Z Entering 'third_party/googletest' 2025-12-04T09:35:27.1175383Z Entering 'third_party/ideep' 2025-12-04T09:35:27.1221526Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:35:27.1280021Z Entering 'third_party/ittapi' 2025-12-04T09:35:27.1328628Z Entering 'third_party/kineto' 2025-12-04T09:35:27.1378054Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:35:27.1425161Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:35:27.1474193Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:35:27.1520497Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:35:27.1568409Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:35:27.1614952Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:35:27.1663824Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:35:27.1711247Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:35:27.1758523Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:35:27.1806859Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:35:27.1854403Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:35:27.1900658Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:27.1950491Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:27.2002054Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:35:27.2050202Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:35:27.2098697Z Entering 'third_party/kleidiai' 2025-12-04T09:35:27.2148381Z Entering 'third_party/mimalloc' 2025-12-04T09:35:27.2198263Z Entering 'third_party/nlohmann' 2025-12-04T09:35:27.2247590Z Entering 'third_party/onnx' 2025-12-04T09:35:27.2316926Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:35:27.2365724Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:35:27.2415709Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:35:27.2461976Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:35:27.2507582Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:35:27.2553337Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:35:27.2603186Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:35:27.2649002Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:35:27.2700396Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:35:27.2746645Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:27.2796244Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:27.2846497Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:35:27.2917503Z Entering 'third_party/pocketfft' 2025-12-04T09:35:27.2970219Z Entering 'third_party/protobuf' 2025-12-04T09:35:27.3024291Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:35:27.3075440Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:35:27.3119985Z Entering 'third_party/psimd' 
2025-12-04T09:35:27.3169021Z Entering 'third_party/pthreadpool' 2025-12-04T09:35:27.3219624Z Entering 'third_party/pybind11' 2025-12-04T09:35:27.3267708Z Entering 'third_party/python-peachpy' 2025-12-04T09:35:27.3319068Z Entering 'third_party/sleef' 2025-12-04T09:35:27.3367110Z Entering 'third_party/tensorpipe' 2025-12-04T09:35:27.3415758Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:35:27.3465461Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:35:27.3514828Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:35:27.3562768Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:35:27.3611480Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:35:27.3673571Z ##[endgroup] 2025-12-04T09:35:27.3674183Z ##[group]Persisting credentials for submodules 2025-12-04T09:35:27.3681499Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-12-04T09:35:27.4020284Z Entering 'android/libs/fbjni' 2025-12-04T09:35:27.4085493Z Entering 'third_party/FP16' 2025-12-04T09:35:27.4149769Z Entering 'third_party/FXdiv' 2025-12-04T09:35:27.4212058Z Entering 'third_party/NNPACK' 2025-12-04T09:35:27.4274871Z Entering 'third_party/NVTX' 2025-12-04T09:35:27.4338239Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:35:27.4400654Z Entering 'third_party/XNNPACK' 2025-12-04T09:35:27.4479739Z Entering 'third_party/aiter' 2025-12-04T09:35:27.4542706Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:35:27.4615544Z Entering 'third_party/benchmark' 2025-12-04T09:35:27.4679575Z Entering 'third_party/composable_kernel' 2025-12-04T09:35:27.4752498Z Entering 'third_party/cpp-httplib' 2025-12-04T09:35:27.4814567Z Entering 'third_party/cpuinfo' 2025-12-04T09:35:27.4882564Z Entering 
'third_party/cudnn_frontend' 2025-12-04T09:35:27.4947095Z Entering 'third_party/cutlass' 2025-12-04T09:35:27.5021383Z Entering 'third_party/fbgemm' 2025-12-04T09:35:27.5089381Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:35:27.5150241Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:35:27.5228828Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:35:27.5290704Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:35:27.5364057Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:35:27.5426382Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:35:27.5488037Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:35:27.5552915Z Entering 'third_party/flash-attention' 2025-12-04T09:35:27.5619206Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:35:27.5694133Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:35:27.5767609Z Entering 'third_party/flatbuffers' 2025-12-04T09:35:27.5834303Z Entering 'third_party/fmt' 2025-12-04T09:35:27.5897034Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:35:27.5960736Z Entering 'third_party/gloo' 2025-12-04T09:35:27.6023938Z Entering 'third_party/googletest' 2025-12-04T09:35:27.6089385Z Entering 'third_party/ideep' 2025-12-04T09:35:27.6151098Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:35:27.6221896Z Entering 'third_party/ittapi' 2025-12-04T09:35:27.6285911Z Entering 'third_party/kineto' 2025-12-04T09:35:27.6349326Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:35:27.6414385Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:35:27.6479786Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:35:27.6542320Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:35:27.6605572Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 
2025-12-04T09:35:27.6669712Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:35:27.6736537Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:35:27.6801345Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:35:27.6866185Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:35:27.6931989Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:35:27.6998359Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:35:27.7058692Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:27.7126234Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:27.7194565Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:35:27.7257284Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:35:27.7322301Z Entering 'third_party/kleidiai' 2025-12-04T09:35:27.7392858Z Entering 'third_party/mimalloc' 2025-12-04T09:35:27.7456106Z Entering 'third_party/nlohmann' 2025-12-04T09:35:27.7521414Z Entering 'third_party/onnx' 2025-12-04T09:35:27.7606930Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:35:27.7673857Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:35:27.7738340Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:35:27.7801060Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:35:27.7863077Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:35:27.7925052Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:35:27.7987926Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:35:27.8048327Z 
Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:35:27.8109951Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:35:27.8171454Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:27.8237736Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:27.8302639Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:35:27.8387706Z Entering 'third_party/pocketfft' 2025-12-04T09:35:27.8450936Z Entering 'third_party/protobuf' 2025-12-04T09:35:27.8520148Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:35:27.8583104Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:35:27.8647633Z Entering 'third_party/psimd' 2025-12-04T09:35:27.8712052Z Entering 'third_party/pthreadpool' 2025-12-04T09:35:27.8778949Z Entering 'third_party/pybind11' 2025-12-04T09:35:27.8841107Z Entering 'third_party/python-peachpy' 2025-12-04T09:35:27.8903372Z Entering 'third_party/sleef' 2025-12-04T09:35:27.8965963Z Entering 'third_party/tensorpipe' 2025-12-04T09:35:27.9031537Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:35:27.9092881Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:35:27.9154016Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:35:27.9214763Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:35:27.9279163Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:35:27.9359686Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-12-04T09:35:27.9708958Z Entering 'android/libs/fbjni' 2025-12-04T09:35:27.9768187Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T09:35:27.9786392Z Entering 'third_party/FP16' 2025-12-04T09:35:27.9845597Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T09:35:27.9863926Z Entering 'third_party/FXdiv' 2025-12-04T09:35:27.9921921Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T09:35:27.9940931Z Entering 'third_party/NNPACK' 2025-12-04T09:35:27.9999329Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T09:35:28.0137728Z Entering 'third_party/NVTX' 2025-12-04T09:35:28.0200459Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T09:35:28.0247809Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:35:28.0324145Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T09:35:28.0354815Z Entering 'third_party/XNNPACK' 2025-12-04T09:35:28.0413200Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T09:35:28.0448741Z Entering 'third_party/aiter' 2025-12-04T09:35:28.0508478Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T09:35:28.0527706Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:35:28.0586482Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T09:35:28.0614797Z Entering 'third_party/benchmark' 2025-12-04T09:35:28.0674276Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T09:35:28.0692727Z Entering 'third_party/composable_kernel' 2025-12-04T09:35:28.0751774Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T09:35:28.0780599Z Entering 'third_party/cpp-httplib' 2025-12-04T09:35:28.0839861Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T09:35:28.0858802Z Entering 'third_party/cpuinfo' 2025-12-04T09:35:28.0919763Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T09:35:28.0939062Z Entering 'third_party/cudnn_frontend' 2025-12-04T09:35:28.0999936Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T09:35:28.1019264Z Entering 'third_party/cutlass' 2025-12-04T09:35:28.1081027Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T09:35:28.1110551Z Entering 'third_party/fbgemm' 2025-12-04T09:35:28.1173256Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T09:35:28.1194327Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:35:28.1254412Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T09:35:28.1272949Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:35:28.1331687Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T09:35:28.1358763Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:35:28.1416444Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T09:35:28.1434711Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:35:28.1493252Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T09:35:28.1521868Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:35:28.1582743Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T09:35:28.1600486Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:35:28.1659863Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T09:35:28.1677526Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:35:28.1735691Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T09:35:28.1757098Z Entering 'third_party/flash-attention' 2025-12-04T09:35:28.1817497Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T09:35:28.1837059Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:35:28.1898060Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T09:35:28.1922942Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:35:28.1982659Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T09:35:28.2010740Z Entering 'third_party/flatbuffers' 2025-12-04T09:35:28.2070885Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T09:35:28.2093098Z Entering 'third_party/fmt' 2025-12-04T09:35:28.2152750Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T09:35:28.2171847Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:35:28.2230645Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T09:35:28.2248968Z Entering 'third_party/gloo' 2025-12-04T09:35:28.2307111Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T09:35:28.2325736Z Entering 'third_party/googletest' 2025-12-04T09:35:28.2384362Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T09:35:28.2403007Z Entering 'third_party/ideep' 2025-12-04T09:35:28.2463464Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T09:35:28.2482210Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:35:28.2539658Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T09:35:28.2567279Z Entering 'third_party/ittapi' 2025-12-04T09:35:28.2627004Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T09:35:28.2645640Z Entering 'third_party/kineto' 2025-12-04T09:35:28.2704796Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T09:35:28.2722990Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:35:28.2783274Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T09:35:28.2800972Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:35:28.2860374Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T09:35:28.2879907Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:35:28.2938665Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T09:35:28.2956330Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:35:28.3016250Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T09:35:28.3034328Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:35:28.3094838Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T09:35:28.3111766Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:35:28.3172008Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T09:35:28.3191729Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:35:28.3251227Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T09:35:28.3269016Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:35:28.3328777Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T09:35:28.3346844Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:35:28.3406460Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T09:35:28.3425318Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:35:28.3484580Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T09:35:28.3502863Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:35:28.3565955Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T09:35:28.3584364Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:28.3643946Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T09:35:28.3664004Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:28.3724075Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T09:35:28.3746535Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:35:28.3805888Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T09:35:28.3823984Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:35:28.3881601Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T09:35:28.3901658Z Entering 'third_party/kleidiai' 2025-12-04T09:35:28.3960023Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T09:35:28.3980381Z Entering 'third_party/mimalloc' 2025-12-04T09:35:28.4040103Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T09:35:28.4059502Z Entering 'third_party/nlohmann' 2025-12-04T09:35:28.4120412Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T09:35:28.4141164Z Entering 'third_party/onnx' 2025-12-04T09:35:28.4199805Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T09:35:28.4238124Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:35:28.4300303Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T09:35:28.4321789Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:35:28.4384137Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T09:35:28.4403971Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:35:28.4461944Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T09:35:28.4481278Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:35:28.4538523Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T09:35:28.4559846Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:35:28.4618498Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T09:35:28.4636180Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:35:28.4694341Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T09:35:28.4713884Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:35:28.4772504Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T09:35:28.4789632Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:35:28.4848052Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T09:35:28.4865664Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:35:28.4923237Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T09:35:28.4939895Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:28.4999450Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T09:35:28.5019259Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:28.5082890Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T09:35:28.5103033Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:35:28.5160670Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T09:35:28.5201189Z Entering 'third_party/pocketfft' 2025-12-04T09:35:28.5260782Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T09:35:28.5279818Z Entering 'third_party/protobuf' 2025-12-04T09:35:28.5339151Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T09:35:28.5360949Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:35:28.5420368Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T09:35:28.5438499Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:35:28.5497197Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 
2025-12-04T09:35:28.5517486Z Entering 'third_party/psimd' 2025-12-04T09:35:28.5581047Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T09:35:28.5600782Z Entering 'third_party/pthreadpool' 2025-12-04T09:35:28.5661129Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T09:35:28.5680638Z Entering 'third_party/pybind11' 2025-12-04T09:35:28.5740181Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T09:35:28.5759432Z Entering 'third_party/python-peachpy' 2025-12-04T09:35:28.5819977Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T09:35:28.5838376Z Entering 'third_party/sleef' 2025-12-04T09:35:28.5898883Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T09:35:28.5918291Z Entering 'third_party/tensorpipe' 2025-12-04T09:35:28.5979401Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T09:35:28.5997866Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:35:28.6058848Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T09:35:28.6077231Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:35:28.6135893Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T09:35:28.6153505Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:35:28.6211809Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T09:35:28.6230288Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:35:28.6288645Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T09:35:28.6305690Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:35:28.6366393Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T09:35:28.7613160Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-12-04T09:35:28.7962990Z Entering 'android/libs/fbjni' 2025-12-04T09:35:28.8012731Z Entering 'third_party/FP16' 2025-12-04T09:35:28.8059889Z Entering 'third_party/FXdiv' 2025-12-04T09:35:28.8106677Z Entering 'third_party/NNPACK' 2025-12-04T09:35:28.8154434Z Entering 'third_party/NVTX' 2025-12-04T09:35:28.8202186Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:35:28.8252709Z Entering 'third_party/XNNPACK' 2025-12-04T09:35:28.8319063Z Entering 'third_party/aiter' 2025-12-04T09:35:28.8368149Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:35:28.8424473Z Entering 'third_party/benchmark' 2025-12-04T09:35:28.8474457Z Entering 'third_party/composable_kernel' 2025-12-04T09:35:28.8536725Z Entering 'third_party/cpp-httplib' 2025-12-04T09:35:28.8587443Z Entering 'third_party/cpuinfo' 2025-12-04T09:35:28.8636642Z Entering 'third_party/cudnn_frontend' 2025-12-04T09:35:28.8685340Z Entering 'third_party/cutlass' 2025-12-04T09:35:28.8743108Z Entering 'third_party/fbgemm' 2025-12-04T09:35:28.8794643Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:35:28.8841125Z Entering 
'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:35:28.8896812Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:35:28.8945210Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:35:28.9001915Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:35:28.9051711Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:35:28.9100408Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:35:28.9157702Z Entering 'third_party/flash-attention' 2025-12-04T09:35:28.9207203Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:35:28.9261123Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:35:28.9318471Z Entering 'third_party/flatbuffers' 2025-12-04T09:35:28.9369495Z Entering 'third_party/fmt' 2025-12-04T09:35:28.9421048Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:35:28.9469556Z Entering 'third_party/gloo' 2025-12-04T09:35:28.9517184Z Entering 'third_party/googletest' 2025-12-04T09:35:28.9566356Z Entering 'third_party/ideep' 2025-12-04T09:35:28.9613999Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:35:28.9672404Z Entering 'third_party/ittapi' 2025-12-04T09:35:28.9720386Z Entering 'third_party/kineto' 2025-12-04T09:35:28.9769135Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:35:28.9817809Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:35:28.9867040Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:35:28.9916625Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:35:28.9964095Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:35:29.0010661Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:35:29.0061056Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:35:29.0108595Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:35:29.0155584Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:35:29.0209191Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:35:29.0256338Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:35:29.0303202Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:29.0352328Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:29.0404254Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:35:29.0455872Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:35:29.0506427Z Entering 'third_party/kleidiai' 2025-12-04T09:35:29.0555745Z Entering 'third_party/mimalloc' 2025-12-04T09:35:29.0604718Z Entering 'third_party/nlohmann' 2025-12-04T09:35:29.0653153Z Entering 'third_party/onnx' 2025-12-04T09:35:29.0723559Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:35:29.0777273Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:35:29.0827992Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:35:29.0876367Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:35:29.0923471Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:35:29.0969872Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:35:29.1020161Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:35:29.1067732Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:35:29.1114920Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:35:29.1160950Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:29.1213714Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:29.1262731Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:35:29.1333182Z Entering 'third_party/pocketfft' 2025-12-04T09:35:29.1382369Z Entering 'third_party/protobuf' 2025-12-04T09:35:29.1433554Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:35:29.1480613Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:35:29.1530061Z Entering 'third_party/psimd' 2025-12-04T09:35:29.1577542Z Entering 'third_party/pthreadpool' 2025-12-04T09:35:29.1624119Z Entering 'third_party/pybind11' 2025-12-04T09:35:29.1673444Z Entering 'third_party/python-peachpy' 2025-12-04T09:35:29.1720683Z Entering 'third_party/sleef' 2025-12-04T09:35:29.1769050Z Entering 'third_party/tensorpipe' 2025-12-04T09:35:29.1816692Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:35:29.1863497Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:35:29.1910952Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:35:29.1957029Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:35:29.2001976Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:35:29.2068493Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-12-04T09:35:29.2409565Z Entering 'android/libs/fbjni' 2025-12-04T09:35:29.2455820Z Entering 'third_party/FP16' 2025-12-04T09:35:29.2505313Z Entering 'third_party/FXdiv' 2025-12-04T09:35:29.2553935Z Entering 'third_party/NNPACK' 2025-12-04T09:35:29.2603363Z Entering 'third_party/NVTX' 2025-12-04T09:35:29.2651376Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:35:29.2700461Z Entering 'third_party/XNNPACK' 2025-12-04T09:35:29.2764658Z Entering 
'third_party/aiter' 2025-12-04T09:35:29.2813015Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:35:29.2873060Z Entering 'third_party/benchmark' 2025-12-04T09:35:29.2922134Z Entering 'third_party/composable_kernel' 2025-12-04T09:35:29.2981674Z Entering 'third_party/cpp-httplib' 2025-12-04T09:35:29.3029889Z Entering 'third_party/cpuinfo' 2025-12-04T09:35:29.3078640Z Entering 'third_party/cudnn_frontend' 2025-12-04T09:35:29.3125969Z Entering 'third_party/cutlass' 2025-12-04T09:35:29.3189149Z Entering 'third_party/fbgemm' 2025-12-04T09:35:29.3241008Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:35:29.3288724Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:35:29.3347307Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:35:29.3394429Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:35:29.3450993Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:35:29.3498046Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:35:29.3545025Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:35:29.3595688Z Entering 'third_party/flash-attention' 2025-12-04T09:35:29.3645117Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:35:29.3700612Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:35:29.3760342Z Entering 'third_party/flatbuffers' 2025-12-04T09:35:29.3812072Z Entering 'third_party/fmt' 2025-12-04T09:35:29.3859714Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:35:29.3908199Z Entering 'third_party/gloo' 2025-12-04T09:35:29.3957580Z Entering 'third_party/googletest' 2025-12-04T09:35:29.4006316Z Entering 'third_party/ideep' 2025-12-04T09:35:29.4053154Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:35:29.4109399Z Entering 'third_party/ittapi' 2025-12-04T09:35:29.4158196Z Entering 'third_party/kineto' 2025-12-04T09:35:29.4205199Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 
2025-12-04T09:35:29.4251773Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:35:29.4300965Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:35:29.4353794Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:35:29.4400767Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:35:29.4447885Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:35:29.4496882Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:35:29.4544079Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:35:29.4592439Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:35:29.4641564Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:35:29.4688289Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:35:29.4734093Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:29.4784606Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:29.4837262Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:35:29.4884766Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:35:29.4934596Z Entering 'third_party/kleidiai' 2025-12-04T09:35:29.4993609Z Entering 'third_party/mimalloc' 2025-12-04T09:35:29.5035381Z Entering 'third_party/nlohmann' 2025-12-04T09:35:29.5088081Z Entering 'third_party/onnx' 2025-12-04T09:35:29.5156523Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:35:29.5208657Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:35:29.5261478Z Entering 
'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:35:29.5309120Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:35:29.5357955Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:35:29.5405273Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:35:29.5454517Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:35:29.5503243Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:35:29.5549839Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:35:29.5598354Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:29.5646114Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:29.5696088Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:35:29.5766242Z Entering 'third_party/pocketfft' 2025-12-04T09:35:29.5815048Z Entering 'third_party/protobuf' 2025-12-04T09:35:29.5867965Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:35:29.5915103Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:35:29.5964794Z Entering 'third_party/psimd' 2025-12-04T09:35:29.6013233Z Entering 'third_party/pthreadpool' 2025-12-04T09:35:29.6061225Z Entering 'third_party/pybind11' 2025-12-04T09:35:29.6110508Z Entering 'third_party/python-peachpy' 2025-12-04T09:35:29.6157734Z Entering 'third_party/sleef' 2025-12-04T09:35:29.6206808Z Entering 'third_party/tensorpipe' 2025-12-04T09:35:29.6254755Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:35:29.6301806Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:35:29.6348321Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:35:29.6399965Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:35:29.6444345Z Entering 
'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:35:29.6507253Z ##[endgroup] 2025-12-04T09:35:29.6545834Z [command]/usr/bin/git log -1 --format=%H 2025-12-04T09:35:29.6569927Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:35:29.6677676Z ##[group]Run cd "${GITHUB_WORKSPACE}" 2025-12-04T09:35:29.6678101Z cd "${GITHUB_WORKSPACE}" 2025-12-04T09:35:29.6678585Z # Clean stale submodule dirs 2025-12-04T09:35:29.6678967Z if [ -z "${NO_SUDO}" ]; then 2025-12-04T09:35:29.6679427Z  sudo git submodule foreach --recursive git clean -ffdx 2025-12-04T09:35:29.6679873Z else 2025-12-04T09:35:29.6680230Z  git submodule foreach --recursive git clean -ffdx 2025-12-04T09:35:29.6680677Z fi 2025-12-04T09:35:29.6689991Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:29.6690439Z env: 2025-12-04T09:35:29.6690696Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:29.6690991Z NO_SUDO: true 2025-12-04T09:35:29.6691255Z ##[endgroup] 2025-12-04T09:35:29.7056551Z Entering 'android/libs/fbjni' 2025-12-04T09:35:29.7097025Z Entering 'third_party/FP16' 2025-12-04T09:35:29.7134037Z Entering 'third_party/FXdiv' 2025-12-04T09:35:29.7171358Z Entering 'third_party/NNPACK' 2025-12-04T09:35:29.7212068Z Entering 'third_party/NVTX' 2025-12-04T09:35:29.7256671Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:35:29.7294228Z Entering 'third_party/XNNPACK' 2025-12-04T09:35:29.7435463Z Entering 'third_party/aiter' 2025-12-04T09:35:29.7484734Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:35:29.7607027Z Entering 'third_party/benchmark' 2025-12-04T09:35:29.7646951Z Entering 'third_party/composable_kernel' 2025-12-04T09:35:29.7783412Z Entering 'third_party/cpp-httplib' 2025-12-04T09:35:29.7821224Z Entering 'third_party/cpuinfo' 2025-12-04T09:35:29.7863502Z Entering 'third_party/cudnn_frontend' 2025-12-04T09:35:29.7903247Z Entering 'third_party/cutlass' 2025-12-04T09:35:29.8017528Z Entering 'third_party/fbgemm' 
2025-12-04T09:35:29.8085485Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:35:29.8120165Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:35:29.8252491Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:35:29.8291326Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:35:29.8402602Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:35:29.8439699Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:35:29.8473218Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:35:29.8523165Z Entering 'third_party/flash-attention' 2025-12-04T09:35:29.8568921Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:35:29.8682119Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:35:29.8782933Z Entering 'third_party/flatbuffers' 2025-12-04T09:35:29.8861333Z Entering 'third_party/fmt' 2025-12-04T09:35:29.8898993Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:35:29.8935128Z Entering 'third_party/gloo' 2025-12-04T09:35:29.8972655Z Entering 'third_party/googletest' 2025-12-04T09:35:29.9010834Z Entering 'third_party/ideep' 2025-12-04T09:35:29.9043834Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:35:29.9139742Z Entering 'third_party/ittapi' 2025-12-04T09:35:29.9178810Z Entering 'third_party/kineto' 2025-12-04T09:35:29.9217706Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:35:29.9259742Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:35:29.9315122Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:35:29.9350683Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:35:29.9387136Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:35:29.9421401Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:35:29.9457738Z 
Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:35:29.9492892Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:35:29.9530646Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:35:29.9576436Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:35:29.9611489Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:35:29.9647024Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:29.9704045Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:29.9747701Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:35:29.9784186Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:35:29.9823949Z Entering 'third_party/kleidiai' 2025-12-04T09:35:29.9868866Z Entering 'third_party/mimalloc' 2025-12-04T09:35:29.9907116Z Entering 'third_party/nlohmann' 2025-12-04T09:35:29.9957660Z Entering 'third_party/onnx' 2025-12-04T09:35:30.0333608Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:35:30.0375424Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:35:30.0439018Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:35:30.0475873Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:35:30.0512789Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:35:30.0547141Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:35:30.0595291Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:35:30.0630513Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:35:30.0666233Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:35:30.0701955Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:30.0754868Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:30.0797777Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:35:30.1097407Z Entering 'third_party/pocketfft' 2025-12-04T09:35:30.1134998Z Entering 'third_party/protobuf' 2025-12-04T09:35:30.1223456Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:35:30.1257796Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:35:30.1304072Z Entering 'third_party/psimd' 2025-12-04T09:35:30.1338504Z Entering 'third_party/pthreadpool' 2025-12-04T09:35:30.1374076Z Entering 'third_party/pybind11' 2025-12-04T09:35:30.1412511Z Entering 'third_party/python-peachpy' 2025-12-04T09:35:30.1448686Z Entering 'third_party/sleef' 2025-12-04T09:35:30.1487305Z Entering 'third_party/tensorpipe' 2025-12-04T09:35:30.1526720Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:35:30.1562956Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:35:30.1597822Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:35:30.1638198Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:35:30.1671414Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:35:30.1864161Z Prepare all required actions 2025-12-04T09:35:30.1864785Z Getting action download info 2025-12-04T09:35:30.3423620Z ##[group]Run ./.github/actions/setup-linux 2025-12-04T09:35:30.3423997Z env: 2025-12-04T09:35:30.3424251Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:30.3424553Z ##[endgroup] 2025-12-04T09:35:30.3468875Z ##[group]Run set -euo pipefail 2025-12-04T09:35:30.3469315Z set -euo pipefail 2025-12-04T09:35:30.3469661Z function get_ec2_metadata() { 2025-12-04T09:35:30.3470107Z  # Pulled from instance 
metadata endpoint for EC2 2025-12-04T09:35:30.3470839Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2025-12-04T09:35:30.3471835Z  category=$1 2025-12-04T09:35:30.3472261Z  # If it is GCP runner (runner name contains gcp), do not run this 2025-12-04T09:35:30.3472765Z  runner_name_str=i-00bb8650059fae3eb 2025-12-04T09:35:30.3473223Z  if [[ -f /.inarc ]]; then 2025-12-04T09:35:30.3473617Z  echo "ARC Runner, no info on ec2 metadata" 2025-12-04T09:35:30.3474078Z  elif [[ $runner_name_str == *"gcp"* ]]; then 2025-12-04T09:35:30.3474634Z  echo "Runner is from Google Cloud Platform, No info on ec2 metadata" 2025-12-04T09:35:30.3475142Z  else 2025-12-04T09:35:30.3476157Z  curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2025-12-04T09:35:30.3477253Z  fi 2025-12-04T09:35:30.3477509Z } 2025-12-04T09:35:30.3477814Z echo "ami-id: $(get_ec2_metadata ami-id)" 2025-12-04T09:35:30.3478301Z echo "instance-id: $(get_ec2_metadata instance-id)" 2025-12-04T09:35:30.3478869Z echo "instance-type: $(get_ec2_metadata instance-type)" 2025-12-04T09:35:30.3479364Z echo "system info $(uname -a)" 2025-12-04T09:35:30.3486395Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:30.3486842Z env: 2025-12-04T09:35:30.3487089Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:30.3487379Z ##[endgroup] 2025-12-04T09:35:30.3647605Z ami-id: ami-08982f1c5bf93d976 2025-12-04T09:35:30.3762574Z instance-id: i-00bb8650059fae3eb 2025-12-04T09:35:30.3873475Z instance-type: g4dn.4xlarge 2025-12-04T09:35:30.3885296Z system info Linux ip-10-0-51-5.ec2.internal 6.1.150-174.273.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Sep 9 12:21:26 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux 2025-12-04T09:35:30.3908944Z ##[group]Run if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi 2025-12-04T09:35:30.3909540Z if [ -f 
/usr/bin/nvidia-smi ]; then nvidia-smi; fi 2025-12-04T09:35:30.3918054Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:30.3918489Z env: 2025-12-04T09:35:30.3918757Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:30.3919072Z ##[endgroup] 2025-12-04T09:35:31.7511673Z Thu Dec 4 09:35:31 2025 2025-12-04T09:35:31.7512839Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:35:31.7513490Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T09:35:31.7514107Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:35:31.7514758Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T09:35:31.7515434Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T09:35:31.7515983Z | | | MIG M. | 2025-12-04T09:35:31.7516392Z |=========================================+========================+======================| 2025-12-04T09:35:31.7614688Z | 0 Tesla T4 Off | 00000000:00:1E.0 Off | 0 | 2025-12-04T09:35:31.7615641Z | N/A 36C P0 25W / 70W | 0MiB / 15360MiB | 9% Default | 2025-12-04T09:35:31.7616135Z | | | N/A | 2025-12-04T09:35:31.7616692Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:35:31.7617078Z 2025-12-04T09:35:31.7617306Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:35:31.7617856Z | Processes: | 2025-12-04T09:35:31.7618413Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T09:35:31.7618926Z | ID ID Usage | 2025-12-04T09:35:31.7619359Z |=========================================================================================| 2025-12-04T09:35:31.7619896Z | No running processes found | 2025-12-04T09:35:31.7620501Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:35:32.1765859Z ##[group]Run echo 
"IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:35:32.1766992Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:35:32.1776015Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:32.1776559Z env: 2025-12-04T09:35:32.1776815Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:32.1777132Z ##[endgroup] 2025-12-04T09:35:32.1853846Z ##[group]Run if systemctl is-active --quiet docker; then 2025-12-04T09:35:32.1854371Z if systemctl is-active --quiet docker; then 2025-12-04T09:35:32.1854838Z  echo "Docker daemon is running..."; 2025-12-04T09:35:32.1855249Z else 2025-12-04T09:35:32.1855656Z  echo "Starting docker daemon..." && sudo systemctl start docker; 2025-12-04T09:35:32.1856162Z fi 2025-12-04T09:35:32.1863569Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:32.1864021Z env: 2025-12-04T09:35:32.1864279Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:32.1864573Z ##[endgroup] 2025-12-04T09:35:32.1955603Z Docker daemon is running... 2025-12-04T09:35:32.2001091Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T09:35:32.2001453Z with: 2025-12-04T09:35:32.2001693Z shell: bash 2025-12-04T09:35:32.2001936Z timeout_minutes: 5 2025-12-04T09:35:32.2002221Z max_attempts: 3 2025-12-04T09:35:32.2002497Z retry_wait_seconds: 30 2025-12-04T09:35:32.2005225Z command: AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" # For LF Runners we need to make sure we also login to Meta's ECR docker registry too. 
META_AWS_ACCOUNT_ID=308535385114 if [ "$AWS_ACCOUNT_ID" != "$META_AWS_ACCOUNT_ID" ] ; then aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$META_AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" fi 2025-12-04T09:35:32.2008000Z polling_interval_seconds: 1 2025-12-04T09:35:32.2008335Z warning_on_retry: true 2025-12-04T09:35:32.2008628Z continue_on_error: false 2025-12-04T09:35:32.2008930Z env: 2025-12-04T09:35:32.2009177Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:32.2009474Z AWS_RETRY_MODE: standard 2025-12-04T09:35:32.2009777Z AWS_MAX_ATTEMPTS: 5 2025-12-04T09:35:32.2010072Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:35:32.2010374Z ##[endgroup] 2025-12-04T09:35:33.5309946Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T09:35:33.5310973Z Configure a credential helper to remove this warning. See 2025-12-04T09:35:33.5311658Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T09:35:33.5312114Z 2025-12-04T09:35:33.5312251Z Login Succeeded 2025-12-04T09:35:34.2959261Z Command completed after 1 attempt(s). 
2025-12-04T09:35:34.3018505Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:35:34.3019252Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:35:34.3019800Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:35:34.3028310Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:34.3028763Z env: 2025-12-04T09:35:34.3029024Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:34.3029316Z ##[endgroup] 2025-12-04T09:35:34.3120663Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T09:35:34.3121334Z # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T09:35:34.3121861Z # shellcheck disable=SC2046 2025-12-04T09:35:34.3122262Z docker stop $(docker ps -q) || true 2025-12-04T09:35:34.3122671Z # Prune all of the docker images 2025-12-04T09:35:34.3123047Z docker system prune -af 2025-12-04T09:35:34.3130042Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:34.3130486Z env: 2025-12-04T09:35:34.3130753Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:34.3131049Z ##[endgroup] 2025-12-04T09:35:34.3401700Z "docker stop" requires at least 1 argument. 2025-12-04T09:35:34.3402156Z See 'docker stop --help'. 2025-12-04T09:35:34.3402376Z 2025-12-04T09:35:34.3402565Z Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...] 
2025-12-04T09:35:34.3402897Z 2025-12-04T09:35:34.3403023Z Stop one or more running containers 2025-12-04T09:35:34.3590001Z Total reclaimed space: 0B 2025-12-04T09:35:34.3802461Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main 2025-12-04T09:35:34.3803025Z with: 2025-12-04T09:35:34.3803969Z docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:34.3805019Z use-custom-docker-registry: true 2025-12-04T09:35:34.3805391Z docker-build-dir: .ci/docker 2025-12-04T09:35:34.3805736Z docker-build-script: ./build.sh 2025-12-04T09:35:34.3806070Z working-directory: . 2025-12-04T09:35:34.3806644Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:34.3807115Z force-push: false 2025-12-04T09:35:34.3807369Z env: 2025-12-04T09:35:34.3807621Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:34.3807920Z ##[endgroup] 2025-12-04T09:35:34.3829831Z ##[group]Run set -ex 2025-12-04T09:35:34.3830200Z set -ex 2025-12-04T09:35:34.3830451Z  2025-12-04T09:35:34.3830991Z # If the docker build directory or the build script doesn't exist, the action will 2025-12-04T09:35:34.3831794Z # gracefully return the docker image name as it is. Pulling docker image in Linux 2025-12-04T09:35:34.3832455Z # job could then download the pre-built image as usual 2025-12-04T09:35:34.3833276Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then 2025-12-04T09:35:34.3834043Z  echo "skip=false" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:34.3834439Z else 2025-12-04T09:35:34.3834742Z  echo "skip=true" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:34.3835265Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:34.3835748Z  2025-12-04T09:35:34.3836389Z  echo "Not using custom ECR registry. 
Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..." 2025-12-04T09:35:34.3837145Z  exit 0 2025-12-04T09:35:34.3837401Z fi 2025-12-04T09:35:34.3837626Z  2025-12-04T09:35:34.3838021Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then 2025-12-04T09:35:34.3838735Z  # The docker image name already includes the ECR prefix and tag, so we can just 2025-12-04T09:35:34.3839374Z  # use it as it is, but first let's extract the tag 2025-12-04T09:35:34.3839928Z  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}') 2025-12-04T09:35:34.3840533Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:34.3841106Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:34.3841595Z else 2025-12-04T09:35:34.3841890Z  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then 2025-12-04T09:35:34.3842341Z  CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:} 2025-12-04T09:35:34.3842808Z  DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*} 2025-12-04T09:35:34.3843193Z  fi 2025-12-04T09:35:34.3843725Z  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}") 2025-12-04T09:35:34.3844440Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:34.3845194Z  echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:34.3846010Z  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:34.3846521Z fi 2025-12-04T09:35:34.3854076Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:34.3854501Z env: 2025-12-04T09:35:34.3854751Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:34.3855066Z REPO_NAME: pytorch 2025-12-04T09:35:34.3856173Z DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:34.3857335Z 
DOCKER_BUILD_DIR: .ci/docker 2025-12-04T09:35:34.3857680Z DOCKER_BUILD_SCRIPT: ./build.sh 2025-12-04T09:35:34.3858132Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:34.3858602Z USE_CUSTOM_DOCKER_REGISTRY: true 2025-12-04T09:35:34.3858953Z CUSTOM_TAG_PREFIX: 2025-12-04T09:35:34.3859242Z ##[endgroup] 2025-12-04T09:35:34.3887992Z + [[ -d .ci/docker ]] 2025-12-04T09:35:34.3888327Z + [[ -f .ci/docker/./build.sh ]] 2025-12-04T09:35:34.3888816Z + [[ true == \t\r\u\e ]] 2025-12-04T09:35:34.3889113Z + echo skip=false 2025-12-04T09:35:34.3890380Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]] 2025-12-04T09:35:34.3896410Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:34.3897393Z ++ awk -F '[:,]' '{print $2}' 2025-12-04T09:35:34.3920834Z + DOCKER_TAG=pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:34.3921890Z + echo docker-tag=pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:34.3923426Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:34.3950578Z ##[group]Run set +e 2025-12-04T09:35:34.3950953Z set +e 2025-12-04T09:35:34.3951207Z set -x 2025-12-04T09:35:34.3951471Z  2025-12-04T09:35:34.3951722Z login() { 2025-12-04T09:35:34.3952268Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T09:35:34.3952882Z } 2025-12-04T09:35:34.3953138Z  2025-12-04T09:35:34.3953366Z retry () { 2025-12-04T09:35:34.3953679Z  $* || 
(sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T09:35:34.3954047Z } 2025-12-04T09:35:34.3954272Z  2025-12-04T09:35:34.3954543Z retry login "${DOCKER_REGISTRY}" 2025-12-04T09:35:34.3954900Z  2025-12-04T09:35:34.3955149Z START_TIME=$(date +%s) 2025-12-04T09:35:34.3955480Z # Wait up to 120 minutes 2025-12-04T09:35:34.3955903Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do 2025-12-04T09:35:34.3956486Z  # Check if image already exists, if it does then skip building it 2025-12-04T09:35:34.3957061Z  if docker manifest inspect "${DOCKER_IMAGE}"; then 2025-12-04T09:35:34.3957489Z  exit 0 2025-12-04T09:35:34.3957758Z  fi 2025-12-04T09:35:34.3957994Z  2025-12-04T09:35:34.3958452Z  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can 2025-12-04T09:35:34.3959248Z  # use this to differentiate between the Docker build and regular build jobs. For the 2025-12-04T09:35:34.3960041Z  # latter, it will wait for the Docker images to become available before continuing 2025-12-04T09:35:34.3960642Z  if [ "${DOCKER_PUSH:-false}" == "true" ]; then 2025-12-04T09:35:34.3961114Z  # It's a Docker build job, let's build the image 2025-12-04T09:35:34.3961525Z  break 2025-12-04T09:35:34.3961797Z  else 2025-12-04T09:35:34.3962192Z  # It's a regular build job, wait for the image to become available 2025-12-04T09:35:34.3962678Z  sleep 300 2025-12-04T09:35:34.3962962Z  fi 2025-12-04T09:35:34.3963200Z done 2025-12-04T09:35:34.3963447Z  2025-12-04T09:35:34.3963858Z # NB: This part requires a full checkout. Otherwise, the merge base will 2025-12-04T09:35:34.3964692Z # be empty. 
The default action would be to continue rebuild the image 2025-12-04T09:35:34.3965308Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then 2025-12-04T09:35:34.3965846Z  # if we're on the base branch then use the parent commit 2025-12-04T09:35:34.3966325Z  MERGE_BASE=$(git rev-parse HEAD~) 2025-12-04T09:35:34.3966688Z else 2025-12-04T09:35:34.3967075Z  # otherwise we're on a PR, so use the most recent base commit 2025-12-04T09:35:34.3967643Z  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") 2025-12-04T09:35:34.3968157Z fi 2025-12-04T09:35:34.3968403Z  2025-12-04T09:35:34.3968674Z if [[ -z "${MERGE_BASE}" ]]; then 2025-12-04T09:35:34.3969104Z  echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:34.3969493Z  2025-12-04T09:35:34.3970046Z  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..." 2025-12-04T09:35:34.3970709Z  exit 0 2025-12-04T09:35:34.3971224Z fi 2025-12-04T09:35:34.3971477Z  2025-12-04T09:35:34.3971836Z if ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then 2025-12-04T09:35:34.3972651Z  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit" 2025-12-04T09:35:34.3973332Z  exit 1 2025-12-04T09:35:34.3973588Z fi 2025-12-04T09:35:34.3973829Z  2025-12-04T09:35:34.3974233Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}") 2025-12-04T09:35:34.3975021Z # If no image exists but the hash is the same as the previous hash then we should error out here 2025-12-04T09:35:34.3975730Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then 2025-12-04T09:35:34.3976606Z  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch" 2025-12-04T09:35:34.3977536Z  echo " Will re-build docker image to store in local cache, TTS may be longer" 2025-12-04T09:35:34.3978067Z fi 2025-12-04T09:35:34.3978316Z  2025-12-04T09:35:34.3978623Z echo "rebuild=true" >> "${GITHUB_OUTPUT}" 
2025-12-04T09:35:34.3985540Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:34.3985992Z env: 2025-12-04T09:35:34.3986250Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:34.3986564Z DOCKER_BUILD_DIR: .ci/docker 2025-12-04T09:35:34.3986979Z BASE_REVISION: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:35:34.3988083Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:34.3989422Z DOCKER_TAG: pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:34.3990205Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:34.3990680Z DOCKER_PUSH: 2025-12-04T09:35:34.3990949Z ##[endgroup] 2025-12-04T09:35:34.4019598Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:34.4020120Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:34.4022690Z + aws ecr get-login-password --region us-east-1 2025-12-04T09:35:34.4024183Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:35.0202103Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T09:35:35.0202853Z Configure a credential helper to remove this warning. 
See 2025-12-04T09:35:35.0203530Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T09:35:35.0203987Z 2025-12-04T09:35:35.0204124Z Login Succeeded 2025-12-04T09:35:35.0219404Z ++ date +%s 2025-12-04T09:35:35.0230365Z + START_TIME=1764840935 2025-12-04T09:35:35.0233660Z ++ date +%s 2025-12-04T09:35:35.0244383Z + [[ 1764833735 -lt 1764840935 ]] 2025-12-04T09:35:35.0245479Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:35.2899016Z { 2025-12-04T09:35:35.2899645Z "schemaVersion": 2, 2025-12-04T09:35:35.2900504Z "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 2025-12-04T09:35:35.2901031Z "config": { 2025-12-04T09:35:35.2901414Z "mediaType": "application/vnd.docker.container.image.v1+json", 2025-12-04T09:35:35.2901888Z "size": 34787, 2025-12-04T09:35:35.2902646Z "digest": "sha256:5465aa79632b68f6240c23f0d0b021df4d0fd595333b61a40d36a0cf73656024" 2025-12-04T09:35:35.2903190Z }, 2025-12-04T09:35:35.2903431Z "layers": [ 2025-12-04T09:35:35.2903679Z { 2025-12-04T09:35:35.2904047Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2904534Z "size": 30447951, 2025-12-04T09:35:35.2905037Z "digest": "sha256:63e5bc7682b85ae57a1221210f64d62e7a90b0a30f19af4ca734b8242ae49d63" 2025-12-04T09:35:35.2905589Z }, 2025-12-04T09:35:35.2905797Z { 2025-12-04T09:35:35.2906171Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2906652Z "size": 1554, 2025-12-04T09:35:35.2907107Z "digest": "sha256:835841cca3b7e1464290cdb78e48773e03583413fbed852c3cc5165a392ea44d" 2025-12-04T09:35:35.2907653Z }, 2025-12-04T09:35:35.2907872Z { 2025-12-04T09:35:35.2908235Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2908719Z "size": 313276213, 2025-12-04T09:35:35.2909230Z "digest": 
"sha256:1bf1bb125deaa5b8a3adf121671e87ba2fa7e229f9eb1dff7ade581cb737175a" 2025-12-04T09:35:35.2909777Z }, 2025-12-04T09:35:35.2909997Z { 2025-12-04T09:35:35.2910371Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2910837Z "size": 787, 2025-12-04T09:35:35.2911309Z "digest": "sha256:b21856d1bf420da6fa8ec7331b82ab355d4f4178644e7d3a3d3d0fbc3610109a" 2025-12-04T09:35:35.2911866Z }, 2025-12-04T09:35:35.2912090Z { 2025-12-04T09:35:35.2912448Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2912928Z "size": 106, 2025-12-04T09:35:35.2913416Z "digest": "sha256:848ba2c095e2b9e6acfb0ecf077adb526fb2fa82ed44cf6648ebde97f296f8ec" 2025-12-04T09:35:35.2913967Z }, 2025-12-04T09:35:35.2914187Z { 2025-12-04T09:35:35.2914557Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2915026Z "size": 704, 2025-12-04T09:35:35.2915503Z "digest": "sha256:029495b23122c840ca0e52d487afa8d2c4dbf1991cd7f204ec3e434dcf947bf4" 2025-12-04T09:35:35.2916052Z }, 2025-12-04T09:35:35.2916257Z { 2025-12-04T09:35:35.2916787Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2917316Z "size": 1216, 2025-12-04T09:35:35.2917781Z "digest": "sha256:073bb82063cfba4639b11fea43753dbb128f9238353189fc02d2e2aa0b2ad359" 2025-12-04T09:35:35.2918333Z }, 2025-12-04T09:35:35.2918555Z { 2025-12-04T09:35:35.2918929Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2919398Z "size": 484, 2025-12-04T09:35:35.2919867Z "digest": "sha256:59b63930883363c7d2aaab27cc61555d9f3e119dc18247a8624c98ebdaa354a5" 2025-12-04T09:35:35.2920409Z }, 2025-12-04T09:35:35.2920613Z { 2025-12-04T09:35:35.2920982Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2921465Z "size": 110362071, 2025-12-04T09:35:35.2921934Z "digest": "sha256:1c6177b2970db2d7743b4337c420a35f2ec79f338c30d97d534a1f0987c00913" 2025-12-04T09:35:35.2922482Z }, 
2025-12-04T09:35:35.2922703Z { 2025-12-04T09:35:35.2923062Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2923544Z "size": 4961, 2025-12-04T09:35:35.2924024Z "digest": "sha256:fabe466dd5f33c3209a56abf5cb46b9b07fe21c57fb43b98e13308c8665c0864" 2025-12-04T09:35:35.2924579Z }, 2025-12-04T09:35:35.2924781Z { 2025-12-04T09:35:35.2925368Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2925859Z "size": 1755, 2025-12-04T09:35:35.2926314Z "digest": "sha256:2b5a11b41761d8ea3b829e4772e4064cb6c4e4989126af324d0057661e4493a1" 2025-12-04T09:35:35.2926863Z }, 2025-12-04T09:35:35.2927080Z { 2025-12-04T09:35:35.2927439Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2927925Z "size": 724, 2025-12-04T09:35:35.2928388Z "digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084" 2025-12-04T09:35:35.2928994Z }, 2025-12-04T09:35:35.2929211Z { 2025-12-04T09:35:35.2929582Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2930046Z "size": 544, 2025-12-04T09:35:35.2930518Z "digest": "sha256:dc0780902fca810498f16efa71f8e5990385f141a0cfcc552616a4acc434f79a" 2025-12-04T09:35:35.2931070Z }, 2025-12-04T09:35:35.2931296Z { 2025-12-04T09:35:35.2931661Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2932151Z "size": 3185191720, 2025-12-04T09:35:35.2932646Z "digest": "sha256:5b09a2b135c8e540e2b9374b68991afdd63a5dfaba75fb44efe054a591f400c2" 2025-12-04T09:35:35.2933185Z }, 2025-12-04T09:35:35.2933410Z { 2025-12-04T09:35:35.2933783Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2934248Z "size": 32, 2025-12-04T09:35:35.2934724Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:35.2935274Z }, 2025-12-04T09:35:35.2935482Z { 2025-12-04T09:35:35.2935854Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2936426Z "size": 396, 2025-12-04T09:35:35.2936902Z "digest": "sha256:5bfdaeb5578d6ffcd7db29c48303cbceb13c591210feaa216a8daa7a6d445b4b" 2025-12-04T09:35:35.2937470Z }, 2025-12-04T09:35:35.2937690Z { 2025-12-04T09:35:35.2938065Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2938547Z "size": 236865, 2025-12-04T09:35:35.2939023Z "digest": "sha256:0ef42867f370b8a14b8c301388793b78a0bd2533bb2a317b129b03c8667dc767" 2025-12-04T09:35:35.2939567Z }, 2025-12-04T09:35:35.2939775Z { 2025-12-04T09:35:35.2940149Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2940628Z "size": 230, 2025-12-04T09:35:35.2941076Z "digest": "sha256:446083e497f322789c2d87933a77fb2dfd94e18d2e85f6d4362e6e9521b82c4e" 2025-12-04T09:35:35.2941619Z }, 2025-12-04T09:35:35.2941836Z { 2025-12-04T09:35:35.2942203Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2942684Z "size": 3043500, 2025-12-04T09:35:35.2943167Z "digest": "sha256:d8a170bef0f4e0e28f5ba0952320dd465552adf74f0864b4f47cc11f4c4f82f7" 2025-12-04T09:35:35.2943717Z }, 2025-12-04T09:35:35.2943922Z { 2025-12-04T09:35:35.2944296Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2944779Z "size": 1472, 2025-12-04T09:35:35.2945248Z "digest": "sha256:e2b6cd6a5bd0418a1e4aca3f37942324d4d9f9b0177597e37fc8d1a5626048e1" 2025-12-04T09:35:35.2945797Z }, 2025-12-04T09:35:35.2946015Z { 2025-12-04T09:35:35.2946378Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2946873Z "size": 481, 2025-12-04T09:35:35.2947341Z "digest": "sha256:93efc0181a22218a544413f1d57e9e0e7a0f492e41bef598084c5b9177e3987a" 2025-12-04T09:35:35.2947885Z }, 2025-12-04T09:35:35.2948091Z { 2025-12-04T09:35:35.2948468Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2948956Z "size": 202, 
2025-12-04T09:35:35.2949411Z "digest": "sha256:7454c938f17425bcf167ad28a62b42b95f638a7d2cf0840885cfe5ffe8480a12" 2025-12-04T09:35:35.2949963Z }, 2025-12-04T09:35:35.2950188Z { 2025-12-04T09:35:35.2950550Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2951037Z "size": 607, 2025-12-04T09:35:35.2951613Z "digest": "sha256:4d57ff55f6d4161cb6c29e2c0b08d47e65898427db3938479158684899f0023d" 2025-12-04T09:35:35.2952163Z }, 2025-12-04T09:35:35.2952371Z { 2025-12-04T09:35:35.2952747Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2953236Z "size": 6243016141, 2025-12-04T09:35:35.2953717Z "digest": "sha256:b0301534b4a58072d5b140b08a7608bbead41d126fa29fdc78c1e8a43ebb865d" 2025-12-04T09:35:35.2954274Z }, 2025-12-04T09:35:35.2954500Z { 2025-12-04T09:35:35.2954857Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2955410Z "size": 829, 2025-12-04T09:35:35.2955875Z "digest": "sha256:1969e15d0c13874ea5883ed829235a19ef6dc21c8aa6172032b78a8ffa6ff262" 2025-12-04T09:35:35.2956404Z }, 2025-12-04T09:35:35.2956624Z { 2025-12-04T09:35:35.2956995Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2957462Z "size": 33450177, 2025-12-04T09:35:35.2957961Z "digest": "sha256:73180a0f2d5a961a0cc0ba2c3cf375fdcfb43ae5e4e5c63a000c4b4366d52a64" 2025-12-04T09:35:35.2958516Z }, 2025-12-04T09:35:35.2958735Z { 2025-12-04T09:35:35.2959092Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2959576Z "size": 104, 2025-12-04T09:35:35.2960048Z "digest": "sha256:ad81b25cb69f8cf42a4a96678a64b7d0598a8f95236a3e63d1fec4e53edff613" 2025-12-04T09:35:35.2960588Z }, 2025-12-04T09:35:35.2960806Z { 2025-12-04T09:35:35.2961184Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2961650Z "size": 1496, 2025-12-04T09:35:35.2962132Z "digest": 
"sha256:8165374f8dccf88a7791a5d31afbe29e4d4542b4f1cf1904945e07f9af6bf8ba" 2025-12-04T09:35:35.2962685Z }, 2025-12-04T09:35:35.2962890Z { 2025-12-04T09:35:35.2963262Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2963744Z "size": 458786969, 2025-12-04T09:35:35.2964223Z "digest": "sha256:7779c0bb9be2030df9060b526b98d0afeed1ce5b61ee0530321ef04a4e145e8c" 2025-12-04T09:35:35.2964781Z }, 2025-12-04T09:35:35.2965001Z { 2025-12-04T09:35:35.2965374Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2965845Z "size": 164, 2025-12-04T09:35:35.2966314Z "digest": "sha256:4d0a1c027262ed8c83181b931b64afa1c41c3cac97580231c4cae3a524ebd7d5" 2025-12-04T09:35:35.2966861Z }, 2025-12-04T09:35:35.2967071Z { 2025-12-04T09:35:35.2967449Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2967932Z "size": 346, 2025-12-04T09:35:35.2968385Z "digest": "sha256:a51e0dab2d596e6563483f27c12660007160847d177ba4c31812a8f44ada5754" 2025-12-04T09:35:35.2968929Z }, 2025-12-04T09:35:35.2969148Z { 2025-12-04T09:35:35.2969509Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2969989Z "size": 32, 2025-12-04T09:35:35.2970459Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:35.2971266Z }, 2025-12-04T09:35:35.2971492Z { 2025-12-04T09:35:35.2971871Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2972355Z "size": 106, 2025-12-04T09:35:35.2972817Z "digest": "sha256:3eb6d4ff040b8761b1e3e1da768bdb884ce0e5324e3d0f6471b0a8b2ddf4736f" 2025-12-04T09:35:35.2973371Z }, 2025-12-04T09:35:35.2973590Z { 2025-12-04T09:35:35.2973951Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2974438Z "size": 424, 2025-12-04T09:35:35.2974903Z "digest": "sha256:b168858b85373f8ddca549d79267a06de4fa945d04bf791c55c9ddc93957fa3c" 2025-12-04T09:35:35.2975446Z }, 
2025-12-04T09:35:35.2975663Z { 2025-12-04T09:35:35.2976034Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2976578Z "size": 19309367, 2025-12-04T09:35:35.2977060Z "digest": "sha256:d77a39278026a8899e2f97643918bdcf96e711ca26951880b4841b319dc71321" 2025-12-04T09:35:35.2977595Z }, 2025-12-04T09:35:35.2977812Z { 2025-12-04T09:35:35.2978336Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2978819Z "size": 108, 2025-12-04T09:35:35.2979302Z "digest": "sha256:36fbd357280b6b40e90f36ac3d19da3da10e5dbf0027a5cfe8e2f29d1870d347" 2025-12-04T09:35:35.2979846Z }, 2025-12-04T09:35:35.2980067Z { 2025-12-04T09:35:35.2980442Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2980912Z "size": 826, 2025-12-04T09:35:35.2981390Z "digest": "sha256:4e3b10a5dd6aed29f238d604925e2a4f873141c1087c8dd4fdde5c61e7560893" 2025-12-04T09:35:35.2982057Z }, 2025-12-04T09:35:35.2982266Z { 2025-12-04T09:35:35.2982640Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.2983125Z "size": 724, 2025-12-04T09:35:35.2983572Z "digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084" 2025-12-04T09:35:35.2984114Z }, 2025-12-04T09:35:35.2984329Z { 2025-12-04T09:35:35.2999624Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3000270Z "size": 149, 2025-12-04T09:35:35.3000763Z "digest": "sha256:3092fab73b59190b9facfc49bf18f58612172bc2fd68dfa339a1118632616939" 2025-12-04T09:35:35.3001312Z }, 2025-12-04T09:35:35.3001542Z { 2025-12-04T09:35:35.3001929Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3002403Z "size": 136, 2025-12-04T09:35:35.3002890Z "digest": "sha256:20020dd28a15ba092fcbfe906ee39cdddfcc9d0b7eb42fdd6f4c08a984fa9c00" 2025-12-04T09:35:35.3003453Z }, 2025-12-04T09:35:35.3003687Z { 2025-12-04T09:35:35.3004054Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3004534Z "size": 140, 2025-12-04T09:35:35.3005049Z "digest": "sha256:ae5280ce969dcff08c091e9a5f7641f13561b2b0ee44d78b7c3f81d8fe8e6d32" 2025-12-04T09:35:35.3005594Z }, 2025-12-04T09:35:35.3005813Z { 2025-12-04T09:35:35.3006186Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3006662Z "size": 32, 2025-12-04T09:35:35.3007136Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:35.3007689Z }, 2025-12-04T09:35:35.3007896Z { 2025-12-04T09:35:35.3008273Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3008750Z "size": 223, 2025-12-04T09:35:35.3009205Z "digest": "sha256:026e4484b749dfc556dcf7c8f45c1759518a89072e4dbc974d9405ada1582d03" 2025-12-04T09:35:35.3009749Z }, 2025-12-04T09:35:35.3009964Z { 2025-12-04T09:35:35.3010344Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3010812Z "size": 256, 2025-12-04T09:35:35.3011300Z "digest": "sha256:1be9da2ce53d20d8befad5c024ee0eb41ee35984307cbd5621d8effae0353073" 2025-12-04T09:35:35.3011864Z }, 2025-12-04T09:35:35.3012069Z { 2025-12-04T09:35:35.3012445Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3012924Z "size": 32, 2025-12-04T09:35:35.3013389Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:35.3013941Z }, 2025-12-04T09:35:35.3014156Z { 2025-12-04T09:35:35.3014514Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3014987Z "size": 106, 2025-12-04T09:35:35.3015450Z "digest": "sha256:6481b7a1d9fb4001fd6f9e2a8d1600192529ddb957128e41671ca4630fa06ad4" 2025-12-04T09:35:35.3015993Z }, 2025-12-04T09:35:35.3016198Z { 2025-12-04T09:35:35.3016662Z + exit 0 2025-12-04T09:35:35.3017046Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3017536Z 
"size": 312293471, 2025-12-04T09:35:35.3018026Z "digest": "sha256:fa519d18c39d8f297109c056017ebce7efc322d058afd27fdac5880d6c8d35b0" 2025-12-04T09:35:35.3018580Z }, 2025-12-04T09:35:35.3018799Z { 2025-12-04T09:35:35.3019176Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3019649Z "size": 3058012325, 2025-12-04T09:35:35.3020313Z "digest": "sha256:d172f25b97f78fce0f6c6701f0db794b1c994a9cdf8cff9ddc6bdd1a1bea835c" 2025-12-04T09:35:35.3020884Z }, 2025-12-04T09:35:35.3021092Z { 2025-12-04T09:35:35.3021471Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3021958Z "size": 129, 2025-12-04T09:35:35.3022420Z "digest": "sha256:fd60ab6b1c2c85a932e9894b5d0cf5c9e75fa21782e3028ea40d76017ecfbf85" 2025-12-04T09:35:35.3022977Z }, 2025-12-04T09:35:35.3023197Z { 2025-12-04T09:35:35.3023563Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3024153Z "size": 880, 2025-12-04T09:35:35.3024632Z "digest": "sha256:0afe45579c2c87002db8c1abf7b32a748e6cb3b9b57e9b391f91cad9f84df476" 2025-12-04T09:35:35.3025187Z }, 2025-12-04T09:35:35.3025393Z { 2025-12-04T09:35:35.3025765Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3026252Z "size": 724, 2025-12-04T09:35:35.3026707Z "digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084" 2025-12-04T09:35:35.3027248Z }, 2025-12-04T09:35:35.3027466Z { 2025-12-04T09:35:35.3027826Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3028305Z "size": 139, 2025-12-04T09:35:35.3028769Z "digest": "sha256:5884ffd6720b47274f651262d5f9224f55960f9ea717faafe332aa20afb0ffa4" 2025-12-04T09:35:35.3029302Z }, 2025-12-04T09:35:35.3029519Z { 2025-12-04T09:35:35.3029891Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3030363Z "size": 32, 2025-12-04T09:35:35.3030834Z "digest": 
"sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:35.3031384Z }, 2025-12-04T09:35:35.3031601Z { 2025-12-04T09:35:35.3031961Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3032439Z "size": 160, 2025-12-04T09:35:35.3032931Z "digest": "sha256:ab7a7c316fa7a9b7a96304ce96fafdffbc5cc6b960a4bb2def9131b36d9225c5" 2025-12-04T09:35:35.3033483Z }, 2025-12-04T09:35:35.3033705Z { 2025-12-04T09:35:35.3034081Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3034549Z "size": 1012, 2025-12-04T09:35:35.3035032Z "digest": "sha256:c7775ce5574bdde75b4c09a1db19f7d0dc027f1f4c1f961022fc55833133e616" 2025-12-04T09:35:35.3035587Z }, 2025-12-04T09:35:35.3035794Z { 2025-12-04T09:35:35.3036170Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3036651Z "size": 724, 2025-12-04T09:35:35.3037107Z "digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084" 2025-12-04T09:35:35.3037652Z }, 2025-12-04T09:35:35.3037870Z { 2025-12-04T09:35:35.3038241Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3038710Z "size": 134, 2025-12-04T09:35:35.3039180Z "digest": "sha256:81945c4fb228ca73f4bac38b6d8a1eca7139585d4a078219dfaa16ea13945949" 2025-12-04T09:35:35.3039735Z }, 2025-12-04T09:35:35.3039950Z { 2025-12-04T09:35:35.3040323Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3040807Z "size": 32, 2025-12-04T09:35:35.3041265Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:35.3041817Z }, 2025-12-04T09:35:35.3042034Z { 2025-12-04T09:35:35.3042398Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3042873Z "size": 158, 2025-12-04T09:35:35.3043346Z "digest": "sha256:663cbe24d60bf42bc7a440cb4867e4287cacf54194dd3152406668e61d7e92e5" 2025-12-04T09:35:35.3043905Z }, 
2025-12-04T09:35:35.3044108Z { 2025-12-04T09:35:35.3044480Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3044962Z "size": 603, 2025-12-04T09:35:35.3045404Z "digest": "sha256:43f216b027865c8ca16f855703465445f3a548614a4d7e29387337b9651ac25c" 2025-12-04T09:35:35.3045936Z }, 2025-12-04T09:35:35.3046151Z { 2025-12-04T09:35:35.3046601Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3047090Z "size": 724, 2025-12-04T09:35:35.3047553Z "digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084" 2025-12-04T09:35:35.3048079Z }, 2025-12-04T09:35:35.3048302Z { 2025-12-04T09:35:35.3048678Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3049144Z "size": 155, 2025-12-04T09:35:35.3049618Z "digest": "sha256:c47c3cfeb68763aa19727693ad52fe0c80561a98139adaa2ab5eccea35c2d1b4" 2025-12-04T09:35:35.3050239Z }, 2025-12-04T09:35:35.3050457Z { 2025-12-04T09:35:35.3050817Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3051297Z "size": 32, 2025-12-04T09:35:35.3051773Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:35.3052311Z }, 2025-12-04T09:35:35.3052533Z { 2025-12-04T09:35:35.3052910Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3053377Z "size": 188, 2025-12-04T09:35:35.3053854Z "digest": "sha256:7d326b9e267322de9337ac2a71ddeac4cb61f28a018a6155863f83a164ad9437" 2025-12-04T09:35:35.3054407Z }, 2025-12-04T09:35:35.3054612Z { 2025-12-04T09:35:35.3054987Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3055469Z "size": 1370, 2025-12-04T09:35:35.3055930Z "digest": "sha256:7ec8f17141c8335192fa21b660dfe1fe0ad16b202bc234e7d4ef063b35124158" 2025-12-04T09:35:35.3056566Z }, 2025-12-04T09:35:35.3056790Z { 2025-12-04T09:35:35.3057161Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3057630Z "size": 32, 2025-12-04T09:35:35.3058107Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:35.3058663Z }, 2025-12-04T09:35:35.3058868Z { 2025-12-04T09:35:35.3059239Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3059725Z "size": 136, 2025-12-04T09:35:35.3060184Z "digest": "sha256:26249ea175bf816b87c4c83e5efb78fd386a800fa10e819ba85b06858bcf877e" 2025-12-04T09:35:35.3060734Z }, 2025-12-04T09:35:35.3060951Z { 2025-12-04T09:35:35.3061310Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3061790Z "size": 529, 2025-12-04T09:35:35.3062259Z "digest": "sha256:5e8e9ccb36f30a8c3a7e6a5011ee5001152f36c9c749397f3e234b1822326dd0" 2025-12-04T09:35:35.3062806Z }, 2025-12-04T09:35:35.3063010Z { 2025-12-04T09:35:35.3063387Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3063865Z "size": 32, 2025-12-04T09:35:35.3064324Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:35.3064877Z }, 2025-12-04T09:35:35.3065095Z { 2025-12-04T09:35:35.3065450Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3065928Z "size": 104, 2025-12-04T09:35:35.3066400Z "digest": "sha256:5bc72d4e1de83a1a254e8808f727118dd54cf048c14ff298a5299e015a116bfd" 2025-12-04T09:35:35.3066934Z }, 2025-12-04T09:35:35.3067152Z { 2025-12-04T09:35:35.3067525Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3067993Z "size": 436, 2025-12-04T09:35:35.3068461Z "digest": "sha256:83cddbd497794c27254e11c4c00105d1f61399e7fef9d208a0be250724efd2c0" 2025-12-04T09:35:35.3069009Z }, 2025-12-04T09:35:35.3069224Z { 2025-12-04T09:35:35.3069580Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3070071Z "size": 32, 2025-12-04T09:35:35.3070544Z 
"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:35.3071276Z }, 2025-12-04T09:35:35.3071502Z { 2025-12-04T09:35:35.3071877Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3072345Z "size": 109, 2025-12-04T09:35:35.3072984Z "digest": "sha256:60c25d8c3dd2d78785f659204d0b1e64954ca581f89874b68ffe8fee23c6b661" 2025-12-04T09:35:35.3073534Z }, 2025-12-04T09:35:35.3073762Z { 2025-12-04T09:35:35.3074119Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3074603Z "size": 1896, 2025-12-04T09:35:35.3075095Z "digest": "sha256:a534dcf4b9a9e5fabed742c8a8fc43c9cfe7346ea88ab3c177c3b14fd3afe00a" 2025-12-04T09:35:35.3075664Z }, 2025-12-04T09:35:35.3075868Z { 2025-12-04T09:35:35.3076241Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3076821Z "size": 245582017, 2025-12-04T09:35:35.3077299Z "digest": "sha256:10138310c65c78d7de8375225ce37f5f7bfae7898e4e8bbcb90bd56a1bd05db4" 2025-12-04T09:35:35.3077849Z }, 2025-12-04T09:35:35.3078066Z { 2025-12-04T09:35:35.3078423Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3078904Z "size": 106, 2025-12-04T09:35:35.3079383Z "digest": "sha256:8487679f252b6fb703dc9398d73aaeec68df724bfc961579ec5bdae62ebe3a37" 2025-12-04T09:35:35.3079918Z }, 2025-12-04T09:35:35.3080135Z { 2025-12-04T09:35:35.3080503Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3080981Z "size": 162, 2025-12-04T09:35:35.3081439Z "digest": "sha256:52580ee2caa9ab69b0ac640315ee350e847cd0955c0a1eafa933a076669e87ad" 2025-12-04T09:35:35.3081980Z }, 2025-12-04T09:35:35.3082194Z { 2025-12-04T09:35:35.3082551Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3083029Z "size": 7944, 2025-12-04T09:35:35.3083518Z "digest": "sha256:741c215cb2ffb295ab6a07fab3f0dfdde029463779ff9c0bbff4add26a340cfb" 2025-12-04T09:35:35.3084060Z 
}, 2025-12-04T09:35:35.3084273Z { 2025-12-04T09:35:35.3084641Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3085106Z "size": 8070, 2025-12-04T09:35:35.3085568Z "digest": "sha256:d17f5aba17a608d1c7851cb3940a25d43f063385813051127074f693d0ede19b" 2025-12-04T09:35:35.3086117Z }, 2025-12-04T09:35:35.3086323Z { 2025-12-04T09:35:35.3086688Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3087163Z "size": 304, 2025-12-04T09:35:35.3087638Z "digest": "sha256:bc08246bb4ba18c3ec5bc69e16b6b4e929c5bd0f3fae10eeb0b1a622a63d6fa2" 2025-12-04T09:35:35.3088187Z }, 2025-12-04T09:35:35.3088409Z { 2025-12-04T09:35:35.3088780Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3089249Z "size": 23755574, 2025-12-04T09:35:35.3089730Z "digest": "sha256:7323bf084bf98f915db061b178c56525a0f95bd34d211b381c7527ad242c5a58" 2025-12-04T09:35:35.3090272Z }, 2025-12-04T09:35:35.3090472Z { 2025-12-04T09:35:35.3090836Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3091314Z "size": 108, 2025-12-04T09:35:35.3091786Z "digest": "sha256:d344ecc97fd77c7d12fd68ddb67aeb6cc3dd2e723de5ad1ca2c80b45c8d6bd77" 2025-12-04T09:35:35.3092341Z }, 2025-12-04T09:35:35.3092553Z { 2025-12-04T09:35:35.3092912Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3093393Z "size": 54145663, 2025-12-04T09:35:35.3093881Z "digest": "sha256:fb60b2d2147ff57c218f449f5b680132af8f7f8032ed69f422b48a3c3c1424f4" 2025-12-04T09:35:35.3094429Z }, 2025-12-04T09:35:35.3094636Z { 2025-12-04T09:35:35.3095003Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:35.3095485Z "size": 32, 2025-12-04T09:35:35.3095941Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:35.3096610Z } 2025-12-04T09:35:35.3096832Z ] 2025-12-04T09:35:35.3097035Z } 2025-12-04T09:35:35.3127369Z ##[group]Run set 
-eux 2025-12-04T09:35:35.3127703Z set -eux 2025-12-04T09:35:35.3128186Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have 2025-12-04T09:35:35.3129683Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true 2025-12-04T09:35:35.3138172Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:35.3138627Z env: 2025-12-04T09:35:35.3138864Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:35.3139167Z ##[endgroup] 2025-12-04T09:35:35.3170444Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token 2025-12-04T09:35:35.3171386Z + jq --raw-output .SecretString 2025-12-04T09:35:35.3172740Z + jq -r .docker_hub_readonly_token 2025-12-04T09:35:35.3173846Z + docker login --username pytorchbot --password-stdin 2025-12-04T09:35:35.9808775Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T09:35:35.9809586Z Configure a credential helper to remove this warning. 
See 2025-12-04T09:35:35.9810665Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T09:35:35.9811139Z 2025-12-04T09:35:35.9811280Z Login Succeeded 2025-12-04T09:35:35.9907156Z ##[group]Run tag=${ECR_DOCKER_IMAGE##*:} 2025-12-04T09:35:35.9907607Z tag=${ECR_DOCKER_IMAGE##*:} 2025-12-04T09:35:35.9908079Z echo "docker pull ghcr.io/pytorch/ci-image:${tag/:/-}" 2025-12-04T09:35:35.9914992Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:35.9915438Z env: 2025-12-04T09:35:35.9915689Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:35.9916675Z ECR_DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:35.9917712Z ##[endgroup] 2025-12-04T09:35:35.9947780Z docker pull ghcr.io/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:35.9999448Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2025-12-04T09:35:35.9999957Z with: 2025-12-04T09:35:36.0000878Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:36.0002005Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:36.0002458Z env: 2025-12-04T09:35:36.0002697Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:36.0003004Z ##[endgroup] 2025-12-04T09:35:36.0019692Z ##[group]Run set -x 2025-12-04T09:35:36.0020017Z set -x 2025-12-04T09:35:36.0020291Z set +e 2025-12-04T09:35:36.0020533Z  2025-12-04T09:35:36.0020801Z login() { 2025-12-04T09:35:36.0021364Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T09:35:36.0021965Z } 2025-12-04T09:35:36.0022207Z  2025-12-04T09:35:36.0022502Z retry () { 2025-12-04T09:35:36.0022808Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 
2025-12-04T09:35:36.0023174Z } 2025-12-04T09:35:36.0023414Z  2025-12-04T09:35:36.0023668Z retry login "${DOCKER_REGISTRY}" 2025-12-04T09:35:36.0024023Z  2025-12-04T09:35:36.0024599Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024') 2025-12-04T09:35:36.0025383Z echo "Compressed size of image in MB: ${IMAGE_SIZE}" 2025-12-04T09:35:36.0025811Z  2025-12-04T09:35:36.0026058Z set -e 2025-12-04T09:35:36.0026458Z # ignore output since only exit code is used for conditional 2025-12-04T09:35:36.0027030Z # only pull docker image if it's not available locally 2025-12-04T09:35:36.0027673Z if ! docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2025-12-04T09:35:36.0028267Z  retry docker pull "${DOCKER_IMAGE}" 2025-12-04T09:35:36.0028654Z fi 2025-12-04T09:35:36.0035248Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:36.0035689Z env: 2025-12-04T09:35:36.0035943Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:36.0036903Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:36.0038028Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:36.0038482Z ##[endgroup] 2025-12-04T09:35:36.0065055Z + set +e 2025-12-04T09:35:36.0065677Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:36.0066207Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:36.0068937Z + aws ecr get-login-password --region us-east-1 2025-12-04T09:35:36.0070205Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:36.6280919Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T09:35:36.6281681Z Configure a credential helper to remove this warning. 
See 2025-12-04T09:35:36.6282531Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T09:35:36.6282990Z 2025-12-04T09:35:36.6283109Z Login Succeeded 2025-12-04T09:35:36.6305759Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:36.6306907Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024' 2025-12-04T09:35:36.8582579Z + IMAGE_SIZE=13438.219573020935 2025-12-04T09:35:36.8583080Z + echo 'Compressed size of image in MB: 13438.219573020935' 2025-12-04T09:35:36.8583568Z + set -e 2025-12-04T09:35:36.8584574Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:36.8586038Z Compressed size of image in MB: 13438.219573020935 2025-12-04T09:35:36.8716317Z + retry docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:36.8718025Z + docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:37.0519833Z pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a: Pulling from pytorch/ci-image 2025-12-04T09:35:37.0521350Z 63e5bc7682b8: Pulling fs layer 2025-12-04T09:35:37.0521881Z 835841cca3b7: Pulling fs layer 2025-12-04T09:35:37.0522554Z 1bf1bb125dea: Pulling fs layer 2025-12-04T09:35:37.0523102Z b21856d1bf42: Pulling fs layer 2025-12-04T09:35:37.0523505Z 848ba2c095e2: Pulling fs layer 2025-12-04T09:35:37.0523834Z 029495b23122: Pulling fs layer 2025-12-04T09:35:37.0524175Z 073bb82063cf: Pulling fs layer 2025-12-04T09:35:37.0524499Z 59b639308833: Pulling fs layer 2025-12-04T09:35:37.0524822Z 
1c6177b2970d: Pulling fs layer 2025-12-04T09:35:37.0525153Z fabe466dd5f3: Pulling fs layer 2025-12-04T09:35:37.0525469Z 2b5a11b41761: Pulling fs layer 2025-12-04T09:35:37.0525794Z 9681563a88ff: Pulling fs layer 2025-12-04T09:35:37.0526120Z dc0780902fca: Pulling fs layer 2025-12-04T09:35:37.0526435Z 5b09a2b135c8: Pulling fs layer 2025-12-04T09:35:37.0526850Z 4f4fb700ef54: Pulling fs layer 2025-12-04T09:35:37.0527359Z 5bfdaeb5578d: Pulling fs layer 2025-12-04T09:35:37.0527866Z 0ef42867f370: Pulling fs layer 2025-12-04T09:35:37.0528309Z 446083e497f3: Pulling fs layer 2025-12-04T09:35:37.0528937Z d8a170bef0f4: Pulling fs layer 2025-12-04T09:35:37.0529290Z e2b6cd6a5bd0: Pulling fs layer 2025-12-04T09:35:37.0529604Z 93efc0181a22: Pulling fs layer 2025-12-04T09:35:37.0530038Z 7454c938f174: Pulling fs layer 2025-12-04T09:35:37.0530575Z 4d57ff55f6d4: Pulling fs layer 2025-12-04T09:35:37.0530905Z b0301534b4a5: Pulling fs layer 2025-12-04T09:35:37.0531314Z 1969e15d0c13: Pulling fs layer 2025-12-04T09:35:37.0531681Z 73180a0f2d5a: Pulling fs layer 2025-12-04T09:35:37.0532060Z ad81b25cb69f: Pulling fs layer 2025-12-04T09:35:37.0532378Z 029495b23122: Waiting 2025-12-04T09:35:37.0532666Z 8165374f8dcc: Pulling fs layer 2025-12-04T09:35:37.0532990Z 7779c0bb9be2: Pulling fs layer 2025-12-04T09:35:37.0533307Z 4d0a1c027262: Pulling fs layer 2025-12-04T09:35:37.0533636Z a51e0dab2d59: Pulling fs layer 2025-12-04T09:35:37.0533965Z 3eb6d4ff040b: Pulling fs layer 2025-12-04T09:35:37.0534317Z b168858b8537: Pulling fs layer 2025-12-04T09:35:37.0534640Z d77a39278026: Pulling fs layer 2025-12-04T09:35:37.0534965Z 36fbd357280b: Pulling fs layer 2025-12-04T09:35:37.0535294Z 4e3b10a5dd6a: Pulling fs layer 2025-12-04T09:35:37.0535890Z 3092fab73b59: Pulling fs layer 2025-12-04T09:35:37.0536218Z 20020dd28a15: Pulling fs layer 2025-12-04T09:35:37.0536634Z ae5280ce969d: Pulling fs layer 2025-12-04T09:35:37.0536948Z 026e4484b749: Pulling fs layer 2025-12-04T09:35:37.0537289Z 1be9da2ce53d: Pulling fs 
layer 2025-12-04T09:35:37.0537624Z 6481b7a1d9fb: Pulling fs layer 2025-12-04T09:35:37.0537940Z fa519d18c39d: Pulling fs layer 2025-12-04T09:35:37.0538266Z d172f25b97f7: Pulling fs layer 2025-12-04T09:35:37.0538675Z fd60ab6b1c2c: Pulling fs layer 2025-12-04T09:35:37.0538996Z 0afe45579c2c: Pulling fs layer 2025-12-04T09:35:37.0539328Z 5884ffd6720b: Pulling fs layer 2025-12-04T09:35:37.0539660Z ab7a7c316fa7: Pulling fs layer 2025-12-04T09:35:37.0539977Z c7775ce5574b: Pulling fs layer 2025-12-04T09:35:37.0540306Z 81945c4fb228: Pulling fs layer 2025-12-04T09:35:37.0540618Z 1c6177b2970d: Waiting 2025-12-04T09:35:37.0540883Z 073bb82063cf: Waiting 2025-12-04T09:35:37.0541203Z 848ba2c095e2: Waiting 2025-12-04T09:35:37.0541479Z 59b639308833: Waiting 2025-12-04T09:35:37.0541752Z 5b09a2b135c8: Waiting 2025-12-04T09:35:37.0542018Z fabe466dd5f3: Waiting 2025-12-04T09:35:37.0542302Z 0ef42867f370: Waiting 2025-12-04T09:35:37.0542579Z 5bfdaeb5578d: Waiting 2025-12-04T09:35:37.0543007Z 663cbe24d60b: Pulling fs layer 2025-12-04T09:35:37.0543330Z b21856d1bf42: Waiting 2025-12-04T09:35:37.0543936Z d8a170bef0f4: Waiting 2025-12-04T09:35:37.0544207Z ae5280ce969d: Waiting 2025-12-04T09:35:37.0544496Z 43f216b02786: Pulling fs layer 2025-12-04T09:35:37.0544812Z 4d57ff55f6d4: Waiting 2025-12-04T09:35:37.0545073Z 4f4fb700ef54: Waiting 2025-12-04T09:35:37.0545349Z 446083e497f3: Waiting 2025-12-04T09:35:37.0545618Z 9681563a88ff: Waiting 2025-12-04T09:35:37.0545878Z e2b6cd6a5bd0: Waiting 2025-12-04T09:35:37.0546155Z b0301534b4a5: Waiting 2025-12-04T09:35:37.0546444Z c47c3cfeb687: Pulling fs layer 2025-12-04T09:35:37.0546750Z dc0780902fca: Waiting 2025-12-04T09:35:37.0547024Z 1969e15d0c13: Waiting 2025-12-04T09:35:37.0547300Z 2b5a11b41761: Waiting 2025-12-04T09:35:37.0547611Z 8165374f8dcc: Waiting 2025-12-04T09:35:37.0547958Z 7d326b9e2673: Pulling fs layer 2025-12-04T09:35:37.0548277Z 7779c0bb9be2: Waiting 2025-12-04T09:35:37.0548575Z 5884ffd6720b: Waiting 2025-12-04T09:35:37.0548861Z 
d172f25b97f7: Waiting 2025-12-04T09:35:37.0549138Z fa519d18c39d: Waiting 2025-12-04T09:35:37.0549398Z 026e4484b749: Waiting 2025-12-04T09:35:37.0549676Z 1be9da2ce53d: Waiting 2025-12-04T09:35:37.0549971Z 7ec8f17141c8: Pulling fs layer 2025-12-04T09:35:37.0550279Z 4d0a1c027262: Waiting 2025-12-04T09:35:37.0550558Z 663cbe24d60b: Waiting 2025-12-04T09:35:37.0550842Z ad81b25cb69f: Waiting 2025-12-04T09:35:37.0551122Z 26249ea175bf: Pulling fs layer 2025-12-04T09:35:37.0551466Z 43f216b02786: Waiting 2025-12-04T09:35:37.0551759Z 5e8e9ccb36f3: Pulling fs layer 2025-12-04T09:35:37.0552081Z 93efc0181a22: Waiting 2025-12-04T09:35:37.0552345Z 7454c938f174: Waiting 2025-12-04T09:35:37.0552624Z fd60ab6b1c2c: Waiting 2025-12-04T09:35:37.0552920Z 5bc72d4e1de8: Pulling fs layer 2025-12-04T09:35:37.0553234Z 4e3b10a5dd6a: Waiting 2025-12-04T09:35:37.0553509Z 81945c4fb228: Waiting 2025-12-04T09:35:37.0553794Z 83cddbd49779: Pulling fs layer 2025-12-04T09:35:37.0554097Z c47c3cfeb687: Waiting 2025-12-04T09:35:37.0554394Z 60c25d8c3dd2: Pulling fs layer 2025-12-04T09:35:37.0554812Z 7d326b9e2673: Waiting 2025-12-04T09:35:37.0555086Z 5bc72d4e1de8: Waiting 2025-12-04T09:35:37.0555344Z 26249ea175bf: Waiting 2025-12-04T09:35:37.0555628Z a534dcf4b9a9: Pulling fs layer 2025-12-04T09:35:37.0555946Z 83cddbd49779: Waiting 2025-12-04T09:35:37.0556210Z 5e8e9ccb36f3: Waiting 2025-12-04T09:35:37.0556497Z 10138310c65c: Pulling fs layer 2025-12-04T09:35:37.0556806Z 7ec8f17141c8: Waiting 2025-12-04T09:35:37.0557231Z 3092fab73b59: Waiting 2025-12-04T09:35:37.0557520Z 8487679f252b: Pulling fs layer 2025-12-04T09:35:37.0557938Z 20020dd28a15: Waiting 2025-12-04T09:35:37.0558244Z a534dcf4b9a9: Waiting 2025-12-04T09:35:37.0558516Z d77a39278026: Waiting 2025-12-04T09:35:37.0558828Z 10138310c65c: Waiting 2025-12-04T09:35:37.0559230Z 52580ee2caa9: Pulling fs layer 2025-12-04T09:35:37.0559547Z 8487679f252b: Waiting 2025-12-04T09:35:37.0559818Z b168858b8537: Waiting 2025-12-04T09:35:37.0560088Z 741c215cb2ff: Pulling 
fs layer 2025-12-04T09:35:37.0560423Z d17f5aba17a6: Pulling fs layer 2025-12-04T09:35:37.0560740Z 36fbd357280b: Waiting 2025-12-04T09:35:37.0561016Z bc08246bb4ba: Pulling fs layer 2025-12-04T09:35:37.0561334Z 60c25d8c3dd2: Waiting 2025-12-04T09:35:37.0561622Z 7323bf084bf9: Pulling fs layer 2025-12-04T09:35:37.0561932Z 741c215cb2ff: Waiting 2025-12-04T09:35:37.0562194Z d17f5aba17a6: Waiting 2025-12-04T09:35:37.0562481Z d344ecc97fd7: Pulling fs layer 2025-12-04T09:35:37.0562811Z fb60b2d2147f: Pulling fs layer 2025-12-04T09:35:37.0563113Z c7775ce5574b: Waiting 2025-12-04T09:35:37.0563388Z fb60b2d2147f: Waiting 2025-12-04T09:35:37.0563664Z d344ecc97fd7: Waiting 2025-12-04T09:35:37.0563924Z 7323bf084bf9: Waiting 2025-12-04T09:35:37.0564197Z bc08246bb4ba: Waiting 2025-12-04T09:35:37.0564469Z 73180a0f2d5a: Waiting 2025-12-04T09:35:37.0564734Z ab7a7c316fa7: Waiting 2025-12-04T09:35:37.0565009Z 6481b7a1d9fb: Waiting 2025-12-04T09:35:37.0565285Z a51e0dab2d59: Waiting 2025-12-04T09:35:37.0565547Z 3eb6d4ff040b: Waiting 2025-12-04T09:35:37.1269239Z 835841cca3b7: Verifying Checksum 2025-12-04T09:35:37.1269902Z 835841cca3b7: Download complete 2025-12-04T09:35:37.2024193Z b21856d1bf42: Verifying Checksum 2025-12-04T09:35:37.2024600Z b21856d1bf42: Download complete 2025-12-04T09:35:37.2802841Z 848ba2c095e2: Verifying Checksum 2025-12-04T09:35:37.2803217Z 848ba2c095e2: Download complete 2025-12-04T09:35:37.3666160Z 029495b23122: Download complete 2025-12-04T09:35:37.4341239Z 63e5bc7682b8: Verifying Checksum 2025-12-04T09:35:37.4341633Z 63e5bc7682b8: Download complete 2025-12-04T09:35:37.4607554Z 073bb82063cf: Verifying Checksum 2025-12-04T09:35:37.4608224Z 073bb82063cf: Download complete 2025-12-04T09:35:37.4999646Z 59b639308833: Download complete 2025-12-04T09:35:37.5747119Z fabe466dd5f3: Verifying Checksum 2025-12-04T09:35:37.5747590Z fabe466dd5f3: Download complete 2025-12-04T09:35:37.6754374Z 2b5a11b41761: Verifying Checksum 2025-12-04T09:35:37.6754818Z 2b5a11b41761: Download 
complete 2025-12-04T09:35:37.7474671Z 9681563a88ff: Verifying Checksum 2025-12-04T09:35:37.7475324Z 9681563a88ff: Download complete 2025-12-04T09:35:37.8248418Z dc0780902fca: Verifying Checksum 2025-12-04T09:35:37.8248859Z dc0780902fca: Download complete 2025-12-04T09:35:38.4201140Z 63e5bc7682b8: Pull complete 2025-12-04T09:35:38.4411491Z 835841cca3b7: Pull complete 2025-12-04T09:35:38.7304314Z 1c6177b2970d: Verifying Checksum 2025-12-04T09:35:38.7305046Z 1c6177b2970d: Download complete 2025-12-04T09:35:38.7377065Z 4f4fb700ef54: Download complete 2025-12-04T09:35:38.8172522Z 5bfdaeb5578d: Verifying Checksum 2025-12-04T09:35:38.8172972Z 5bfdaeb5578d: Download complete 2025-12-04T09:35:38.8929047Z 0ef42867f370: Download complete 2025-12-04T09:35:38.9812799Z 446083e497f3: Verifying Checksum 2025-12-04T09:35:38.9813481Z 446083e497f3: Download complete 2025-12-04T09:35:39.0827640Z d8a170bef0f4: Verifying Checksum 2025-12-04T09:35:39.0828075Z d8a170bef0f4: Download complete 2025-12-04T09:35:39.1736835Z e2b6cd6a5bd0: Verifying Checksum 2025-12-04T09:35:39.1737285Z e2b6cd6a5bd0: Download complete 2025-12-04T09:35:39.2682152Z 93efc0181a22: Verifying Checksum 2025-12-04T09:35:39.2696973Z 93efc0181a22: Download complete 2025-12-04T09:35:39.3768558Z 7454c938f174: Verifying Checksum 2025-12-04T09:35:39.3769063Z 7454c938f174: Download complete 2025-12-04T09:35:39.4613140Z 4d57ff55f6d4: Verifying Checksum 2025-12-04T09:35:39.4613868Z 4d57ff55f6d4: Download complete 2025-12-04T09:35:40.8572046Z 1bf1bb125dea: Verifying Checksum 2025-12-04T09:35:40.8572501Z 1bf1bb125dea: Download complete 2025-12-04T09:35:40.9470825Z 1969e15d0c13: Verifying Checksum 2025-12-04T09:35:40.9471443Z 1969e15d0c13: Download complete 2025-12-04T09:35:41.4242374Z 73180a0f2d5a: Verifying Checksum 2025-12-04T09:35:41.4243000Z 73180a0f2d5a: Download complete 2025-12-04T09:35:41.5222086Z ad81b25cb69f: Verifying Checksum 2025-12-04T09:35:41.5222829Z ad81b25cb69f: Download complete 2025-12-04T09:35:41.6100832Z 
8165374f8dcc: Verifying Checksum 2025-12-04T09:35:41.6101202Z 8165374f8dcc: Download complete 2025-12-04T09:35:49.3125913Z 7779c0bb9be2: Verifying Checksum 2025-12-04T09:35:49.3126410Z 7779c0bb9be2: Download complete 2025-12-04T09:35:49.4217943Z 4d0a1c027262: Verifying Checksum 2025-12-04T09:35:49.4218545Z 4d0a1c027262: Download complete 2025-12-04T09:35:49.5047087Z a51e0dab2d59: Verifying Checksum 2025-12-04T09:35:49.5047716Z a51e0dab2d59: Download complete 2025-12-04T09:35:49.5272777Z 1bf1bb125dea: Pull complete 2025-12-04T09:35:49.6016546Z 3eb6d4ff040b: Download complete 2025-12-04T09:35:49.6811081Z b168858b8537: Verifying Checksum 2025-12-04T09:35:49.6811697Z b168858b8537: Download complete 2025-12-04T09:35:49.7264009Z b21856d1bf42: Pull complete 2025-12-04T09:35:49.8883692Z 848ba2c095e2: Pull complete 2025-12-04T09:35:50.0018687Z 029495b23122: Pull complete 2025-12-04T09:35:50.1363959Z d77a39278026: Verifying Checksum 2025-12-04T09:35:50.1364394Z d77a39278026: Download complete 2025-12-04T09:35:50.1910543Z 073bb82063cf: Pull complete 2025-12-04T09:35:50.2424293Z 36fbd357280b: Verifying Checksum 2025-12-04T09:35:50.2425011Z 36fbd357280b: Download complete 2025-12-04T09:35:50.3155927Z 59b639308833: Pull complete 2025-12-04T09:35:50.3395806Z 4e3b10a5dd6a: Verifying Checksum 2025-12-04T09:35:50.3396453Z 4e3b10a5dd6a: Download complete 2025-12-04T09:35:50.4218646Z 3092fab73b59: Verifying Checksum 2025-12-04T09:35:50.4219376Z 3092fab73b59: Download complete 2025-12-04T09:35:50.5242079Z 20020dd28a15: Verifying Checksum 2025-12-04T09:35:50.5242672Z 20020dd28a15: Download complete 2025-12-04T09:35:50.6102568Z ae5280ce969d: Download complete 2025-12-04T09:35:50.6999671Z 026e4484b749: Verifying Checksum 2025-12-04T09:35:50.7000211Z 026e4484b749: Download complete 2025-12-04T09:35:50.7790183Z 1be9da2ce53d: Verifying Checksum 2025-12-04T09:35:50.7790677Z 1be9da2ce53d: Download complete 2025-12-04T09:35:50.8728955Z 6481b7a1d9fb: Verifying Checksum 
2025-12-04T09:35:50.8729451Z 6481b7a1d9fb: Download complete 2025-12-04T09:35:53.0623928Z 1c6177b2970d: Pull complete 2025-12-04T09:35:53.2668915Z fabe466dd5f3: Pull complete 2025-12-04T09:35:53.4794517Z 2b5a11b41761: Pull complete 2025-12-04T09:35:53.7023864Z 9681563a88ff: Pull complete 2025-12-04T09:35:53.9147689Z dc0780902fca: Pull complete 2025-12-04T09:35:55.7686524Z fa519d18c39d: Verifying Checksum 2025-12-04T09:36:25.4925628Z fa519d18c39d: Download complete 2025-12-04T09:36:25.4926078Z 5b09a2b135c8: Verifying Checksum 2025-12-04T09:36:25.4926418Z 5b09a2b135c8: Download complete 2025-12-04T09:36:25.5923811Z fd60ab6b1c2c: Verifying Checksum 2025-12-04T09:36:25.5924448Z fd60ab6b1c2c: Download complete 2025-12-04T09:36:25.6861000Z 0afe45579c2c: Verifying Checksum 2025-12-04T09:36:25.6861498Z 0afe45579c2c: Download complete 2025-12-04T09:36:25.7783866Z 5884ffd6720b: Verifying Checksum 2025-12-04T09:36:25.7784493Z 5884ffd6720b: Download complete 2025-12-04T09:36:25.9505977Z ab7a7c316fa7: Verifying Checksum 2025-12-04T09:36:25.9506456Z ab7a7c316fa7: Download complete 2025-12-04T09:36:26.0142132Z c7775ce5574b: Verifying Checksum 2025-12-04T09:36:26.0142837Z c7775ce5574b: Download complete 2025-12-04T09:36:26.1046311Z 81945c4fb228: Verifying Checksum 2025-12-04T09:36:26.1046781Z 81945c4fb228: Download complete 2025-12-04T09:36:26.1813211Z 663cbe24d60b: Verifying Checksum 2025-12-04T09:36:26.1813638Z 663cbe24d60b: Download complete 2025-12-04T09:36:26.2748229Z 43f216b02786: Download complete 2025-12-04T09:36:26.4057984Z c47c3cfeb687: Download complete 2025-12-04T09:36:26.4893757Z 7d326b9e2673: Verifying Checksum 2025-12-04T09:36:26.4894278Z 7d326b9e2673: Download complete 2025-12-04T09:36:26.5896530Z 7ec8f17141c8: Download complete 2025-12-04T09:36:26.7125442Z 26249ea175bf: Verifying Checksum 2025-12-04T09:36:26.7125901Z 26249ea175bf: Download complete 2025-12-04T09:36:26.8018817Z 5e8e9ccb36f3: Download complete 2025-12-04T09:36:26.8727983Z 5bc72d4e1de8: Download 
complete 2025-12-04T09:36:26.9792648Z 83cddbd49779: Verifying Checksum 2025-12-04T09:36:26.9793366Z 83cddbd49779: Download complete 2025-12-04T09:36:27.0569091Z 60c25d8c3dd2: Verifying Checksum 2025-12-04T09:36:27.0569581Z 60c25d8c3dd2: Download complete 2025-12-04T09:36:27.1287948Z a534dcf4b9a9: Verifying Checksum 2025-12-04T09:36:27.1288437Z a534dcf4b9a9: Download complete 2025-12-04T09:36:30.6454194Z 10138310c65c: Verifying Checksum 2025-12-04T09:36:30.8135186Z 10138310c65c: Download complete 2025-12-04T09:36:30.8135882Z 52580ee2caa9: Download complete 2025-12-04T09:36:30.8996765Z 741c215cb2ff: Verifying Checksum 2025-12-04T09:36:30.8997184Z 741c215cb2ff: Download complete 2025-12-04T09:36:31.0253513Z d17f5aba17a6: Download complete 2025-12-04T09:36:31.5527761Z 7323bf084bf9: Verifying Checksum 2025-12-04T09:36:31.5528203Z 7323bf084bf9: Download complete 2025-12-04T09:36:31.6562320Z d344ecc97fd7: Download complete 2025-12-04T09:36:32.6236846Z fb60b2d2147f: Verifying Checksum 2025-12-04T09:36:32.6237394Z fb60b2d2147f: Download complete 2025-12-04T09:36:46.7134522Z d172f25b97f7: Verifying Checksum 2025-12-04T09:36:46.7134954Z d172f25b97f7: Download complete 2025-12-04T09:37:18.1359843Z 5b09a2b135c8: Pull complete 2025-12-04T09:37:18.3621287Z 4f4fb700ef54: Pull complete 2025-12-04T09:37:18.5760913Z 5bfdaeb5578d: Pull complete 2025-12-04T09:37:18.8273791Z 0ef42867f370: Pull complete 2025-12-04T09:37:19.0561997Z 446083e497f3: Pull complete 2025-12-04T09:37:19.3478293Z d8a170bef0f4: Pull complete 2025-12-04T09:37:19.5648449Z e2b6cd6a5bd0: Pull complete 2025-12-04T09:37:19.7899298Z 93efc0181a22: Pull complete 2025-12-04T09:37:20.0154941Z 7454c938f174: Pull complete 2025-12-04T09:37:20.2436138Z 4d57ff55f6d4: Pull complete 2025-12-04T09:37:20.4956107Z b0301534b4a5: Verifying Checksum 2025-12-04T09:37:20.4957792Z b0301534b4a5: Download complete 2025-12-04T09:38:38.6098927Z b0301534b4a5: Pull complete 2025-12-04T09:38:38.8263105Z 1969e15d0c13: Pull complete 
2025-12-04T09:38:39.5864570Z 73180a0f2d5a: Pull complete 2025-12-04T09:38:39.7973528Z ad81b25cb69f: Pull complete 2025-12-04T09:38:40.0254374Z 8165374f8dcc: Pull complete 2025-12-04T09:38:48.3902357Z 7779c0bb9be2: Pull complete 2025-12-04T09:38:48.6095369Z 4d0a1c027262: Pull complete 2025-12-04T09:38:48.8267620Z a51e0dab2d59: Pull complete 2025-12-04T09:38:49.1640060Z 3eb6d4ff040b: Pull complete 2025-12-04T09:38:49.3626019Z b168858b8537: Pull complete 2025-12-04T09:38:49.8015313Z d77a39278026: Pull complete 2025-12-04T09:38:50.0234852Z 36fbd357280b: Pull complete 2025-12-04T09:38:50.2405731Z 4e3b10a5dd6a: Pull complete 2025-12-04T09:38:50.6338185Z 3092fab73b59: Pull complete 2025-12-04T09:38:50.8650041Z 20020dd28a15: Pull complete 2025-12-04T09:38:51.0934213Z ae5280ce969d: Pull complete 2025-12-04T09:38:51.5114541Z 026e4484b749: Pull complete 2025-12-04T09:38:51.7183653Z 1be9da2ce53d: Pull complete 2025-12-04T09:38:52.1108693Z 6481b7a1d9fb: Pull complete 2025-12-04T09:38:53.9475480Z fa519d18c39d: Pull complete 2025-12-04T09:39:54.1719381Z d172f25b97f7: Pull complete 2025-12-04T09:39:54.2611874Z fd60ab6b1c2c: Pull complete 2025-12-04T09:39:54.4058510Z 0afe45579c2c: Pull complete 2025-12-04T09:39:54.6742695Z 5884ffd6720b: Pull complete 2025-12-04T09:39:54.9071645Z ab7a7c316fa7: Pull complete 2025-12-04T09:39:55.0381296Z c7775ce5574b: Pull complete 2025-12-04T09:39:55.3482946Z 81945c4fb228: Pull complete 2025-12-04T09:39:55.5947516Z 663cbe24d60b: Pull complete 2025-12-04T09:39:55.7045319Z 43f216b02786: Pull complete 2025-12-04T09:39:55.8876068Z c47c3cfeb687: Pull complete 2025-12-04T09:39:56.1569512Z 7d326b9e2673: Pull complete 2025-12-04T09:39:56.2843815Z 7ec8f17141c8: Pull complete 2025-12-04T09:39:56.5366894Z 26249ea175bf: Pull complete 2025-12-04T09:39:56.6791294Z 5e8e9ccb36f3: Pull complete 2025-12-04T09:39:57.0140280Z 5bc72d4e1de8: Pull complete 2025-12-04T09:39:57.2378689Z 83cddbd49779: Pull complete 2025-12-04T09:39:57.6460501Z 60c25d8c3dd2: Pull complete 
2025-12-04T09:39:57.8508626Z a534dcf4b9a9: Pull complete 2025-12-04T09:40:04.5379920Z 10138310c65c: Pull complete 2025-12-04T09:40:04.7595992Z 8487679f252b: Pull complete 2025-12-04T09:40:04.9868373Z 52580ee2caa9: Pull complete 2025-12-04T09:40:05.2033172Z 741c215cb2ff: Pull complete 2025-12-04T09:40:05.4282347Z d17f5aba17a6: Pull complete 2025-12-04T09:40:05.6616193Z bc08246bb4ba: Pull complete 2025-12-04T09:40:07.0220123Z 7323bf084bf9: Pull complete 2025-12-04T09:40:07.1545922Z d344ecc97fd7: Pull complete 2025-12-04T09:40:08.8963067Z fb60b2d2147f: Pull complete 2025-12-04T09:40:09.0771504Z Digest: sha256:ae30f11a5b50741bd652aa0c94ad89ef791c4e50157eff642748620825cf7940 2025-12-04T09:40:09.1095583Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:40:09.1263005Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:40:09.1341211Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:40:09.1342496Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:40:09.1352125Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:40:09.1352573Z env: 2025-12-04T09:40:09.1352820Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:40:09.1353129Z ##[endgroup] 2025-12-04T09:40:09.1602256Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main 2025-12-04T09:40:09.1602776Z with: 2025-12-04T09:40:09.1603026Z driver-version: 525.105.17 2025-12-04T09:40:09.1603333Z env: 2025-12-04T09:40:09.1603583Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:40:09.1603873Z ##[endgroup] 2025-12-04T09:40:09.1657747Z ##[group]Run echo 
"IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:40:09.1658852Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:40:09.1666074Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:40:09.1666524Z env: 2025-12-04T09:40:09.1666777Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:40:09.1667073Z ##[endgroup] 2025-12-04T09:40:09.1725593Z ##[group]Run set -euo pipefail 2025-12-04T09:40:09.1725999Z set -euo pipefail 2025-12-04T09:40:09.1726355Z  2025-12-04T09:40:09.1726596Z has_gpu=false 2025-12-04T09:40:09.1726894Z devices="" 2025-12-04T09:40:09.1727167Z  2025-12-04T09:40:09.1727478Z if command -v nvidia-smi >/dev/null 2>&1; then 2025-12-04T09:40:09.1728014Z  if nvidia-smi -L >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:40:09.1728475Z  has_gpu=true 2025-12-04T09:40:09.1728823Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:40:09.1729187Z  fi 2025-12-04T09:40:09.1729456Z fi 2025-12-04T09:40:09.1729698Z  2025-12-04T09:40:09.1729950Z if [ "$has_gpu" = false ]; then 2025-12-04T09:40:09.1730416Z  if ls /dev/nvidia* >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:40:09.1730873Z  has_gpu=true 2025-12-04T09:40:09.1731250Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:40:09.1731608Z  fi 2025-12-04T09:40:09.1731854Z fi 2025-12-04T09:40:09.1732100Z  2025-12-04T09:40:09.1732449Z if [ "$has_gpu" = false ] && command -v lspci >/dev/null 2>&1; then 2025-12-04T09:40:09.1733057Z  if lspci | grep -i 'nvidia' >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:40:09.1733551Z  has_gpu=true 2025-12-04T09:40:09.1733883Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:40:09.1734255Z  fi 2025-12-04T09:40:09.1734503Z fi 2025-12-04T09:40:09.1734730Z  2025-12-04T09:40:09.1735083Z printf 'HAS_NVIDIA=%s\n' "$has_gpu" >> "$GITHUB_OUTPUT" 2025-12-04T09:40:09.1735919Z printf 
'DETECTED_DEVICES<> "$GITHUB_OUTPUT" 2025-12-04T09:40:09.1742791Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:40:09.1743224Z env: 2025-12-04T09:40:09.1743475Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:40:09.1743796Z ##[endgroup] 2025-12-04T09:40:10.7294604Z ##[group]Run if [ "${HAS_NVIDIA}" = "true" ]; then 2025-12-04T09:40:10.7295087Z if [ "${HAS_NVIDIA}" = "true" ]; then 2025-12-04T09:40:10.7295539Z  echo "HAS_NVIDIA_GPU=true" >> "${GITHUB_ENV}" 2025-12-04T09:40:10.7296319Z  echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}" 2025-12-04T09:40:10.7296879Z else 2025-12-04T09:40:10.7297200Z  echo "HAS_NVIDIA_GPU=false" >> "${GITHUB_ENV}" 2025-12-04T09:40:10.7297613Z fi 2025-12-04T09:40:10.7304527Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:40:10.7304979Z env: 2025-12-04T09:40:10.7305233Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:40:10.7305542Z HAS_NVIDIA: true 2025-12-04T09:40:10.7305796Z ##[endgroup] 2025-12-04T09:40:10.7388988Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2025-12-04T09:40:10.7389505Z with: 2025-12-04T09:40:10.7389746Z timeout_minutes: 10 2025-12-04T09:40:10.7390045Z max_attempts: 3 2025-12-04T09:40:10.7422919Z command: # Is it disgusting to have a full shell script here in this github action? Sure # But is it the best way to make it so that this action relies on nothing else? Absolutely set -eou pipefail DISTRIBUTION=$(. 
/etc/os-release;echo $ID$VERSION_ID) DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" install_nvidia_docker2_amzn2() { ( set -x # Needed for yum-config-manager sudo yum install -y yum-utils if [[ "${DISTRIBUTION}" == "amzn2023" ]] ; then YUM_REPO_URL="https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo" else # Amazon Linux 2 YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo" fi sudo yum-config-manager --add-repo "${YUM_REPO_URL}" sudo yum install -y \ nvidia-container-toolkit-1.17.8 \ libnvidia-container-tools-1.17.8 \ libnvidia-container1-1.17.8 \ nvidia-container-toolkit-base-1.17.8 sudo systemctl restart docker ) } install_nvidia_docker2_ubuntu20() { ( set -x # Install nvidia-driver package if not installed status="$(dpkg-query -W --showformat='${db:Status-Status}' nvidia-docker2 2>&1)" if [ ! $? = 0 ] || [ ! "$status" = installed ]; then sudo apt-get install -y nvidia-container-toolkit-1.17.8 sudo systemctl restart docker fi ) } pre_install_nvidia_driver_amzn2() { ( # Purge any nvidia driver installed from RHEL repo sudo yum remove -y nvidia-driver-latest-dkms ) } install_nvidia_driver_common() { ( # Try to gather more information about the runner and its existing NVIDIA driver if any echo "Before installing NVIDIA driver" lspci lsmod modinfo nvidia || true HAS_NVIDIA_DRIVER=0 # Check if NVIDIA driver has already been installed if [ -x "$(command -v nvidia-smi)" ]; then set +e # The driver exists, check its version next. Also check only the first GPU if there are more than one of them # so that the same driver version is not print over multiple lines INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). 
Continuing" elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing" # Turn off persistent mode so that the installation script can unload the kernel module sudo killall nvidia-persistenced || true else HAS_NVIDIA_DRIVER=1 echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation" fi set -e fi if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then # CAUTION: this may need to be updated in future if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then sudo yum groupinstall -y "Development Tools" # ensure our kernel install is the same as our underlying kernel, # groupinstall "Development Tools" has a habit of mismatching kernel headers sudo yum install -y "kernel-devel-uname-r == $(uname -r)" sudo modprobe backlight fi sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN" set +e sudo /bin/bash /tmp/nvidia_driver -s --no-drm NVIDIA_INSTALLATION_STATUS=$? RESET_GPU=0 if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then sudo cat /var/log/nvidia-installer.log # Fail to install NVIDIA driver, try to reset the GPU RESET_GPU=1 elif [ -x "$(command -v nvidia-smi)" ]; then # Check again if nvidia-smi works even if the driver installation completes successfully INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then RESET_GPU=1 fi fi if [ "$RESET_GPU" -eq 1 ]; then NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1) # The GPU can get stuck in a failure state if somehow the test crashs the GPU microcode. 
When this # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388 for PCI_ID in $NVIDIA_DEVICES; do DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable) echo "Reseting $PCI_ID (enabled state: $DEVICE_ENABLED)" # This requires sudo permission of course echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset sleep 1 done fi sudo rm -fv /tmp/nvidia_driver set -e fi ) } post_install_nvidia_driver_common() { ( sudo modprobe nvidia || true echo "After installing NVIDIA driver" lspci lsmod modinfo nvidia || true ( set +e nvidia-smi # NB: Annoyingly, nvidia-smi command returns successfully with return code 0 even in # the case where the driver has already crashed as it still can get the driver version # and some basic information like the bus ID. However, the rest of the information # would be missing (ERR!), for example: # # +-----------------------------------------------------------------------------+ # | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | # |-------------------------------+----------------------+----------------------+ # | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | # | | | MIG M. | # |===============================+======================+======================| # | 0 ERR! Off | 00000000:00:1E.0 Off | ERR! | # |ERR! ERR! ERR! ERR! / ERR! | 4184MiB / 23028MiB | ERR! Default | # | | | ERR! 
| # +-------------------------------+----------------------+----------------------+ # # +-----------------------------------------------------------------------------+ # | Processes: | # | GPU GI CI PID Type Process name GPU Memory | # | ID ID Usage | # |=============================================================================| # +-----------------------------------------------------------------------------+ # # This should be reported as a failure instead as it will guarantee to fail when # Docker tries to run with --gpus all # # So, the correct check here is to query one of the missing piece of info like # GPU name, so that the command can fail accordingly nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 NVIDIA_SMI_STATUS=$? # Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285 if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}" else echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}" exit ${NVIDIA_SMI_STATUS} fi set -e ) ) } install_nvidia_driver_amzn2() { ( set -x pre_install_nvidia_driver_amzn2 install_nvidia_driver_common post_install_nvidia_driver_common ) } install_nvidia_driver_ubuntu20() { ( set -x install_nvidia_driver_common post_install_nvidia_driver_common ) } echo "== Installing nvidia driver ${DRIVER_FN} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_driver_amzn2 ;; ubuntu20.04) install_nvidia_driver_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Install container toolkit based on distribution echo "== Installing nvidia container toolkit for ${DISTRIBUTION} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_docker2_amzn2 ;; ubuntu20.04) install_nvidia_docker2_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Fix https://github.com/NVIDIA/nvidia-docker/issues/1648 on runners with # more than one GPUs. 
This just needs to be run once. The command fails # on subsequent runs and complains that the mode is already on, but that's # ok sudo nvidia-persistenced || true # This should show persistence mode ON nvidia-smi # check if the container-toolkit is correctly installed and CUDA is available inside a container docker run --rm -t --gpus=all public.ecr.aws/docker/library/python:3.13 nvidia-smi 2025-12-04T09:40:10.7456274Z retry_wait_seconds: 10 2025-12-04T09:40:10.7456615Z polling_interval_seconds: 1 2025-12-04T09:40:10.7456942Z warning_on_retry: true 2025-12-04T09:40:10.7457264Z continue_on_error: false 2025-12-04T09:40:10.7457578Z env: 2025-12-04T09:40:10.7457811Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:40:10.7458121Z HAS_NVIDIA_GPU: true 2025-12-04T09:40:10.7458488Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:40:10.7458919Z DRIVER_VERSION: 525.105.17 2025-12-04T09:40:10.7459260Z ##[endgroup] 2025-12-04T09:40:10.8245529Z == Installing nvidia driver NVIDIA-Linux-x86_64-525.105.17.run == 2025-12-04T09:40:10.8246636Z + pre_install_nvidia_driver_amzn2 2025-12-04T09:40:10.8247441Z + sudo yum remove -y nvidia-driver-latest-dkms 2025-12-04T09:40:11.5435209Z No match for argument: nvidia-driver-latest-dkms 2025-12-04T09:40:11.5435966Z No packages marked for removal. 2025-12-04T09:40:11.5514872Z Dependencies resolved. 2025-12-04T09:40:11.5527015Z Nothing to do. 2025-12-04T09:40:11.5527549Z Complete! 
2025-12-04T09:40:11.6327011Z + install_nvidia_driver_common 2025-12-04T09:40:11.6334521Z + echo 'Before installing NVIDIA driver' 2025-12-04T09:40:11.6335693Z Before installing NVIDIA driver 2025-12-04T09:40:11.6337719Z + lspci 2025-12-04T09:40:11.6915484Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-12-04T09:40:11.6916537Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-12-04T09:40:11.6917329Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-12-04T09:40:11.6918004Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-12-04T09:40:11.6918621Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-12-04T09:40:11.6919316Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-12-04T09:40:11.6919956Z 00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:40:11.6920592Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. 
NVMe SSD Controller 2025-12-04T09:40:11.6921097Z + lsmod 2025-12-04T09:40:11.6961140Z Module Size Used by 2025-12-04T09:40:11.6961851Z nvidia_uvm 1925120 0 2025-12-04T09:40:11.6962202Z nvidia 14286848 1 nvidia_uvm 2025-12-04T09:40:11.6962611Z drm 602112 1 nvidia 2025-12-04T09:40:11.6963227Z drm_panel_orientation_quirks 32768 1 drm 2025-12-04T09:40:11.6963669Z backlight 24576 1 drm 2025-12-04T09:40:11.6964035Z i2c_core 110592 2 nvidia,drm 2025-12-04T09:40:11.6964402Z xt_conntrack 16384 1 2025-12-04T09:40:11.6964738Z nft_chain_nat 16384 3 2025-12-04T09:40:11.6965057Z xt_MASQUERADE 20480 1 2025-12-04T09:40:11.6965473Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-12-04T09:40:11.6966279Z nf_conntrack_netlink 57344 0 2025-12-04T09:40:11.6967035Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-12-04T09:40:11.6967596Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-12-04T09:40:11.6967997Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-12-04T09:40:11.6968416Z xfrm_user 57344 1 2025-12-04T09:40:11.6968739Z xfrm_algo 16384 1 xfrm_user 2025-12-04T09:40:11.6969109Z xt_addrtype 16384 2 2025-12-04T09:40:11.6969440Z nft_compat 20480 4 2025-12-04T09:40:11.6969810Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-12-04T09:40:11.6970333Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-12-04T09:40:11.6970806Z br_netfilter 36864 0 2025-12-04T09:40:11.6971630Z bridge 323584 1 br_netfilter 2025-12-04T09:40:11.6972001Z stp 16384 1 bridge 2025-12-04T09:40:11.6972359Z llc 16384 2 bridge,stp 2025-12-04T09:40:11.6972719Z overlay 167936 0 2025-12-04T09:40:11.6973026Z tls 139264 0 2025-12-04T09:40:11.6973343Z nls_ascii 16384 1 2025-12-04T09:40:11.6973662Z nls_cp437 20480 1 2025-12-04T09:40:11.6973962Z vfat 24576 1 2025-12-04T09:40:11.6974285Z fat 86016 1 vfat 2025-12-04T09:40:11.6974632Z sunrpc 700416 1 2025-12-04T09:40:11.6974930Z i8042 45056 0 2025-12-04T09:40:11.6975245Z ena 184320 0 2025-12-04T09:40:11.6975561Z serio 28672 3 i8042 
2025-12-04T09:40:11.6975908Z skx_edac_common 28672 0 2025-12-04T09:40:11.6976278Z button 24576 0 2025-12-04T09:40:11.6976606Z ghash_clmulni_intel 16384 0 2025-12-04T09:40:11.6976938Z sch_fq_codel 20480 17 2025-12-04T09:40:11.6977484Z fuse 184320 1 2025-12-04T09:40:11.6977797Z dm_mod 188416 0 2025-12-04T09:40:11.6978117Z configfs 57344 1 2025-12-04T09:40:11.6978420Z loop 36864 0 2025-12-04T09:40:11.6978740Z dmi_sysfs 20480 0 2025-12-04T09:40:11.6979062Z crc32_pclmul 16384 0 2025-12-04T09:40:11.6979374Z crc32c_intel 24576 0 2025-12-04T09:40:11.6979692Z efivarfs 24576 1 2025-12-04T09:40:11.6980014Z + modinfo nvidia 2025-12-04T09:40:11.6983617Z filename: /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-12-04T09:40:11.6984644Z import_ns: DMA_BUF 2025-12-04T09:40:11.6984959Z alias: char-major-195-* 2025-12-04T09:40:11.6985295Z version: 580.82.07 2025-12-04T09:40:11.6985589Z supported: external 2025-12-04T09:40:11.6985899Z license: Dual MIT/GPL 2025-12-04T09:40:11.6986254Z firmware: nvidia/580.82.07/gsp_tu10x.bin 2025-12-04T09:40:11.6986686Z firmware: nvidia/580.82.07/gsp_ga10x.bin 2025-12-04T09:40:11.6987103Z srcversion: BA7240A71DCF7DC6FE88C1D 2025-12-04T09:40:11.6987516Z alias: of:N*T*Cnvidia,tegra264-displayC* 2025-12-04T09:40:11.6987959Z alias: of:N*T*Cnvidia,tegra264-display 2025-12-04T09:40:11.6988379Z alias: of:N*T*Cnvidia,tegra234-displayC* 2025-12-04T09:40:11.6988813Z alias: of:N*T*Cnvidia,tegra234-display 2025-12-04T09:40:11.6989392Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-12-04T09:40:11.6989807Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2025-12-04T09:40:11.6990233Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-12-04T09:40:11.6990626Z depends: i2c-core,drm 2025-12-04T09:40:11.6990933Z retpoline: Y 2025-12-04T09:40:11.6991208Z name: nvidia 2025-12-04T09:40:11.6991664Z vermagic: 6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 2025-12-04T09:40:11.6992261Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 
2025-12-04T09:40:11.6992819Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2025-12-04T09:40:11.6993351Z parm: NVreg_ResmanDebugLevel:int 2025-12-04T09:40:11.6993743Z parm: NVreg_RmLogonRC:int 2025-12-04T09:40:11.6994109Z parm: NVreg_ModifyDeviceFiles:int 2025-12-04T09:40:11.6994505Z parm: NVreg_DeviceFileUID:int 2025-12-04T09:40:11.6994887Z parm: NVreg_DeviceFileGID:int 2025-12-04T09:40:11.6995259Z parm: NVreg_DeviceFileMode:int 2025-12-04T09:40:11.6995713Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-12-04T09:40:11.6996199Z parm: NVreg_UsePageAttributeTable:int 2025-12-04T09:40:11.6996618Z parm: NVreg_EnablePCIeGen3:int 2025-12-04T09:40:11.6996991Z parm: NVreg_EnableMSI:int 2025-12-04T09:40:11.6997376Z parm: NVreg_EnableStreamMemOPs:int 2025-12-04T09:40:11.6997824Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-12-04T09:40:11.6998306Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-12-04T09:40:11.6998785Z parm: NVreg_EnableS0ixPowerManagement:int 2025-12-04T09:40:11.6999294Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-12-04T09:40:11.6999789Z parm: NVreg_DynamicPowerManagement:int 2025-12-04T09:40:11.7000308Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-12-04T09:40:11.7000820Z parm: NVreg_EnableGpuFirmware:int 2025-12-04T09:40:11.7001247Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-12-04T09:40:11.7001693Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-12-04T09:40:11.7002164Z parm: NVreg_EnableUserNUMAManagement:int 2025-12-04T09:40:11.7002590Z parm: NVreg_MemoryPoolSize:int 2025-12-04T09:40:11.7002981Z parm: NVreg_KMallocHeapMaxSize:int 2025-12-04T09:40:11.7003402Z parm: NVreg_VMallocHeapMaxSize:int 2025-12-04T09:40:11.7003807Z parm: NVreg_IgnoreMMIOCheck:int 2025-12-04T09:40:11.7004185Z parm: NVreg_NvLinkDisable:int 2025-12-04T09:40:11.7004723Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-12-04T09:40:11.7005182Z parm: NVreg_RegisterPCIDriver:int 2025-12-04T09:40:11.7005615Z parm: 
NVreg_RegisterPlatformDeviceDriver:int 2025-12-04T09:40:11.7006070Z parm: NVreg_EnableResizableBar:int 2025-12-04T09:40:11.7006495Z parm: NVreg_EnableDbgBreakpoint:int 2025-12-04T09:40:11.7006934Z parm: NVreg_EnableNonblockingOpen:int 2025-12-04T09:40:11.7007363Z parm: NVreg_CoherentGPUMemoryMode:charp 2025-12-04T09:40:11.7007791Z parm: NVreg_RegistryDwords:charp 2025-12-04T09:40:11.7008216Z parm: NVreg_RegistryDwordsPerDevice:charp 2025-12-04T09:40:11.7008616Z parm: NVreg_RmMsg:charp 2025-12-04T09:40:11.7008978Z parm: NVreg_GpuBlacklist:charp 2025-12-04T09:40:11.7009385Z parm: NVreg_TemporaryFilePath:charp 2025-12-04T09:40:11.7009781Z parm: NVreg_ExcludedGpus:charp 2025-12-04T09:40:11.7010181Z parm: NVreg_DmaRemapPeerMmio:int 2025-12-04T09:40:11.7010591Z parm: NVreg_RmNvlinkBandwidth:charp 2025-12-04T09:40:11.7011037Z parm: NVreg_RmNvlinkBandwidthLinkCount:int 2025-12-04T09:40:11.7011460Z parm: NVreg_ImexChannelCount:int 2025-12-04T09:40:11.7011866Z parm: NVreg_CreateImexChannel0:int 2025-12-04T09:40:11.7012300Z parm: NVreg_GrdmaPciTopoCheckOverride:int 2025-12-04T09:40:11.7012786Z parm: rm_firmware_active:charp 2025-12-04T09:40:11.7013156Z + HAS_NVIDIA_DRIVER=0 2025-12-04T09:40:11.7013474Z ++ command -v nvidia-smi 2025-12-04T09:40:11.7013781Z + '[' -x /usr/bin/nvidia-smi ']' 2025-12-04T09:40:11.7014106Z + set +e 2025-12-04T09:40:11.7014499Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 2025-12-04T09:40:13.2489648Z + INSTALLED_DRIVER_VERSION=580.82.07 2025-12-04T09:40:13.2490088Z + NVIDIA_SMI_STATUS=0 2025-12-04T09:40:13.2490637Z + '[' 0 -ne 0 ']' 2025-12-04T09:40:13.2490913Z + '[' 580.82.07 '!=' 525.105.17 ']' 2025-12-04T09:40:13.2491566Z + echo 'NVIDIA driver (580.82.07) has been installed, but we expect to have 525.105.17 instead. Continuing' 2025-12-04T09:40:13.2492218Z + sudo killall nvidia-persistenced 2025-12-04T09:40:13.2492825Z NVIDIA driver (580.82.07) has been installed, but we expect to have 525.105.17 instead. 
Continuing 2025-12-04T09:40:13.3983716Z nvidia-persistenced: no process found 2025-12-04T09:40:13.4003090Z + true 2025-12-04T09:40:13.4003439Z + set -e 2025-12-04T09:40:13.4003678Z + '[' 0 -eq 0 ']' 2025-12-04T09:40:13.4003977Z + '[' amzn2023 '!=' ubuntu20.04 ']' 2025-12-04T09:40:13.4004388Z + sudo yum groupinstall -y 'Development Tools' 2025-12-04T09:40:13.9483309Z Last metadata expiration check: 0:23:23 ago on Thu Dec 4 09:16:50 2025. 2025-12-04T09:40:13.9944015Z No match for group package "system-rpm-config" 2025-12-04T09:40:13.9963978Z No match for group package "rcs" 2025-12-04T09:40:13.9990371Z No match for group package "pkgconfig" 2025-12-04T09:40:14.0583796Z Dependencies resolved. 2025-12-04T09:40:14.0923056Z ================================================================================ 2025-12-04T09:40:14.0923633Z Package Architecture Version Repository Size 2025-12-04T09:40:14.0924161Z ================================================================================ 2025-12-04T09:40:14.0924554Z Installing Groups: 2025-12-04T09:40:14.0924957Z Development Tools 2025-12-04T09:40:14.0925311Z 2025-12-04T09:40:14.0925419Z Transaction Summary 2025-12-04T09:40:14.0925726Z ================================================================================ 2025-12-04T09:40:14.0926044Z 2025-12-04T09:40:15.0086091Z ================================================================================ 2025-12-04T09:40:15.0086609Z WARNING: 2025-12-04T09:40:15.0086920Z A newer release of "Amazon Linux" is available. 
2025-12-04T09:40:15.0087210Z 2025-12-04T09:40:15.0087319Z Available Versions: 2025-12-04T09:40:15.0087798Z 2025-12-04T09:40:15.0087908Z Version 2023.9.20250929: 2025-12-04T09:40:15.0088298Z Run the following command to upgrade to 2023.9.20250929: 2025-12-04T09:40:15.0088622Z 2025-12-04T09:40:15.0088784Z dnf upgrade --releasever=2023.9.20250929 2025-12-04T09:40:15.0089049Z 2025-12-04T09:40:15.0089153Z Release notes: 2025-12-04T09:40:15.0089683Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20250929.html 2025-12-04T09:40:15.0090156Z 2025-12-04T09:40:15.0090283Z Version 2023.9.20251014: 2025-12-04T09:40:15.0090755Z Run the following command to upgrade to 2023.9.20251014: 2025-12-04T09:40:15.0091088Z 2025-12-04T09:40:15.0091231Z dnf upgrade --releasever=2023.9.20251014 2025-12-04T09:40:15.0091511Z 2025-12-04T09:40:15.0091615Z Release notes: 2025-12-04T09:40:15.0092120Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251014.html 2025-12-04T09:40:15.0092589Z 2025-12-04T09:40:15.0092700Z Version 2023.9.20251020: 2025-12-04T09:40:15.0093089Z Run the following command to upgrade to 2023.9.20251020: 2025-12-04T09:40:15.0093405Z 2025-12-04T09:40:15.0093557Z dnf upgrade --releasever=2023.9.20251020 2025-12-04T09:40:15.0093819Z 2025-12-04T09:40:15.0093934Z Release notes: 2025-12-04T09:40:15.0094415Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251020.html 2025-12-04T09:40:15.0094895Z 2025-12-04T09:40:15.0095160Z Version 2023.9.20251027: 2025-12-04T09:40:15.0095550Z Run the following command to upgrade to 2023.9.20251027: 2025-12-04T09:40:15.0095864Z 2025-12-04T09:40:15.0096000Z dnf upgrade --releasever=2023.9.20251027 2025-12-04T09:40:15.0096372Z 2025-12-04T09:40:15.0096475Z Release notes: 2025-12-04T09:40:15.0096970Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251027.html 2025-12-04T09:40:15.0097436Z 2025-12-04T09:40:15.0097561Z Version 2023.9.20251105: 
2025-12-04T09:40:15.0097935Z Run the following command to upgrade to 2023.9.20251105: 2025-12-04T09:40:15.0098276Z 2025-12-04T09:40:15.0098414Z dnf upgrade --releasever=2023.9.20251105 2025-12-04T09:40:15.0098676Z 2025-12-04T09:40:15.0098795Z Release notes: 2025-12-04T09:40:15.0099281Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251105.html 2025-12-04T09:40:15.0099762Z 2025-12-04T09:40:15.0099869Z Version 2023.9.20251110: 2025-12-04T09:40:15.0100255Z Run the following command to upgrade to 2023.9.20251110: 2025-12-04T09:40:15.0100572Z 2025-12-04T09:40:15.0100722Z dnf upgrade --releasever=2023.9.20251110 2025-12-04T09:40:15.0100980Z 2025-12-04T09:40:15.0101081Z Release notes: 2025-12-04T09:40:15.0101571Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251110.html 2025-12-04T09:40:15.0102032Z 2025-12-04T09:40:15.0102151Z Version 2023.9.20251117: 2025-12-04T09:40:15.0102518Z Run the following command to upgrade to 2023.9.20251117: 2025-12-04T09:40:15.0102854Z 2025-12-04T09:40:15.0102990Z dnf upgrade --releasever=2023.9.20251117 2025-12-04T09:40:15.0103265Z 2025-12-04T09:40:15.0103367Z Release notes: 2025-12-04T09:40:15.0103855Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251117.html 2025-12-04T09:40:15.0104316Z 2025-12-04T09:40:15.0104449Z ================================================================================ 2025-12-04T09:40:15.0104844Z Complete! 2025-12-04T09:40:15.1253255Z ++ uname -r 2025-12-04T09:40:15.1265676Z + sudo yum install -y 'kernel-devel-uname-r == 6.1.150-174.273.amzn2023.x86_64' 2025-12-04T09:40:15.7101409Z Last metadata expiration check: 0:23:25 ago on Thu Dec 4 09:16:50 2025. 2025-12-04T09:40:15.7418792Z Using '==' operator in reldeps can result in an undefined behavior. It is deprecated and the support will be dropped in future versions. Use '=' operator instead. 
2025-12-04T09:40:15.7543608Z Package kernel-devel-1:6.1.150-174.273.amzn2023.x86_64 is already installed. 2025-12-04T09:40:15.8187446Z Dependencies resolved. 2025-12-04T09:40:15.8544721Z Nothing to do. 2025-12-04T09:40:15.8545385Z Complete! 2025-12-04T09:40:15.9691489Z + sudo modprobe backlight 2025-12-04T09:40:16.2362214Z + sudo curl -fsL -o /tmp/nvidia_driver https://s3.amazonaws.com/ossci-linux/nvidia_driver/NVIDIA-Linux-x86_64-525.105.17.run 2025-12-04T09:40:20.5959749Z + set +e 2025-12-04T09:40:20.5960225Z + sudo /bin/bash /tmp/nvidia_driver -s --no-drm 2025-12-04T09:40:22.1108168Z Verifying archive integrity... OK 2025-12-04T09:40:49.8207627Z Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 525.105.17................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... 2025-12-04T09:40:50.3739232Z 2025-12-04T09:40:50.3742167Z WARNING: The nvidia-drm module will not be installed. As a result, DRM-KMS will not function with this installation of the NVIDIA driver. 
2025-12-04T09:40:50.3742892Z 2025-12-04T09:41:16.5245111Z 2025-12-04T09:41:16.5247012Z WARNING: nvidia-installer was forced to guess the X library path '/usr/lib64' and X module path '/usr/lib64/xorg/modules'; these paths were not queryable from the system. If X fails to find the NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver. 2025-12-04T09:41:16.5248729Z 2025-12-04T09:41:16.5264403Z 2025-12-04T09:41:16.5265773Z WARNING: This NVIDIA driver package includes Vulkan components, but no Vulkan ICD loader was detected on this system. The NVIDIA Vulkan ICD will not function without the loader. Most distributions package the Vulkan loader; try installing the "vulkan-loader", "vulkan-icd-loader", or "libvulkan1" package. 2025-12-04T09:41:16.5267258Z 2025-12-04T09:41:28.1302144Z + NVIDIA_INSTALLATION_STATUS=0 2025-12-04T09:41:28.1302557Z + RESET_GPU=0 2025-12-04T09:41:28.1302836Z + '[' 0 -ne 0 ']' 2025-12-04T09:41:28.1304638Z ++ command -v nvidia-smi 2025-12-04T09:41:28.1307569Z + '[' -x /usr/bin/nvidia-smi ']' 2025-12-04T09:41:28.1311266Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 2025-12-04T09:41:30.7139362Z + INSTALLED_DRIVER_VERSION=525.105.17 2025-12-04T09:41:30.7139821Z + NVIDIA_SMI_STATUS=0 2025-12-04T09:41:30.7140138Z + '[' 0 -ne 0 ']' 2025-12-04T09:41:30.7140389Z + '[' 0 -eq 1 ']' 2025-12-04T09:41:30.7140683Z + sudo rm -fv /tmp/nvidia_driver 2025-12-04T09:41:30.8987357Z removed '/tmp/nvidia_driver' 2025-12-04T09:41:30.9006505Z + set -e 2025-12-04T09:41:30.9008667Z + post_install_nvidia_driver_common 2025-12-04T09:41:30.9012478Z + sudo modprobe nvidia 2025-12-04T09:41:31.1454613Z + echo 'After installing NVIDIA driver' 2025-12-04T09:41:31.1455031Z + lspci 2025-12-04T09:41:31.1455286Z After installing NVIDIA driver 2025-12-04T09:41:31.1592554Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-12-04T09:41:31.1593189Z 
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-12-04T09:41:31.1594009Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-12-04T09:41:31.1594664Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-12-04T09:41:31.1595587Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-12-04T09:41:31.1596267Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-12-04T09:41:31.1596902Z 00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:41:31.1597521Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller 2025-12-04T09:41:31.1598050Z + lsmod 2025-12-04T09:41:31.1625496Z Module Size Used by 2025-12-04T09:41:31.1625918Z nvidia 56537088 0 2025-12-04T09:41:31.1626243Z drm 602112 1 nvidia 2025-12-04T09:41:31.1626645Z drm_panel_orientation_quirks 32768 1 drm 2025-12-04T09:41:31.1627068Z backlight 24576 1 drm 2025-12-04T09:41:31.1627432Z i2c_core 110592 2 nvidia,drm 2025-12-04T09:41:31.1627797Z xt_conntrack 16384 1 2025-12-04T09:41:31.1628121Z nft_chain_nat 16384 3 2025-12-04T09:41:31.1628438Z xt_MASQUERADE 20480 1 2025-12-04T09:41:31.1628807Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-12-04T09:41:31.1629219Z nf_conntrack_netlink 57344 0 2025-12-04T09:41:31.1629707Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-12-04T09:41:31.1630261Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-12-04T09:41:31.1630858Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-12-04T09:41:31.1631223Z xfrm_user 57344 1 2025-12-04T09:41:31.1631579Z xfrm_algo 16384 1 xfrm_user 2025-12-04T09:41:31.1631947Z xt_addrtype 16384 2 2025-12-04T09:41:31.1632275Z nft_compat 20480 4 2025-12-04T09:41:31.1632646Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-12-04T09:41:31.1633167Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-12-04T09:41:31.1633647Z br_netfilter 36864 0 
2025-12-04T09:41:31.1633984Z bridge 323584 1 br_netfilter 2025-12-04T09:41:31.1634366Z stp 16384 1 bridge 2025-12-04T09:41:31.1634722Z llc 16384 2 bridge,stp 2025-12-04T09:41:31.1635062Z overlay 167936 0 2025-12-04T09:41:31.1635376Z tls 139264 0 2025-12-04T09:41:31.1635689Z nls_ascii 16384 1 2025-12-04T09:41:31.1636001Z nls_cp437 20480 1 2025-12-04T09:41:31.1636305Z vfat 24576 1 2025-12-04T09:41:31.1636618Z fat 86016 1 vfat 2025-12-04T09:41:31.1636952Z sunrpc 700416 1 2025-12-04T09:41:31.1637245Z i8042 45056 0 2025-12-04T09:41:31.1637549Z ena 184320 0 2025-12-04T09:41:31.1637863Z serio 28672 3 i8042 2025-12-04T09:41:31.1638193Z skx_edac_common 28672 0 2025-12-04T09:41:31.1638510Z button 24576 0 2025-12-04T09:41:31.1638832Z ghash_clmulni_intel 16384 0 2025-12-04T09:41:31.1639148Z sch_fq_codel 20480 17 2025-12-04T09:41:31.1639478Z fuse 184320 1 2025-12-04T09:41:31.1639785Z dm_mod 188416 0 2025-12-04T09:41:31.1640083Z configfs 57344 1 2025-12-04T09:41:31.1640391Z loop 36864 0 2025-12-04T09:41:31.1640699Z dmi_sysfs 20480 0 2025-12-04T09:41:31.1641012Z crc32_pclmul 16384 0 2025-12-04T09:41:31.1641313Z crc32c_intel 24576 0 2025-12-04T09:41:31.1641632Z efivarfs 24576 1 2025-12-04T09:41:31.1641944Z + modinfo nvidia 2025-12-04T09:41:31.1644323Z filename: /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-12-04T09:41:31.1644944Z firmware: nvidia/525.105.17/gsp_tu10x.bin 2025-12-04T09:41:31.1645380Z firmware: nvidia/525.105.17/gsp_ad10x.bin 2025-12-04T09:41:31.1645771Z alias: char-major-195-* 2025-12-04T09:41:31.1646109Z version: 525.105.17 2025-12-04T09:41:31.1646428Z supported: external 2025-12-04T09:41:31.1646716Z license: NVIDIA 2025-12-04T09:41:31.1647150Z srcversion: 98F82D76E0EF3952EEE57A7 2025-12-04T09:41:31.1647556Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-12-04T09:41:31.1648042Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2025-12-04T09:41:31.1648513Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-12-04T09:41:31.1648909Z depends: 
i2c-core,drm 2025-12-04T09:41:31.1649230Z retpoline: Y 2025-12-04T09:41:31.1649505Z name: nvidia 2025-12-04T09:41:31.1649960Z vermagic: 6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 2025-12-04T09:41:31.1650558Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2025-12-04T09:41:31.1651105Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2025-12-04T09:41:31.1651638Z parm: NVreg_ResmanDebugLevel:int 2025-12-04T09:41:31.1652026Z parm: NVreg_RmLogonRC:int 2025-12-04T09:41:31.1652393Z parm: NVreg_ModifyDeviceFiles:int 2025-12-04T09:41:31.1652798Z parm: NVreg_DeviceFileUID:int 2025-12-04T09:41:31.1653177Z parm: NVreg_DeviceFileGID:int 2025-12-04T09:41:31.1653575Z parm: NVreg_DeviceFileMode:int 2025-12-04T09:41:31.1654009Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-12-04T09:41:31.1654489Z parm: NVreg_UsePageAttributeTable:int 2025-12-04T09:41:31.1654901Z parm: NVreg_EnablePCIeGen3:int 2025-12-04T09:41:31.1655360Z parm: NVreg_EnableMSI:int 2025-12-04T09:41:31.1655724Z parm: NVreg_TCEBypassMode:int 2025-12-04T09:41:31.1656120Z parm: NVreg_EnableStreamMemOPs:int 2025-12-04T09:41:31.1656647Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-12-04T09:41:31.1657126Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-12-04T09:41:31.1657598Z parm: NVreg_EnableS0ixPowerManagement:int 2025-12-04T09:41:31.1658109Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-12-04T09:41:31.1658605Z parm: NVreg_DynamicPowerManagement:int 2025-12-04T09:41:31.1659128Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-12-04T09:41:31.1659639Z parm: NVreg_EnableGpuFirmware:int 2025-12-04T09:41:31.1660046Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-12-04T09:41:31.1660506Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-12-04T09:41:31.1660970Z parm: NVreg_EnableUserNUMAManagement:int 2025-12-04T09:41:31.1661396Z parm: NVreg_MemoryPoolSize:int 2025-12-04T09:41:31.1661784Z parm: NVreg_KMallocHeapMaxSize:int 
2025-12-04T09:41:31.1662195Z parm: NVreg_VMallocHeapMaxSize:int 2025-12-04T09:41:31.1662596Z parm: NVreg_IgnoreMMIOCheck:int 2025-12-04T09:41:31.1662973Z parm: NVreg_NvLinkDisable:int 2025-12-04T09:41:31.1663399Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-12-04T09:41:31.1663847Z parm: NVreg_RegisterPCIDriver:int 2025-12-04T09:41:31.1664246Z parm: NVreg_EnableDbgBreakpoint:int 2025-12-04T09:41:31.1664668Z parm: NVreg_RegistryDwords:charp 2025-12-04T09:41:31.1665092Z parm: NVreg_RegistryDwordsPerDevice:charp 2025-12-04T09:41:31.1665494Z parm: NVreg_RmMsg:charp 2025-12-04T09:41:31.1665853Z parm: NVreg_GpuBlacklist:charp 2025-12-04T09:41:31.1666257Z parm: NVreg_TemporaryFilePath:charp 2025-12-04T09:41:31.1666666Z parm: NVreg_ExcludedGpus:charp 2025-12-04T09:41:31.1667049Z parm: NVreg_DmaRemapPeerMmio:int 2025-12-04T09:41:31.1667442Z parm: rm_firmware_active:charp 2025-12-04T09:41:31.1667792Z + set +e 2025-12-04T09:41:31.1668011Z + nvidia-smi 2025-12-04T09:41:33.1429579Z Thu Dec 4 09:41:33 2025 2025-12-04T09:41:33.1430085Z +-----------------------------------------------------------------------------+ 2025-12-04T09:41:33.1430694Z | NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 | 2025-12-04T09:41:33.1431275Z |-------------------------------+----------------------+----------------------+ 2025-12-04T09:41:33.1432271Z | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T09:41:33.1432928Z | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2025-12-04T09:41:33.1433460Z | | | MIG M. 
| 2025-12-04T09:41:33.1433856Z |===============================+======================+======================| 2025-12-04T09:41:33.1509162Z | 0 Tesla T4 Off | 00000000:00:1E.0 Off | 0 | 2025-12-04T09:41:33.1509768Z | N/A 30C P0 25W / 70W | 2MiB / 15360MiB | 4% Default | 2025-12-04T09:41:33.1510324Z | | | N/A | 2025-12-04T09:41:33.1510781Z +-------------------------------+----------------------+----------------------+ 2025-12-04T09:41:33.1511258Z 2025-12-04T09:41:33.1511728Z +-----------------------------------------------------------------------------+ 2025-12-04T09:41:33.1512229Z | Processes: | 2025-12-04T09:41:33.1512893Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T09:41:33.1513383Z | ID ID Usage | 2025-12-04T09:41:33.1514039Z |=============================================================================| 2025-12-04T09:41:33.1514551Z | No running processes found | 2025-12-04T09:41:33.1515122Z +-----------------------------------------------------------------------------+ 2025-12-04T09:41:33.5994895Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T09:41:35.5628166Z Tesla T4 2025-12-04T09:41:36.0004830Z + NVIDIA_SMI_STATUS=0 2025-12-04T09:41:36.0005216Z + '[' 0 -eq 0 ']' 2025-12-04T09:41:36.0005520Z + echo 'INFO: Ignoring allowed status 0' 2025-12-04T09:41:36.0005929Z + set -e 2025-12-04T09:41:36.0006178Z INFO: Ignoring allowed status 0 2025-12-04T09:41:36.0012658Z == Installing nvidia container toolkit for amzn2023 == 2025-12-04T09:41:36.0016421Z + sudo yum install -y yum-utils 2025-12-04T09:41:36.5647087Z Last metadata expiration check: 0:24:46 ago on Thu Dec 4 09:16:50 2025. 2025-12-04T09:41:36.5992846Z Package dnf-utils-4.3.0-13.amzn2023.0.5.noarch is already installed. 2025-12-04T09:41:36.6620604Z Dependencies resolved. 2025-12-04T09:41:36.6974932Z Nothing to do. 2025-12-04T09:41:36.6975660Z Complete! 
2025-12-04T09:41:36.7955566Z + [[ amzn2023 == \a\m\z\n\2\0\2\3 ]] 2025-12-04T09:41:36.7956327Z + YUM_REPO_URL=https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo 2025-12-04T09:41:36.7957442Z + sudo yum-config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo 2025-12-04T09:41:37.1641007Z Adding repo from: https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo 2025-12-04T09:41:37.2235151Z + sudo yum install -y nvidia-container-toolkit-1.17.8 libnvidia-container-tools-1.17.8 libnvidia-container1-1.17.8 nvidia-container-toolkit-base-1.17.8 2025-12-04T09:41:37.8990156Z nvidia-container-toolkit 19 kB/s | 833 B 00:00 2025-12-04T09:41:38.0055711Z Dependencies resolved. 2025-12-04T09:41:38.0401147Z ================================================================================ 2025-12-04T09:41:38.0401710Z Package Arch Version Repository Size 2025-12-04T09:41:38.0402199Z ================================================================================ 2025-12-04T09:41:38.0402573Z Downgrading: 2025-12-04T09:41:38.0403037Z libnvidia-container-tools x86_64 1.17.8-1 nvidia-container-toolkit 40 k 2025-12-04T09:41:38.0403759Z libnvidia-container1 x86_64 1.17.8-1 nvidia-container-toolkit 1.0 M 2025-12-04T09:41:38.0404482Z nvidia-container-toolkit x86_64 1.17.8-1 nvidia-container-toolkit 1.2 M 2025-12-04T09:41:38.0405495Z nvidia-container-toolkit-base x86_64 1.17.8-1 nvidia-container-toolkit 5.8 M 2025-12-04T09:41:38.0406097Z 2025-12-04T09:41:38.0406269Z Transaction Summary 2025-12-04T09:41:38.0406703Z ================================================================================ 2025-12-04T09:41:38.0407094Z Downgrade 4 Packages 2025-12-04T09:41:38.0407291Z 2025-12-04T09:41:38.0407437Z Total download size: 8.0 M 2025-12-04T09:41:38.0408249Z Downloading Packages: 2025-12-04T09:41:38.0790466Z (1/4): libnvidia-container-tools-1.17.8-1.x86_6 1.1 MB/s | 40 kB 00:00 
2025-12-04T09:41:38.1307652Z (2/4): libnvidia-container1-1.17.8-1.x86_64.rpm 11 MB/s | 1.0 MB 00:00 2025-12-04T09:41:38.1826269Z (3/4): nvidia-container-toolkit-1.17.8-1.x86_64 8.8 MB/s | 1.2 MB 00:00 2025-12-04T09:41:38.3246748Z (4/4): nvidia-container-toolkit-base-1.17.8-1.x 24 MB/s | 5.8 MB 00:00 2025-12-04T09:41:38.3258902Z -------------------------------------------------------------------------------- 2025-12-04T09:41:38.3263296Z Total 28 MB/s | 8.0 MB 00:00 2025-12-04T09:41:38.3266656Z Running transaction check 2025-12-04T09:41:38.3428838Z Transaction check succeeded. 2025-12-04T09:41:38.3429406Z Running transaction test 2025-12-04T09:41:38.3998070Z Transaction test succeeded. 2025-12-04T09:41:38.4002155Z Running transaction 2025-12-04T09:41:39.4223420Z Preparing : 1/1 2025-12-04T09:41:39.5798207Z Downgrading : nvidia-container-toolkit-base-1.17.8-1.x86_64 1/8 2025-12-04T09:41:39.6122889Z Downgrading : libnvidia-container1-1.17.8-1.x86_64 2/8 2025-12-04T09:41:39.6885917Z Running scriptlet: libnvidia-container1-1.17.8-1.x86_64 2/8 2025-12-04T09:41:39.8453549Z Downgrading : libnvidia-container-tools-1.17.8-1.x86_64 3/8 2025-12-04T09:41:39.8756895Z Downgrading : nvidia-container-toolkit-1.17.8-1.x86_64 4/8 2025-12-04T09:41:39.9609196Z Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64 4/8 2025-12-04T09:41:39.9679432Z Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64 5/8 2025-12-04T09:41:39.9680393Z Cleanup : nvidia-container-toolkit-1.18.1-1.x86_64 5/8 2025-12-04T09:41:40.0046739Z Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64 5/8 2025-12-04T09:41:40.0115655Z Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64 6/8 2025-12-04T09:41:40.0117073Z Cleanup : libnvidia-container-tools-1.18.1-1.x86_64 6/8 2025-12-04T09:41:40.0501390Z Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64 6/8 2025-12-04T09:41:40.0573901Z Running scriptlet: libnvidia-container1-1.18.1-1.x86_64 7/8 2025-12-04T09:41:40.0575076Z Cleanup : 
libnvidia-container1-1.18.1-1.x86_64 7/8 2025-12-04T09:41:40.0949969Z Running scriptlet: libnvidia-container1-1.18.1-1.x86_64 7/8 2025-12-04T09:41:40.1022592Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:41:40.1023949Z Cleanup : nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:41:40.1357865Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:41:40.1927045Z Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64 8/8 2025-12-04T09:41:41.1002479Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:41:41.1003951Z Verifying : libnvidia-container-tools-1.17.8-1.x86_64 1/8 2025-12-04T09:41:41.1005258Z Verifying : libnvidia-container-tools-1.18.1-1.x86_64 2/8 2025-12-04T09:41:41.1006216Z Verifying : libnvidia-container1-1.17.8-1.x86_64 3/8 2025-12-04T09:41:41.1007813Z Verifying : libnvidia-container1-1.18.1-1.x86_64 4/8 2025-12-04T09:41:41.1009101Z Verifying : nvidia-container-toolkit-1.17.8-1.x86_64 5/8 2025-12-04T09:41:41.1010377Z Verifying : nvidia-container-toolkit-1.18.1-1.x86_64 6/8 2025-12-04T09:41:41.1011669Z Verifying : nvidia-container-toolkit-base-1.17.8-1.x86_64 7/8 2025-12-04T09:41:41.2634107Z Verifying : nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8================================================================================ 2025-12-04T09:41:41.2635314Z WARNING: 2025-12-04T09:41:41.2635749Z A newer release of "Amazon Linux" is available. 
2025-12-04T09:41:41.2636226Z 2025-12-04T09:41:41.2636397Z Available Versions: 2025-12-04T09:41:41.2636684Z 2025-12-04T09:41:41.2636884Z Version 2023.9.20250929: 2025-12-04T09:41:41.2637488Z Run the following command to upgrade to 2023.9.20250929: 2025-12-04T09:41:41.2638022Z 2025-12-04T09:41:41.2638263Z dnf upgrade --releasever=2023.9.20250929 2025-12-04T09:41:41.2638698Z 2025-12-04T09:41:41.2638880Z Release notes: 2025-12-04T09:41:41.2639681Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20250929.html 2025-12-04T09:41:41.2640469Z 2025-12-04T09:41:41.2640646Z Version 2023.9.20251014: 2025-12-04T09:41:41.2641603Z Run the following command to upgrade to 2023.9.20251014: 2025-12-04T09:41:41.2642150Z 2025-12-04T09:41:41.2642395Z dnf upgrade --releasever=2023.9.20251014 2025-12-04T09:41:41.2642859Z 2025-12-04T09:41:41.2643039Z Release notes: 2025-12-04T09:41:41.2643850Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251014.html 2025-12-04T09:41:41.2644640Z 2025-12-04T09:41:41.2644826Z Version 2023.9.20251020: 2025-12-04T09:41:41.2645449Z Run the following command to upgrade to 2023.9.20251020: 2025-12-04T09:41:41.2645990Z 2025-12-04T09:41:41.2646207Z dnf upgrade --releasever=2023.9.20251020 2025-12-04T09:41:41.2646672Z 2025-12-04T09:41:41.2646844Z Release notes: 2025-12-04T09:41:41.2647590Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251020.html 2025-12-04T09:41:41.2648338Z 2025-12-04T09:41:41.2648530Z Version 2023.9.20251027: 2025-12-04T09:41:41.2649141Z Run the following command to upgrade to 2023.9.20251027: 2025-12-04T09:41:41.2649672Z 2025-12-04T09:41:41.2649885Z dnf upgrade --releasever=2023.9.20251027 2025-12-04T09:41:41.2650286Z 2025-12-04T09:41:41.2650453Z Release notes: 2025-12-04T09:41:41.2651200Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251027.html 2025-12-04T09:41:41.2651949Z 2025-12-04T09:41:41.2652111Z Version 2023.9.20251105: 
2025-12-04T09:41:41.2652731Z Run the following command to upgrade to 2023.9.20251105: 2025-12-04T09:41:41.2653207Z 2025-12-04T09:41:41.2653437Z dnf upgrade --releasever=2023.9.20251105 2025-12-04T09:41:41.2653903Z 2025-12-04T09:41:41.2654092Z Release notes: 2025-12-04T09:41:41.2654924Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251105.html 2025-12-04T09:41:41.2655688Z 2025-12-04T09:41:41.2655874Z Version 2023.9.20251110: 2025-12-04T09:41:41.2656563Z Run the following command to upgrade to 2023.9.20251110: 2025-12-04T09:41:41.2657088Z 2025-12-04T09:41:41.2657309Z dnf upgrade --releasever=2023.9.20251110 2025-12-04T09:41:41.2657753Z 2025-12-04T09:41:41.2657928Z Release notes: 2025-12-04T09:41:41.2658796Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251110.html 2025-12-04T09:41:41.2659672Z 2025-12-04T09:41:41.2659883Z Version 2023.9.20251117: 2025-12-04T09:41:41.2660605Z Run the following command to upgrade to 2023.9.20251117: 2025-12-04T09:41:41.2661210Z 2025-12-04T09:41:41.2661470Z dnf upgrade --releasever=2023.9.20251117 2025-12-04T09:41:41.2662004Z 2025-12-04T09:41:41.2662211Z Release notes: 2025-12-04T09:41:41.2663570Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251117.html 2025-12-04T09:41:41.2664489Z 2025-12-04T09:41:41.2664723Z ================================================================================ 2025-12-04T09:41:41.3341112Z 2025-12-04T09:41:41.3341407Z 2025-12-04T09:41:41.3341553Z Downgraded: 2025-12-04T09:41:41.3342301Z libnvidia-container-tools-1.17.8-1.x86_64 2025-12-04T09:41:41.3343424Z libnvidia-container1-1.17.8-1.x86_64 2025-12-04T09:41:41.3344496Z nvidia-container-toolkit-1.17.8-1.x86_64 2025-12-04T09:41:41.3345799Z nvidia-container-toolkit-base-1.17.8-1.x86_64 2025-12-04T09:41:41.3346498Z 2025-12-04T09:41:41.3346663Z Complete! 
2025-12-04T09:41:41.3922208Z + sudo systemctl restart docker 2025-12-04T09:41:48.1116974Z Thu Dec 4 09:41:48 2025 2025-12-04T09:41:48.1117469Z +-----------------------------------------------------------------------------+ 2025-12-04T09:41:48.1118099Z | NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 | 2025-12-04T09:41:48.1118691Z |-------------------------------+----------------------+----------------------+ 2025-12-04T09:41:48.1119294Z | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T09:41:48.1120251Z | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2025-12-04T09:41:48.1120774Z | | | MIG M. | 2025-12-04T09:41:48.1121181Z |===============================+======================+======================| 2025-12-04T09:41:48.1218257Z | 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 | 2025-12-04T09:41:48.1218827Z | N/A 30C P0 25W / 70W | 2MiB / 15360MiB | 7% Default | 2025-12-04T09:41:48.1219362Z | | | N/A | 2025-12-04T09:41:48.1219844Z +-------------------------------+----------------------+----------------------+ 2025-12-04T09:41:48.1220326Z 2025-12-04T09:41:48.1220777Z +-----------------------------------------------------------------------------+ 2025-12-04T09:41:48.1221290Z | Processes: | 2025-12-04T09:41:48.1221945Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T09:41:48.1222439Z | ID ID Usage | 2025-12-04T09:41:48.1222846Z |=============================================================================| 2025-12-04T09:41:48.1223364Z | No running processes found | 2025-12-04T09:41:48.1223926Z +-----------------------------------------------------------------------------+ 2025-12-04T09:41:48.2001394Z Unable to find image 'public.ecr.aws/docker/library/python:3.13' locally 2025-12-04T09:41:48.3711485Z 3.13: Pulling from docker/library/python 2025-12-04T09:41:48.4584206Z 53c88f1dfeb7: Pulling fs layer 2025-12-04T09:41:48.4584641Z eae668646f44: Pulling fs layer 2025-12-04T09:41:48.4584971Z 
ff2e6e687b6c: Pulling fs layer 2025-12-04T09:41:48.4585311Z 7c40a3faff76: Pulling fs layer 2025-12-04T09:41:48.4585642Z 967a3b1c8fef: Pulling fs layer 2025-12-04T09:41:48.4586006Z a64e1a44f22a: Pulling fs layer 2025-12-04T09:41:48.4586332Z 52655f8a5bcc: Pulling fs layer 2025-12-04T09:41:48.4586649Z 7c40a3faff76: Waiting 2025-12-04T09:41:48.4586972Z 52655f8a5bcc: Waiting 2025-12-04T09:41:48.4587236Z a64e1a44f22a: Waiting 2025-12-04T09:41:48.6231616Z eae668646f44: Verifying Checksum 2025-12-04T09:41:48.6232119Z eae668646f44: Download complete 2025-12-04T09:41:48.7283856Z 53c88f1dfeb7: Verifying Checksum 2025-12-04T09:41:48.7284298Z 53c88f1dfeb7: Download complete 2025-12-04T09:41:48.8283027Z 967a3b1c8fef: Verifying Checksum 2025-12-04T09:41:48.8283720Z 967a3b1c8fef: Download complete 2025-12-04T09:41:48.8316716Z ff2e6e687b6c: Verifying Checksum 2025-12-04T09:41:48.8317100Z ff2e6e687b6c: Download complete 2025-12-04T09:41:48.8628566Z 52655f8a5bcc: Download complete 2025-12-04T09:41:49.0308187Z a64e1a44f22a: Download complete 2025-12-04T09:41:49.8395332Z 7c40a3faff76: Verifying Checksum 2025-12-04T09:41:49.8395775Z 7c40a3faff76: Download complete 2025-12-04T09:41:50.2241971Z 53c88f1dfeb7: Pull complete 2025-12-04T09:41:50.8395146Z eae668646f44: Pull complete 2025-12-04T09:41:52.8700681Z ff2e6e687b6c: Pull complete 2025-12-04T09:41:58.7127817Z 7c40a3faff76: Pull complete 2025-12-04T09:41:58.9479790Z 967a3b1c8fef: Pull complete 2025-12-04T09:41:59.6374952Z a64e1a44f22a: Pull complete 2025-12-04T09:41:59.6601271Z 52655f8a5bcc: Pull complete 2025-12-04T09:41:59.6731897Z Digest: sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0 2025-12-04T09:41:59.6773400Z Status: Downloaded newer image for public.ecr.aws/docker/library/python:3.13 2025-12-04T09:42:07.0261155Z Thu Dec 4 09:42:07 2025 2025-12-04T09:42:07.0261831Z +-----------------------------------------------------------------------------+ 2025-12-04T09:42:07.0262639Z | NVIDIA-SMI 525.105.17 Driver 
Version: 525.105.17 CUDA Version: 12.0 | 2025-12-04T09:42:07.0263428Z |-------------------------------+----------------------+----------------------+ 2025-12-04T09:42:07.0264634Z | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T09:42:07.0265574Z | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2025-12-04T09:42:07.0266342Z | | | MIG M. | 2025-12-04T09:42:07.0266969Z |===============================+======================+======================| 2025-12-04T09:42:07.0420177Z | 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 | 2025-12-04T09:42:07.0421098Z | N/A 29C P8 9W / 70W | 2MiB / 15360MiB | 0% Default | 2025-12-04T09:42:07.0421859Z | | | N/A | 2025-12-04T09:42:07.0422608Z +-------------------------------+----------------------+----------------------+ 2025-12-04T09:42:07.0423255Z 2025-12-04T09:42:07.0423884Z +-----------------------------------------------------------------------------+ 2025-12-04T09:42:07.0424572Z | Processes: | 2025-12-04T09:42:07.0425264Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T09:42:07.0425912Z | ID ID Usage | 2025-12-04T09:42:07.0426460Z |=============================================================================| 2025-12-04T09:42:07.0427157Z | No running processes found | 2025-12-04T09:42:07.0427923Z +-----------------------------------------------------------------------------+ 2025-12-04T09:42:07.9120854Z Command completed after 1 attempt(s). 
2025-12-04T09:42:07.9224746Z Prepare all required actions 2025-12-04T09:42:07.9260393Z ##[group]Run ./.github/actions/get-workflow-job-id 2025-12-04T09:42:07.9260805Z with: 2025-12-04T09:42:07.9261538Z github-token: *** 2025-12-04T09:42:07.9261800Z env: 2025-12-04T09:42:07.9262054Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:07.9262368Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:07.9262725Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:07.9263150Z ##[endgroup] 2025-12-04T09:42:07.9281234Z ##[group]Run set -eux 2025-12-04T09:42:07.9281646Z set -eux 2025-12-04T09:42:07.9299557Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-12-04T09:42:07.9311880Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:07.9312331Z env: 2025-12-04T09:42:07.9312818Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:07.9313122Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:07.9313557Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:07.9314203Z GITHUB_TOKEN: *** 2025-12-04T09:42:07.9314466Z ##[endgroup] 2025-12-04T09:42:07.9351151Z + python3 .github/scripts/get_workflow_job_id.py 19922826259 i-00bb8650059fae3eb 2025-12-04T09:42:10.3245226Z Setting output job-id=57119749248 2025-12-04T09:42:10.3246186Z Setting output job-name=linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T09:42:10.3385605Z ##[group]Run python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84 2025-12-04T09:42:10.3386511Z python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84 2025-12-04T09:42:10.3387665Z python3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 & 2025-12-04T09:42:10.3388707Z echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:42:10.3395639Z shell: /usr/bin/bash 
--noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:10.3396068Z env: 2025-12-04T09:42:10.3396326Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:10.3396641Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:10.3396994Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:10.3397412Z JOB_ID: 57119749248 2025-12-04T09:42:10.3398197Z JOB_NAME: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T09:42:10.3398988Z WORKFLOW_NAME: periodic 2025-12-04T09:42:10.3399305Z WORKFLOW_RUN_ID: 19922826259 2025-12-04T09:42:10.3399639Z MONITOR_LOG_INTERVAL: 5 2025-12-04T09:42:10.3399947Z MONITOR_DATA_COLLECT_INTERVAL: 1 2025-12-04T09:42:10.3400288Z ##[endgroup] 2025-12-04T09:42:10.6583014Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T09:42:11.0738950Z Collecting psutil==5.9.8 2025-12-04T09:42:11.0935787Z Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB) 2025-12-04T09:42:11.1749841Z Collecting dataclasses_json==0.6.7 2025-12-04T09:42:11.1788039Z Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB) 2025-12-04T09:42:11.2105806Z Collecting nvidia-ml-py==11.525.84 2025-12-04T09:42:11.2142817Z Downloading nvidia_ml_py-11.525.84-py3-none-any.whl (34 kB) 2025-12-04T09:42:11.3482074Z Collecting marshmallow<4.0.0,>=3.18.0 2025-12-04T09:42:11.3523998Z Downloading marshmallow-3.26.1-py3-none-any.whl (50 kB) 2025-12-04T09:42:11.3781840Z Collecting typing-inspect<1,>=0.4.0 2025-12-04T09:42:11.3820490Z Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB) 2025-12-04T09:42:11.4451680Z Collecting packaging>=17.0 2025-12-04T09:42:11.4489518Z Downloading packaging-25.0-py3-none-any.whl (66 kB) 2025-12-04T09:42:11.4755715Z Collecting mypy-extensions>=0.3.0 2025-12-04T09:42:11.4792245Z Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB) 2025-12-04T09:42:11.5337970Z 
Collecting typing-extensions>=3.7.4 2025-12-04T09:42:11.5376870Z Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB) 2025-12-04T09:42:11.6395058Z Installing collected packages: typing-extensions, packaging, mypy-extensions, typing-inspect, marshmallow, psutil, nvidia-ml-py, dataclasses-json 2025-12-04T09:42:11.9634855Z Successfully installed dataclasses-json-0.6.7 marshmallow-3.26.1 mypy-extensions-1.1.0 nvidia-ml-py-11.525.84 packaging-25.0 psutil-5.9.8 typing-extensions-4.15.0 typing-inspect-0.9.0 2025-12-04T09:42:12.1839988Z Prepare all required actions 2025-12-04T09:42:12.1840448Z Getting action download info 2025-12-04T09:42:12.3833948Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6) 2025-12-04T09:42:12.6477573Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093) 2025-12-04T09:42:13.0263440Z ##[group]Run ./.github/actions/download-build-artifacts 2025-12-04T09:42:13.0264197Z with: 2025-12-04T09:42:13.0264696Z name: linux-jammy-cuda12.4-py3.10-gcc11 2025-12-04T09:42:13.0265376Z s3-bucket: gha-artifacts 2025-12-04T09:42:13.0265899Z env: 2025-12-04T09:42:13.0266319Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:13.0266861Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:13.0267512Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:13.0268259Z ##[endgroup] 2025-12-04T09:42:13.0325414Z ##[group]Run seemethere/download-artifact-s3@v4 2025-12-04T09:42:13.0326177Z with: 2025-12-04T09:42:13.0326756Z name: linux-jammy-cuda12.4-py3.10-gcc11 2025-12-04T09:42:13.0327422Z s3-bucket: gha-artifacts 2025-12-04T09:42:13.0327936Z region: us-east-1 2025-12-04T09:42:13.0328409Z env: 2025-12-04T09:42:13.0328822Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:13.0329370Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:13.0330038Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:13.0330787Z ##[endgroup] 2025-12-04T09:42:13.5540172Z (node:68798) 
NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T09:42:13.5540819Z 2025-12-04T09:42:13.5541108Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-12-04T09:42:13.5541733Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T09:42:13.5542396Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T09:42:13.7659879Z Found 1 objects with prefix pytorch/pytorch/19922826259/linux-jammy-cuda12.4-py3.10-gcc11/ 2025-12-04T09:42:13.7660776Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2025-12-04T09:42:20.4963200Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2025-12-04T09:42:20.4969636Z Artifact download has finished successfully 2025-12-04T09:42:20.5171981Z ##[group]Run unzip -o artifacts.zip 2025-12-04T09:42:20.5172382Z unzip -o artifacts.zip 2025-12-04T09:42:20.5179428Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:20.5179879Z env: 2025-12-04T09:42:20.5180126Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:20.5180445Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:20.5180814Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:20.5181229Z ##[endgroup] 2025-12-04T09:42:20.5250842Z Archive: artifacts.zip 2025-12-04T09:42:20.5252375Z creating: dist/ 2025-12-04T09:42:22.5370331Z inflating: dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl 2025-12-04T09:42:22.5513992Z inflating: dist/.ninja_log 2025-12-04T09:42:22.5514746Z creating: build/custom_test_artifacts/ 2025-12-04T09:42:22.5515268Z creating: build/custom_test_artifacts/custom-op-build/ 2025-12-04T09:42:22.5515858Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2025-12-04T09:42:22.5516589Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:42:22.5524188Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:42:22.5525008Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/ 2025-12-04T09:42:22.5525811Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:42:22.5526672Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:42:22.5527497Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:42:22.5528798Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:42:22.5530079Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:42:22.5531007Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:42:22.5532042Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:42:22.5532888Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:42:22.5534285Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:42:22.5535818Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:42:22.5536973Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:42:22.5538741Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:42:22.5540682Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:42:22.5541655Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:42:22.5542522Z creating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:42:22.5604344Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:42:22.5669015Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:42:22.5670292Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:42:22.5737805Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:42:22.5739055Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:42:22.5740303Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:42:22.5741599Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:42:22.5742847Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:42:22.5744080Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:42:22.5745303Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:42:22.5746528Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:42:22.5747722Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:42:22.5748859Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:42:22.5749953Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:42:22.5751016Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:42:22.5752096Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:42:22.5753145Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:42:22.5754454Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:42:22.5832755Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:42:22.5833739Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:42:22.5916743Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:42:22.5917685Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:42:22.5918417Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:42:22.5919198Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2025-12-04T09:42:22.5919997Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2025-12-04T09:42:22.5920892Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2025-12-04T09:42:22.5921901Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2025-12-04T09:42:22.5922873Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 
2025-12-04T09:42:22.5923779Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2025-12-04T09:42:22.5924705Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2025-12-04T09:42:22.5925644Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2025-12-04T09:42:22.5926582Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2025-12-04T09:42:22.5927520Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2025-12-04T09:42:22.5928447Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2025-12-04T09:42:22.5946761Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2025-12-04T09:42:22.6165476Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2025-12-04T09:42:22.6166390Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2025-12-04T09:42:22.6167345Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2025-12-04T09:42:22.6168402Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2025-12-04T09:42:22.6169426Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2025-12-04T09:42:22.6170377Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2025-12-04T09:42:22.6171523Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2025-12-04T09:42:22.6172511Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2025-12-04T09:42:22.6173501Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2025-12-04T09:42:22.6174494Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2025-12-04T09:42:22.6175467Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2025-12-04T09:42:22.6195126Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2025-12-04T09:42:22.6285320Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2025-12-04T09:42:22.6286616Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:42:22.6287570Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:42:22.6288408Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2025-12-04T09:42:22.6289196Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2025-12-04T09:42:22.6290108Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2025-12-04T09:42:22.6290889Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc 2025-12-04T09:42:22.6293237Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2025-12-04T09:42:22.6294057Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2025-12-04T09:42:22.6294770Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2025-12-04T09:42:22.6485923Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2025-12-04T09:42:22.6548506Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2025-12-04T09:42:22.6549119Z creating: build/custom_test_artifacts/jit-hook-build/ 2025-12-04T09:42:22.6549684Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 
2025-12-04T09:42:22.6550369Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:42:22.6557880Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:42:22.6558678Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/ 2025-12-04T09:42:22.6559463Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:42:22.6560303Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:42:22.6561109Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:42:22.6562173Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:42:22.6563717Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:42:22.6564966Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:42:22.6565898Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:42:22.6566746Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:42:22.6567980Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:42:22.6569478Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:42:22.6570508Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:42:22.6572532Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:42:22.6574446Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:42:22.6575528Z creating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:42:22.6576462Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:42:22.6638544Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:42:22.6703468Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:42:22.6704731Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:42:22.6772259Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:42:22.6773489Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:42:22.6774737Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:42:22.6776156Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:42:22.6777450Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:42:22.6778663Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:42:22.6779891Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:42:22.6781106Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:42:22.6782293Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 
2025-12-04T09:42:22.6783401Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:42:22.6784494Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:42:22.6785557Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:42:22.6786620Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:42:22.6787645Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:42:22.6788701Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:42:22.6866770Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:42:22.6867932Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:42:22.6949330Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:42:22.6950256Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:42:22.6950986Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:42:22.6951748Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2025-12-04T09:42:22.6952555Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2025-12-04T09:42:22.6953476Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2025-12-04T09:42:22.6954528Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2025-12-04T09:42:22.6955525Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2025-12-04T09:42:22.6956456Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2025-12-04T09:42:22.6957405Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2025-12-04T09:42:22.6958372Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2025-12-04T09:42:22.6959337Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2025-12-04T09:42:22.6960302Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2025-12-04T09:42:22.6961441Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2025-12-04T09:42:22.6979285Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2025-12-04T09:42:22.7048525Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2025-12-04T09:42:22.7049779Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:42:22.7050701Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:42:22.7051540Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2025-12-04T09:42:22.7052304Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2025-12-04T09:42:22.7053058Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2025-12-04T09:42:22.7053817Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2025-12-04T09:42:22.7056553Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2025-12-04T09:42:22.7057415Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 
2025-12-04T09:42:22.7058156Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2025-12-04T09:42:22.7100857Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2025-12-04T09:42:22.7101528Z creating: build/custom_test_artifacts/custom-backend-build/ 2025-12-04T09:42:22.7102163Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2025-12-04T09:42:22.7102911Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:42:22.7110824Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:42:22.7111687Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/ 2025-12-04T09:42:22.7112548Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:42:22.7113468Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:42:22.7114343Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:42:22.7115372Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:42:22.7116604Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:42:22.7117576Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:42:22.7118505Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:42:22.7119419Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:42:22.7120901Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:42:22.7122460Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:42:22.7123537Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:42:22.7125162Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:42:22.7127232Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:42:22.7128278Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:42:22.7129212Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:42:22.7191176Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:42:22.7255140Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:42:22.7256534Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:42:22.7324219Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:42:22.7325525Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:42:22.7326848Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:42:22.7328220Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:42:22.7329553Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 
2025-12-04T09:42:22.7330846Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:42:22.7332126Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:42:22.7333438Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:42:22.7334699Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:42:22.7335895Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:42:22.7337117Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:42:22.7338259Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:42:22.7339411Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:42:22.7340517Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:42:22.7341655Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:42:22.7419027Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:42:22.7420050Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:42:22.7501046Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:42:22.7502037Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/ 
2025-12-04T09:42:22.7502834Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:42:22.7503664Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2025-12-04T09:42:22.7504543Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2025-12-04T09:42:22.7505523Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2025-12-04T09:42:22.7506667Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2025-12-04T09:42:22.7507755Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2025-12-04T09:42:22.7508761Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2025-12-04T09:42:22.7509984Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2025-12-04T09:42:22.7511039Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2025-12-04T09:42:22.7512096Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2025-12-04T09:42:22.7513233Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2025-12-04T09:42:22.7514256Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2025-12-04T09:42:22.7515364Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2025-12-04T09:42:22.7645098Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2025-12-04T09:42:22.7646150Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2025-12-04T09:42:22.7647185Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2025-12-04T09:42:22.7648358Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2025-12-04T09:42:22.7649499Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2025-12-04T09:42:22.7650561Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2025-12-04T09:42:22.7651646Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2025-12-04T09:42:22.7652759Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2025-12-04T09:42:22.7653871Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2025-12-04T09:42:22.7654977Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2025-12-04T09:42:22.7656066Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2025-12-04T09:42:22.7674598Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2025-12-04T09:42:22.7735212Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2025-12-04T09:42:22.7736432Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:42:22.7737446Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:42:22.7738338Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2025-12-04T09:42:22.7739183Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 
2025-12-04T09:42:22.7740001Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2025-12-04T09:42:22.7797009Z inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2025-12-04T09:42:22.7797840Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2025-12-04T09:42:22.7798552Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2025-12-04T09:42:22.7799269Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2025-12-04T09:42:22.7857997Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2025-12-04T09:42:22.7902113Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2025-12-04T09:42:22.7902685Z creating: build/lib/ 2025-12-04T09:42:22.7993629Z inflating: build/lib/libprotobuf-lite.a 2025-12-04T09:42:22.8482852Z inflating: build/lib/libprotobuf.a 2025-12-04T09:42:22.9030052Z inflating: build/lib/libprotoc.a 2025-12-04T09:42:22.9041096Z inflating: build/lib/libpthreadpool.a 2025-12-04T09:42:22.9050247Z inflating: build/lib/libcpuinfo.a 2025-12-04T09:42:22.9059104Z inflating: build/lib/libcpuinfo_internals.a 2025-12-04T09:42:22.9060119Z inflating: build/lib/libclog.a 2025-12-04T09:42:22.9081472Z inflating: build/lib/libpytorch_qnnpack.a 2025-12-04T09:42:22.9084173Z inflating: build/lib/libnnpack_reference_layers.a 2025-12-04T09:42:22.9104149Z inflating: build/lib/libnnpack.a 2025-12-04T09:42:22.9310599Z inflating: build/lib/libmicrokernels-prod.a 2025-12-04T09:42:23.0281747Z inflating: build/lib/libmicrokernels-all.a 2025-12-04T09:42:23.0359030Z inflating: build/lib/libgtest.a 2025-12-04T09:42:23.0378583Z inflating: build/lib/libgmock.a 2025-12-04T09:42:23.0379454Z inflating: build/lib/libgtest_main.a 2025-12-04T09:42:23.0380307Z inflating: build/lib/libgmock_main.a 2025-12-04T09:42:23.0480453Z inflating: build/lib/libXNNPACK.a 2025-12-04T09:42:23.0563954Z inflating: 
build/lib/libbenchmark.a 2025-12-04T09:42:23.0564865Z inflating: build/lib/libbenchmark_main.a 2025-12-04T09:42:23.0573852Z inflating: build/lib/libittnotify.a 2025-12-04T09:42:23.0646856Z inflating: build/lib/libasmjit.a 2025-12-04T09:42:23.0647771Z inflating: build/lib/libjitprofiling.a 2025-12-04T09:42:23.1931774Z inflating: build/lib/libfbgemm.a 2025-12-04T09:42:23.1965328Z inflating: build/lib/libtensorpipe_uv.a 2025-12-04T09:42:23.2561927Z inflating: build/lib/libtensorpipe.a 2025-12-04T09:42:23.2830949Z inflating: build/lib/libtensorpipe_cuda.a 2025-12-04T09:42:23.2980104Z inflating: build/lib/libgloo.a 2025-12-04T09:42:23.3031933Z inflating: build/lib/libonnx_proto.a 2025-12-04T09:42:23.3502561Z inflating: build/lib/libgloo_cuda.a 2025-12-04T09:42:23.4285990Z inflating: build/lib/libonnx.a 2025-12-04T09:42:24.5376582Z inflating: build/lib/libdnnl.a 2025-12-04T09:42:24.5397945Z inflating: build/lib/libfmt.a 2025-12-04T09:42:24.5927159Z inflating: build/lib/libkineto.a 2025-12-04T09:42:24.6056523Z inflating: build/lib/libc10.so 2025-12-04T09:42:24.6111632Z inflating: build/lib/libc10_cuda.so 2025-12-04T09:42:24.6113225Z inflating: build/lib/libtorch_global_deps.so 2025-12-04T09:42:24.6115200Z inflating: build/lib/libcaffe2_nvrtc.so 2025-12-04T09:42:28.0275579Z inflating: build/lib/libtorch_cpu.so 2025-12-04T09:42:29.8261881Z inflating: build/lib/libtorch_cuda.so 2025-12-04T09:42:29.8266535Z inflating: build/lib/libshm.so 2025-12-04T09:42:29.8268162Z inflating: build/lib/libtorch.so 2025-12-04T09:42:29.8321744Z inflating: build/lib/libtorch_cuda_linalg.so 2025-12-04T09:42:29.8324463Z inflating: build/lib/libc10d_cuda_test.so 2025-12-04T09:42:29.8403501Z inflating: build/lib/libtorchbind_test.so 2025-12-04T09:42:29.8424908Z inflating: build/lib/libjitbackend_test.so 2025-12-04T09:42:29.8451314Z inflating: build/lib/libbackend_with_compiler.so 2025-12-04T09:42:29.8480641Z inflating: build/lib/libaoti_custom_ops.so 2025-12-04T09:42:30.1114245Z inflating: 
build/lib/libtorch_python.so 2025-12-04T09:42:30.1155123Z inflating: build/lib/libnnapi_backend.so 2025-12-04T09:42:30.1155572Z creating: build/bin/ 2025-12-04T09:42:30.1664713Z inflating: build/bin/protoc-3.13.0.0 2025-12-04T09:42:30.2173889Z inflating: build/bin/protoc 2025-12-04T09:42:30.2240208Z inflating: build/bin/c10_AllocatorConfig_test 2025-12-04T09:42:30.2302411Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2025-12-04T09:42:30.2366028Z inflating: build/bin/c10_DeviceGuard_test 2025-12-04T09:42:30.2430186Z inflating: build/bin/c10_Device_test 2025-12-04T09:42:30.2503390Z inflating: build/bin/c10_DispatchKeySet_test 2025-12-04T09:42:30.2564156Z inflating: build/bin/c10_StreamGuard_test 2025-12-04T09:42:30.2631215Z inflating: build/bin/c10_Scalar_test 2025-12-04T09:42:30.2700232Z inflating: build/bin/c10_SymInt_test 2025-12-04T09:42:30.2768876Z inflating: build/bin/c10_InlineStreamGuard_test 2025-12-04T09:42:30.2835921Z inflating: build/bin/c10_InlineDeviceGuard_test 2025-12-04T09:42:30.2904974Z inflating: build/bin/c10_SizesAndStrides_test 2025-12-04T09:42:30.2966509Z inflating: build/bin/c10_ArrayRef_test 2025-12-04T09:42:30.3027304Z inflating: build/bin/c10_ConstexprCrc_test 2025-12-04T09:42:30.3112908Z inflating: build/bin/c10_cow_test 2025-12-04T09:42:30.3178010Z inflating: build/bin/c10_Bitset_test 2025-12-04T09:42:30.3239707Z inflating: build/bin/c10_DeadlockDetection_test 2025-12-04T09:42:30.3309497Z inflating: build/bin/c10_Enumerate_test 2025-12-04T09:42:30.3372370Z inflating: build/bin/c10_Half_test 2025-12-04T09:42:30.3437895Z inflating: build/bin/c10_IntrusiveList_test 2025-12-04T09:42:30.3503876Z inflating: build/bin/c10_NetworkFlow_test 2025-12-04T09:42:30.3572592Z inflating: build/bin/c10_LeftRight_test 2025-12-04T09:42:30.3634346Z inflating: build/bin/c10_Synchronized_test 2025-12-04T09:42:30.3696710Z inflating: build/bin/c10_Semaphore_test 2025-12-04T09:42:30.3764684Z inflating: build/bin/c10_ThreadLocal_test 
2025-12-04T09:42:30.3828842Z inflating: build/bin/c10_TypeIndex_test 2025-12-04T09:42:30.3892856Z inflating: build/bin/c10_accumulate_test 2025-12-04T09:42:30.3961325Z inflating: build/bin/c10_bfloat16_test 2025-12-04T09:42:30.4030936Z inflating: build/bin/c10_complex_math_test 2025-12-04T09:42:30.4093219Z inflating: build/bin/c10_bit_cast_test 2025-12-04T09:42:30.4154691Z inflating: build/bin/c10_error_test 2025-12-04T09:42:30.4222562Z inflating: build/bin/c10_complex_test 2025-12-04T09:42:30.4287506Z inflating: build/bin/c10_exception_test 2025-12-04T09:42:30.4349440Z inflating: build/bin/c10_flags_test 2025-12-04T09:42:30.4411800Z inflating: build/bin/c10_generic_math_test 2025-12-04T09:42:30.4596277Z inflating: build/bin/c10_intrusive_ptr_test 2025-12-04T09:42:30.4659325Z inflating: build/bin/c10_irange_test 2025-12-04T09:42:30.4725386Z inflating: build/bin/c10_lazy_test 2025-12-04T09:42:30.4795644Z inflating: build/bin/c10_logging_test 2025-12-04T09:42:30.4857457Z inflating: build/bin/c10_nofatal_test 2025-12-04T09:42:30.4948125Z inflating: build/bin/c10_optional_test 2025-12-04T09:42:30.5023936Z inflating: build/bin/c10_ordered_preserving_dict_test 2025-12-04T09:42:30.5089548Z inflating: build/bin/c10_registry_test 2025-12-04T09:42:30.5268849Z inflating: build/bin/c10_small_vector_test 2025-12-04T09:42:30.5332450Z inflating: build/bin/c10_ssize_test 2025-12-04T09:42:30.5402594Z inflating: build/bin/c10_string_util_test 2025-12-04T09:42:30.5456418Z inflating: build/bin/c10_intrusive_ptr_benchmark 2025-12-04T09:42:30.5519086Z inflating: build/bin/c10_tempfile_test 2025-12-04T09:42:30.5579765Z inflating: build/bin/c10_string_view_test 2025-12-04T09:42:30.5648848Z inflating: build/bin/c10_typeid_test 2025-12-04T09:42:30.5714255Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device 2025-12-04T09:42:30.5779815Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks 2025-12-04T09:42:30.5843761Z 
inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes 2025-12-04T09:42:30.5911750Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads 2025-12-04T09:42:30.5972934Z inflating: build/bin/c10_cuda_CUDATest 2025-12-04T09:42:30.6038580Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream 2025-12-04T09:42:30.6104071Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test 2025-12-04T09:42:30.6169548Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block 2025-12-04T09:42:30.6837972Z inflating: build/bin/vec_test_all_types_DEFAULT 2025-12-04T09:42:30.7525938Z inflating: build/bin/vec_test_all_types_AVX512 2025-12-04T09:42:30.8223455Z inflating: build/bin/vec_test_all_types_AVX2 2025-12-04T09:42:30.8284792Z inflating: build/bin/test_vec_half_DEFAULT 2025-12-04T09:42:30.8401717Z inflating: build/bin/test_aoti_abi_check 2025-12-04T09:42:30.8463418Z inflating: build/bin/test_vec_half_AVX512 2025-12-04T09:42:30.8525879Z inflating: build/bin/test_vec_half_AVX2 2025-12-04T09:42:30.8591123Z inflating: build/bin/BackoffTest 2025-12-04T09:42:30.8656703Z inflating: build/bin/FileStoreTest 2025-12-04T09:42:30.8726755Z inflating: build/bin/TCPStoreTest 2025-12-04T09:42:30.8793262Z inflating: build/bin/HashStoreTest 2025-12-04T09:42:30.8808848Z inflating: build/bin/ProcessGroupMPITest 2025-12-04T09:42:30.8812976Z inflating: build/bin/torch_shm_manager 2025-12-04T09:42:30.8902298Z inflating: build/bin/Dict_test 2025-12-04T09:42:30.8966966Z inflating: build/bin/Dimname_test 2025-12-04T09:42:30.9046178Z inflating: build/bin/MaybeOwned_test 2025-12-04T09:42:30.9116081Z inflating: build/bin/NamedTensor_test 2025-12-04T09:42:30.9188351Z inflating: build/bin/apply_utils_test 2025-12-04T09:42:30.9260214Z inflating: build/bin/atest 2025-12-04T09:42:30.9338353Z inflating: build/bin/basic 2025-12-04T09:42:30.9405257Z inflating: build/bin/broadcast_test 2025-12-04T09:42:30.9467807Z inflating: 
build/bin/cpu_allocator_test 2025-12-04T09:42:30.9539214Z inflating: build/bin/cpu_generator_test 2025-12-04T09:42:30.9604387Z inflating: build/bin/cpu_profiling_allocator_test 2025-12-04T09:42:30.9714582Z inflating: build/bin/cpu_rng_test 2025-12-04T09:42:30.9777726Z inflating: build/bin/dlconvertor_test 2025-12-04T09:42:30.9848400Z inflating: build/bin/extension_backend_test 2025-12-04T09:42:30.9917004Z inflating: build/bin/half_test 2025-12-04T09:42:31.0033791Z inflating: build/bin/ivalue_test 2025-12-04T09:42:31.0096236Z inflating: build/bin/lazy_tensor_test 2025-12-04T09:42:31.0161674Z inflating: build/bin/math_kernel_test 2025-12-04T09:42:31.0227599Z inflating: build/bin/memory_format_test 2025-12-04T09:42:31.0293730Z inflating: build/bin/memory_overlapping_test 2025-12-04T09:42:31.0359618Z inflating: build/bin/mobile_memory_cleanup 2025-12-04T09:42:31.0428563Z inflating: build/bin/native_test 2025-12-04T09:42:31.0491375Z inflating: build/bin/operator_name_test 2025-12-04T09:42:31.0554152Z inflating: build/bin/operators_test 2025-12-04T09:42:31.0618844Z inflating: build/bin/packedtensoraccessor_test 2025-12-04T09:42:31.0701304Z inflating: build/bin/pow_test 2025-12-04T09:42:31.0770518Z inflating: build/bin/quantized_test 2025-12-04T09:42:31.0832565Z inflating: build/bin/reduce_ops_test 2025-12-04T09:42:31.0896011Z inflating: build/bin/reportMemoryUsage_test 2025-12-04T09:42:31.0964480Z inflating: build/bin/scalar_tensor_test 2025-12-04T09:42:31.1035407Z inflating: build/bin/scalar_test 2025-12-04T09:42:31.1099183Z inflating: build/bin/StorageUtils_test 2025-12-04T09:42:31.1163540Z inflating: build/bin/stride_properties_test 2025-12-04T09:42:31.1258596Z inflating: build/bin/tensor_iterator_test 2025-12-04T09:42:31.1325652Z inflating: build/bin/test_parallel 2025-12-04T09:42:31.1388249Z inflating: build/bin/thread_init_test 2025-12-04T09:42:31.1455735Z inflating: build/bin/type_ptr_test 2025-12-04T09:42:31.1528669Z inflating: build/bin/type_test 
2025-12-04T09:42:31.1594156Z inflating: build/bin/undefined_tensor_test 2025-12-04T09:42:31.1655282Z inflating: build/bin/verify_api_visibility 2025-12-04T09:42:31.1741495Z inflating: build/bin/legacy_vmap_test 2025-12-04T09:42:31.1804744Z inflating: build/bin/weakref_test 2025-12-04T09:42:31.1868141Z inflating: build/bin/wrapdim_test 2025-12-04T09:42:31.1931495Z inflating: build/bin/xla_tensor_test 2025-12-04T09:42:31.2004652Z inflating: build/bin/IListRef_test 2025-12-04T09:42:31.2130089Z inflating: build/bin/List_test 2025-12-04T09:42:31.2210630Z inflating: build/bin/KernelFunction_test 2025-12-04T09:42:31.2352871Z inflating: build/bin/kernel_function_legacy_test 2025-12-04T09:42:31.2466884Z inflating: build/bin/kernel_function_test 2025-12-04T09:42:31.2615606Z inflating: build/bin/kernel_lambda_legacy_test 2025-12-04T09:42:31.2736703Z inflating: build/bin/kernel_lambda_test 2025-12-04T09:42:31.2810453Z inflating: build/bin/kernel_stackbased_test 2025-12-04T09:42:31.2924380Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2025-12-04T09:42:31.2987533Z inflating: build/bin/CppSignature_test 2025-12-04T09:42:31.3055127Z inflating: build/bin/backend_fallback_test 2025-12-04T09:42:31.3116108Z inflating: build/bin/op_allowlist_test 2025-12-04T09:42:31.3472689Z inflating: build/bin/op_registration_test 2025-12-04T09:42:31.3554037Z inflating: build/bin/inline_container_test 2025-12-04T09:42:31.3620090Z inflating: build/bin/cuda_allocator_test 2025-12-04T09:42:31.3685827Z inflating: build/bin/cuda_apply_test 2025-12-04T09:42:31.3758887Z inflating: build/bin/cuda_atomic_ops_test 2025-12-04T09:42:31.3828520Z inflating: build/bin/cuda_caching_host_allocator_test 2025-12-04T09:42:31.3913052Z inflating: build/bin/cuda_complex_math_test 2025-12-04T09:42:31.3985733Z inflating: build/bin/cuda_complex_test 2025-12-04T09:42:31.4057520Z inflating: build/bin/cuda_cub_test 2025-12-04T09:42:31.4122841Z inflating: build/bin/cuda_cublas_handle_pool_test 
2025-12-04T09:42:31.4184484Z inflating: build/bin/cuda_device_test 2025-12-04T09:42:31.4263648Z inflating: build/bin/cuda_distributions_test 2025-12-04T09:42:31.4327853Z inflating: build/bin/cuda_dlconvertor_test 2025-12-04T09:42:31.4393964Z inflating: build/bin/cuda_event_test 2025-12-04T09:42:31.4455490Z inflating: build/bin/cuda_exchange_device_test 2025-12-04T09:42:31.4525388Z inflating: build/bin/cuda_generator_test 2025-12-04T09:42:31.4587165Z inflating: build/bin/cuda_half_test 2025-12-04T09:42:31.4650653Z inflating: build/bin/cuda_integer_divider_test 2025-12-04T09:42:31.4712306Z inflating: build/bin/cuda_optional_test 2025-12-04T09:42:31.4776722Z inflating: build/bin/cuda_packedtensoraccessor_test 2025-12-04T09:42:31.4841672Z inflating: build/bin/cuda_reportMemoryUsage_test 2025-12-04T09:42:31.4903519Z inflating: build/bin/cuda_allocatorTraceTracker_test 2025-12-04T09:42:31.4978640Z inflating: build/bin/cuda_stream_test 2025-12-04T09:42:31.5043306Z inflating: build/bin/cuda_vectorized_test 2025-12-04T09:42:31.5105705Z inflating: build/bin/cuda_cudnn_test 2025-12-04T09:42:31.5508498Z inflating: build/bin/test_lazy 2025-12-04T09:42:31.5590524Z inflating: build/bin/ProcessGroupGlooTest 2025-12-04T09:42:31.5660520Z inflating: build/bin/ProcessGroupGlooAsyncTest 2025-12-04T09:42:31.6913569Z inflating: build/bin/test_jit 2025-12-04T09:42:31.6991984Z inflating: build/bin/ProcessGroupNCCLTest 2025-12-04T09:42:31.7067536Z inflating: build/bin/ProcessGroupNCCLErrorsTest 2025-12-04T09:42:31.7070758Z inflating: build/bin/example_allreduce 2025-12-04T09:42:31.7139246Z inflating: build/bin/test_dist_autograd 2025-12-04T09:42:31.7223007Z inflating: build/bin/test_cpp_rpc 2025-12-04T09:42:31.7225907Z inflating: build/bin/parallel_benchmark 2025-12-04T09:42:31.8566699Z inflating: build/bin/test_api 2025-12-04T09:42:31.8567474Z creating: .additional_ci_files/ 2025-12-04T09:42:31.8639237Z inflating: .additional_ci_files/test-times.json 2025-12-04T09:42:31.8902211Z inflating: 
.additional_ci_files/test-class-times.json 2025-12-04T09:42:31.8947340Z ##[group]Run rm artifacts.zip 2025-12-04T09:42:31.8947735Z rm artifacts.zip 2025-12-04T09:42:31.8955171Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:31.8955629Z env: 2025-12-04T09:42:31.8955898Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:31.8956206Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:31.8956810Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:31.8957253Z ##[endgroup] 2025-12-04T09:42:31.9550018Z ##[group]Run df -H 2025-12-04T09:42:31.9550349Z df -H 2025-12-04T09:42:31.9557321Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:31.9557775Z env: 2025-12-04T09:42:31.9558037Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:31.9558528Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:31.9558897Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:31.9559321Z ##[endgroup] 2025-12-04T09:42:31.9610755Z Filesystem Size Used Avail Use% Mounted on 2025-12-04T09:42:31.9611220Z devtmpfs 4.2M 0 4.2M 0% /dev 2025-12-04T09:42:31.9611610Z tmpfs 34G 0 34G 0% /dev/shm 2025-12-04T09:42:31.9612007Z tmpfs 14G 562k 14G 1% /run 2025-12-04T09:42:31.9612393Z /dev/nvme0n1p1 161G 51G 111G 32% / 2025-12-04T09:42:31.9612775Z tmpfs 34G 17k 34G 1% /tmp 2025-12-04T09:42:31.9613370Z /dev/nvme0n1p128 11M 1.4M 9.2M 13% /boot/efi 2025-12-04T09:42:31.9613801Z tmpfs 6.7G 0 6.7G 0% /run/user/0 2025-12-04T09:42:31.9653074Z Prepare all required actions 2025-12-04T09:42:31.9654122Z Getting action download info 2025-12-04T09:42:32.1237741Z ##[group]Run ./.github/actions/download-td-artifacts 2025-12-04T09:42:32.1238193Z with: 2025-12-04T09:42:32.1238440Z env: 2025-12-04T09:42:32.1238676Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:32.1238995Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:32.1239368Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:32.1239774Z ##[endgroup] 2025-12-04T09:42:32.1272750Z ##[group]Run 
seemethere/download-artifact-s3@v4 2025-12-04T09:42:32.1273174Z with: 2025-12-04T09:42:32.1273409Z name: td_results 2025-12-04T09:42:32.1273720Z s3-bucket: gha-artifacts 2025-12-04T09:42:32.1274033Z region: us-east-1 2025-12-04T09:42:32.1274283Z env: 2025-12-04T09:42:32.1274528Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:32.1274834Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:32.1275201Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:32.1275683Z ##[endgroup] 2025-12-04T09:42:32.7814762Z (node:68824) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T09:42:32.7815353Z 2025-12-04T09:42:32.7815591Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-12-04T09:42:32.7816232Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T09:42:32.7817011Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T09:42:32.8934156Z Found 1 objects with prefix pytorch/pytorch/19922826259/td_results/ 2025-12-04T09:42:32.8934917Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:42:32.9817745Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:42:32.9823886Z Artifact download has finished successfully 2025-12-04T09:42:33.0001408Z ##[group]Run mkdir -p .additional_ci_files 2025-12-04T09:42:33.0001867Z mkdir -p .additional_ci_files 2025-12-04T09:42:33.0002379Z mv td_results.json .additional_ci_files/td_results.json || true 2025-12-04T09:42:33.0010056Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:33.0010506Z env: 2025-12-04T09:42:33.0010764Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:33.0011079Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:33.0011432Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:33.0011858Z ##[endgroup] 2025-12-04T09:42:33.0144192Z ##[group]Run 
.github/scripts/parse_ref.py 2025-12-04T09:42:33.0144667Z .github/scripts/parse_ref.py 2025-12-04T09:42:33.0151278Z shell: /usr/bin/bash -e {0} 2025-12-04T09:42:33.0151604Z env: 2025-12-04T09:42:33.0151858Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:33.0152174Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:33.0152531Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:33.0152958Z ##[endgroup] 2025-12-04T09:42:33.0399015Z Setting output branch=main 2025-12-04T09:42:33.0563865Z Prepare all required actions 2025-12-04T09:42:33.0564310Z Getting action download info 2025-12-04T09:42:33.1985589Z ##[group]Run ./.github/actions/filter-test-configs 2025-12-04T09:42:33.1986009Z with: 2025-12-04T09:42:33.1986650Z github-token: *** 2025-12-04T09:42:33.1993668Z test-matrix: {"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": 
"unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]} 2025-12-04T09:42:33.2001374Z job-name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T09:42:33.2002155Z env: 2025-12-04T09:42:33.2002410Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:33.2002711Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:33.2003080Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:33.2003499Z ##[endgroup] 2025-12-04T09:42:33.2045027Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T09:42:33.2045374Z with: 2025-12-04T09:42:33.2045617Z shell: bash 2025-12-04T09:42:33.2045878Z timeout_minutes: 10 2025-12-04T09:42:33.2046165Z max_attempts: 5 2025-12-04T09:42:33.2046429Z retry_wait_seconds: 30 2025-12-04T09:42:33.2047373Z command: set -eux # PyYAML 6.0 doesn't work with MacOS x86 anymore # This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2 python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T09:42:33.2048381Z polling_interval_seconds: 1 2025-12-04T09:42:33.2048699Z warning_on_retry: true 2025-12-04T09:42:33.2049006Z continue_on_error: false 2025-12-04T09:42:33.2049305Z env: 2025-12-04T09:42:33.2049539Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:33.2049859Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:33.2050224Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:33.2050836Z GITHUB_TOKEN: *** 2025-12-04T09:42:33.2051095Z ##[endgroup] 2025-12-04T09:42:33.3166150Z + python3 -m 
pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T09:42:33.5963797Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T09:42:33.7293956Z Collecting requests==2.27.1 2025-12-04T09:42:33.7489429Z Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB) 2025-12-04T09:42:33.9526432Z Collecting pyyaml==6.0.2 2025-12-04T09:42:33.9581732Z Downloading PyYAML-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (737 kB) 2025-12-04T09:42:34.4326186Z Collecting charset-normalizer~=2.0.0 2025-12-04T09:42:34.4370809Z Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB) 2025-12-04T09:42:34.4435092Z Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (2.10) 2025-12-04T09:42:34.4439416Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (1.25.10) 2025-12-04T09:42:34.4978566Z Collecting certifi>=2017.4.17 2025-12-04T09:42:34.5025346Z Downloading certifi-2025.11.12-py3-none-any.whl (159 kB) 2025-12-04T09:42:34.6081533Z Installing collected packages: charset-normalizer, certifi, requests, pyyaml 2025-12-04T09:42:34.7450941Z Successfully installed certifi-2025.11.12 charset-normalizer-2.0.12 pyyaml-6.0.2 requests-2.27.1 2025-12-04T09:42:35.2951354Z Command completed after 1 attempt(s). 
2025-12-04T09:42:35.3002790Z ##[group]Run set -x 2025-12-04T09:42:35.3003090Z set -x 2025-12-04T09:42:35.3003360Z  2025-12-04T09:42:35.3003842Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:42:35.3004405Z # in runner workspace 2025-12-04T09:42:35.3004864Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py" 2025-12-04T09:42:35.3012661Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:35.3013111Z env: 2025-12-04T09:42:35.3013354Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:35.3013674Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:35.3014046Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:35.3014455Z ##[endgroup] 2025-12-04T09:42:35.3045808Z + python3 /home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py 2025-12-04T09:42:35.3255165Z Setting output branch=main 2025-12-04T09:42:35.3315731Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:42:35.3316228Z echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:42:35.3316633Z echo "Job name: ${JOB_NAME}" 2025-12-04T09:42:35.3316977Z  2025-12-04T09:42:35.3317419Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:42:35.3317970Z # in runner workspace 2025-12-04T09:42:35.3318489Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \ 2025-12-04T09:42:35.3319059Z  --workflow "${GITHUB_WORKFLOW}" \ 2025-12-04T09:42:35.3319455Z  --job-name "${JOB_NAME}" \ 2025-12-04T09:42:35.3326719Z  --test-matrix "{"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, 
"num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}" \ 2025-12-04T09:42:35.3334255Z  --selected-test-configs "" \ 2025-12-04T09:42:35.3334656Z  --pr-number "${PR_NUMBER}" \ 2025-12-04T09:42:35.3335025Z  --tag "${TAG}" \ 2025-12-04T09:42:35.3335366Z  --event-name "${EVENT_NAME}" \ 2025-12-04T09:42:35.3335730Z  --schedule "${SCHEDULE}" \ 2025-12-04T09:42:35.3336095Z  --branch "${HEAD_BRANCH}" 2025-12-04T09:42:35.3342972Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:35.3343406Z env: 2025-12-04T09:42:35.3343660Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:35.3343978Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:35.3344330Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 
2025-12-04T09:42:35.3345047Z GITHUB_TOKEN: *** 2025-12-04T09:42:35.3345767Z JOB_NAME: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T09:42:35.3346562Z PR_NUMBER: 2025-12-04T09:42:35.3346801Z TAG: 2025-12-04T09:42:35.3347058Z EVENT_NAME: schedule 2025-12-04T09:42:35.3347348Z SCHEDULE: 29 8 * * * 2025-12-04T09:42:35.3347617Z HEAD_BRANCH: main 2025-12-04T09:42:35.3347884Z ##[endgroup] 2025-12-04T09:42:35.3375234Z Workflow: periodic 2025-12-04T09:42:35.3375999Z Job name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T09:42:35.5382421Z Setting output keep-going=True 2025-12-04T09:42:35.5382852Z Setting output ci-verbose-test-logs=False 2025-12-04T09:42:35.5383257Z Setting output ci-test-showlocals=False 2025-12-04T09:42:35.5383652Z Setting output ci-no-test-timeout=False 2025-12-04T09:42:35.5384033Z Setting output ci-no-td=False 2025-12-04T09:42:35.5384416Z Setting output ci-td-distributed=False 2025-12-04T09:42:35.5384797Z Setting output is-unstable=True 2025-12-04T09:42:35.5385150Z Setting output reenabled-issues= 2025-12-04T09:42:35.5400908Z Setting output test-matrix={"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", 
"rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": 
"rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]} 2025-12-04T09:42:35.5416917Z Setting output is-test-matrix-empty=False 2025-12-04T09:42:35.5537286Z ##[group]Run echo "Filtered matrix:" 2025-12-04T09:42:35.5537756Z echo "Filtered matrix:" 2025-12-04T09:42:35.5553429Z echo "{"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": 
"unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", 
"shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}" 2025-12-04T09:42:35.5569414Z  2025-12-04T09:42:35.5569666Z echo 2025-12-04T09:42:35.5569986Z echo "Is the current job unstable? True" 2025-12-04T09:42:35.5570378Z  2025-12-04T09:42:35.5570618Z echo 2025-12-04T09:42:35.5570919Z echo "Is keep-going label set? True" 2025-12-04T09:42:35.5571643Z  2025-12-04T09:42:35.5571892Z echo 2025-12-04T09:42:35.5572173Z echo "Reenabled issues? 
" 2025-12-04T09:42:35.5579238Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:35.5579691Z env: 2025-12-04T09:42:35.5579951Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:35.5580254Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:35.5580625Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:35.5581050Z ##[endgroup] 2025-12-04T09:42:35.5608647Z Filtered matrix: 2025-12-04T09:42:35.5628017Z {include: [{config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: 
legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}]} 2025-12-04T09:42:35.5643534Z 2025-12-04T09:42:35.5643686Z Is the current job 
unstable? True 2025-12-04T09:42:35.5643938Z 2025-12-04T09:42:35.5644067Z Is keep-going label set? True 2025-12-04T09:42:35.5644285Z 2025-12-04T09:42:35.5644414Z Reenabled issues? 2025-12-04T09:42:35.5681867Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:42:35.5682500Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:42:35.5689300Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:35.5689734Z env: 2025-12-04T09:42:35.5689991Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:35.5690310Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:35.5690668Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:35.5691091Z JOB_TIMEOUT: 600 2025-12-04T09:42:35.5691363Z ##[endgroup] 2025-12-04T09:42:35.5745465Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:42:35.5746099Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:42:35.5746641Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:42:35.5753070Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:35.5753520Z env: 2025-12-04T09:42:35.5753771Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:35.5754073Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:35.5754443Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:35.5754864Z ##[endgroup] 2025-12-04T09:42:35.5864726Z ##[group]Run set -x 2025-12-04T09:42:35.5865126Z set -x 2025-12-04T09:42:35.5865392Z  2025-12-04T09:42:35.5865687Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2025-12-04T09:42:35.5866137Z  TEST_COMMAND=.ci/pytorch/multigpu-test.sh 2025-12-04T09:42:35.5866610Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2025-12-04T09:42:35.5867193Z  TEST_COMMAND=.ci/onnx/test.sh 2025-12-04T09:42:35.5867536Z else 2025-12-04T09:42:35.5867836Z  TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:42:35.5868204Z fi 2025-12-04T09:42:35.5868431Z  2025-12-04T09:42:35.5868737Z # Leaving 1GB for the runner 
and other things 2025-12-04T09:42:35.5869424Z TOTAL_AVAILABLE_MEMORY_IN_GB=$(awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo) 2025-12-04T09:42:35.5870453Z # https://docs.docker.com/engine/containers/resource_constraints/#--memory-swap-details, the 3GB swap 2025-12-04T09:42:35.5871572Z # comes from https://github.com/pytorch/test-infra/pull/6058 2025-12-04T09:42:35.5872209Z TOTAL_MEMORY_WITH_SWAP=$(("${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}" + 3)) 2025-12-04T09:42:35.5872710Z  2025-12-04T09:42:35.5873012Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-12-04T09:42:35.5873425Z  SHM_OPTS= 2025-12-04T09:42:35.5873724Z  JENKINS_USER= 2025-12-04T09:42:35.5874121Z  # ensure that docker container cleanly exits in 12 hours 2025-12-04T09:42:35.5874687Z  # if for some reason cleanup action doesn't stop container 2025-12-04T09:42:35.5875164Z  # when job is cancelled 2025-12-04T09:42:35.5875529Z  DOCKER_SHELL_CMD="sleep 12h" 2025-12-04T09:42:35.5875911Z  USED_IMAGE="${DOCKER_IMAGE_S390X}" 2025-12-04T09:42:35.5876283Z else 2025-12-04T09:42:35.5876573Z  SHM_OPTS="--shm-size=${SHM_SIZE}" 2025-12-04T09:42:35.5876961Z  JENKINS_USER="--user jenkins" 2025-12-04T09:42:35.5877332Z  DOCKER_SHELL_CMD= 2025-12-04T09:42:35.5877671Z  USED_IMAGE="${DOCKER_IMAGE}" 2025-12-04T09:42:35.5878008Z fi 2025-12-04T09:42:35.5878251Z  2025-12-04T09:42:35.5878650Z # detached container should get cleaned up by teardown_ec2_linux 2025-12-04T09:42:35.5879293Z # TODO: Stop building test binaries as part of the build phase 2025-12-04T09:42:35.5880012Z # Used for GPU_FLAG, SHM_OPTS, JENKINS_USER and DOCKER_SHELL_CMD since that doesn't play nice 2025-12-04T09:42:35.5880654Z # shellcheck disable=SC2086,SC2090 2025-12-04T09:42:35.5881047Z container_name=$(docker run \ 2025-12-04T09:42:35.5881401Z  ${GPU_FLAG:-} \ 2025-12-04T09:42:35.5881756Z  ${SCCACHE_SERVER_PORT_DOCKER_FLAG:-} \ 2025-12-04T09:42:35.5882164Z  -e BUILD_ENVIRONMENT \ 2025-12-04T09:42:35.5882517Z  -e PR_NUMBER \ 
2025-12-04T09:42:35.5882826Z  -e GITHUB_ACTIONS \ 2025-12-04T09:42:35.5883168Z  -e GITHUB_REPOSITORY \ 2025-12-04T09:42:35.5883520Z  -e GITHUB_WORKFLOW \ 2025-12-04T09:42:35.5883845Z  -e GITHUB_JOB \ 2025-12-04T09:42:35.5884164Z  -e GITHUB_RUN_ID \ 2025-12-04T09:42:35.5884495Z  -e GITHUB_RUN_NUMBER \ 2025-12-04T09:42:35.5884834Z  -e GITHUB_RUN_ATTEMPT \ 2025-12-04T09:42:35.5885189Z  -e JOB_ID \ 2025-12-04T09:42:35.5885494Z  -e JOB_NAME \ 2025-12-04T09:42:35.5885799Z  -e BASE_SHA \ 2025-12-04T09:42:35.5886085Z  -e BRANCH \ 2025-12-04T09:42:35.5886375Z  -e SHA1 \ 2025-12-04T09:42:35.5886670Z  -e AWS_DEFAULT_REGION \ 2025-12-04T09:42:35.5887003Z  -e IN_WHEEL_TEST \ 2025-12-04T09:42:35.5887331Z  -e SHARD_NUMBER \ 2025-12-04T09:42:35.5887656Z  -e TEST_CONFIG \ 2025-12-04T09:42:35.5887968Z  -e NUM_TEST_SHARDS \ 2025-12-04T09:42:35.5888484Z  -e REENABLED_ISSUES \ 2025-12-04T09:42:35.5888858Z  -e CONTINUE_THROUGH_ERROR \ 2025-12-04T09:42:35.5889215Z  -e VERBOSE_TEST_LOGS \ 2025-12-04T09:42:35.5889566Z  -e TEST_SHOWLOCALS \ 2025-12-04T09:42:35.5889908Z  -e NO_TEST_TIMEOUT \ 2025-12-04T09:42:35.5890239Z  -e NO_TD \ 2025-12-04T09:42:35.5904781Z  -e TD_DISTRIBUTED \ 2025-12-04T09:42:35.5905355Z  -e PR_LABELS \ 2025-12-04T09:42:35.5905719Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2025-12-04T09:42:35.5906107Z  -e SCCACHE_BUCKET \ 2025-12-04T09:42:35.5906446Z  -e SCCACHE_REGION \ 2025-12-04T09:42:35.5906773Z  -e XLA_CUDA \ 2025-12-04T09:42:35.5907102Z  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ 2025-12-04T09:42:35.5907534Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2025-12-04T09:42:35.5907972Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2025-12-04T09:42:35.5908426Z  -e SKIP_SCCACHE_INITIALIZATION=1 \ 2025-12-04T09:42:35.5908819Z  -e HUGGING_FACE_HUB_TOKEN \ 2025-12-04T09:42:35.5909211Z  -e VLLM_TEST_HUGGING_FACE_TOKEN \ 2025-12-04T09:42:35.5909618Z  -e SCRIBE_GRAPHQL_ACCESS_TOKEN \ 2025-12-04T09:42:35.5909987Z  -e DASHBOARD_TAG \ 2025-12-04T09:42:35.5910327Z  -e ARTIFACTS_FILE_SUFFIX \ 
2025-12-04T09:42:35.5910764Z  --memory="${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g" \ 2025-12-04T09:42:35.5911262Z  --memory-swap="${TOTAL_MEMORY_WITH_SWAP}g" \ 2025-12-04T09:42:35.5911737Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 2025-12-04T09:42:35.5912207Z  --security-opt seccomp=unconfined \ 2025-12-04T09:42:35.5912608Z  --cap-add=SYS_PTRACE \ 2025-12-04T09:42:35.5912939Z  --ipc=host \ 2025-12-04T09:42:35.5913239Z  ${SHM_OPTS} \ 2025-12-04T09:42:35.5913535Z  --tty \ 2025-12-04T09:42:35.5913796Z  --detach \ 2025-12-04T09:42:35.5914116Z  --name="${container_name}" \ 2025-12-04T09:42:35.5914492Z  ${JENKINS_USER} \ 2025-12-04T09:42:35.5914896Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2025-12-04T09:42:35.5915353Z  -w /var/lib/jenkins/workspace \ 2025-12-04T09:42:35.5915728Z  "${USED_IMAGE}" \ 2025-12-04T09:42:35.5916052Z  ${DOCKER_SHELL_CMD} 2025-12-04T09:42:35.5916356Z ) 2025-12-04T09:42:35.5916751Z echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}" 2025-12-04T09:42:35.5917236Z  2025-12-04T09:42:35.5917529Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-12-04T09:42:35.5918220Z  docker exec -t "${container_name}" sh -c "python3 -m pip install -r .ci/docker/requirements-ci.txt" 2025-12-04T09:42:35.5918834Z fi 2025-12-04T09:42:35.5919077Z  2025-12-04T09:42:35.5919652Z docker exec -t "${container_name}" sh -c "python3 -m pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}" 2025-12-04T09:42:35.5926673Z shell: /usr/bin/bash -e {0} 2025-12-04T09:42:35.5927001Z env: 2025-12-04T09:42:35.5927242Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:35.5927557Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:35.5927935Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:35.5928443Z BUILD_ENVIRONMENT: linux-jammy-cuda12.4-py3.10-gcc11 2025-12-04T09:42:35.5928859Z PR_NUMBER: 2025-12-04T09:42:35.5929142Z GITHUB_REPOSITORY: pytorch/pytorch 2025-12-04T09:42:35.5929502Z GITHUB_WORKFLOW: periodic 2025-12-04T09:42:35.5929800Z GITHUB_JOB: 
test 2025-12-04T09:42:35.5930074Z GITHUB_RUN_ID: 19922826259 2025-12-04T09:42:35.5930390Z GITHUB_RUN_NUMBER: 19107 2025-12-04T09:42:35.5930681Z GITHUB_RUN_ATTEMPT: 1 2025-12-04T09:42:35.5930962Z JOB_ID: 57119749248 2025-12-04T09:42:35.5931676Z JOB_NAME: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T09:42:35.5932571Z BRANCH: main 2025-12-04T09:42:35.5932886Z SHA1: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:35.5933344Z BASE_SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:35.5933767Z TEST_CONFIG: legacy_nvidia_driver 2025-12-04T09:42:35.5934098Z SHARD_NUMBER: 1 2025-12-04T09:42:35.5934366Z NUM_TEST_SHARDS: 5 2025-12-04T09:42:35.5934644Z EXTRA_FLAGS: 2025-12-04T09:42:35.5934974Z OP_BENCHMARK_TESTS: 2025-12-04T09:42:35.5935262Z REENABLED_ISSUES: 2025-12-04T09:42:35.5935559Z CONTINUE_THROUGH_ERROR: True 2025-12-04T09:42:35.5935879Z VERBOSE_TEST_LOGS: False 2025-12-04T09:42:35.5936189Z TEST_SHOWLOCALS: False 2025-12-04T09:42:35.5936601Z NO_TEST_TIMEOUT: False 2025-12-04T09:42:35.5936878Z NO_TD: False 2025-12-04T09:42:35.5937143Z TD_DISTRIBUTED: False 2025-12-04T09:42:35.5937504Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2025-12-04T09:42:35.5937928Z SCCACHE_REGION: us-east-1 2025-12-04T09:42:35.5938218Z SHM_SIZE: 2g 2025-12-04T09:42:35.5939144Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:42:35.5940835Z DOCKER_IMAGE_S390X: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:42:35.5941855Z XLA_CUDA: 2025-12-04T09:42:35.5942260Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:42:35.5942796Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 1 2025-12-04T09:42:35.5943172Z 
PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2025-12-04T09:42:35.5943512Z DASHBOARD_TAG: 2025-12-04T09:42:35.5944031Z VLLM_TEST_HUGGING_FACE_TOKEN: *** 2025-12-04T09:42:35.5944521Z HUGGING_FACE_HUB_TOKEN: *** 2025-12-04T09:42:35.5945002Z SCRIBE_GRAPHQL_ACCESS_TOKEN: *** 2025-12-04T09:42:35.5945622Z ARTIFACTS_FILE_SUFFIX: test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248 2025-12-04T09:42:35.5946250Z ##[endgroup] 2025-12-04T09:42:35.5974814Z + [[ legacy_nvidia_driver == \m\u\l\t\i\g\p\u ]] 2025-12-04T09:42:35.5975312Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *onnx* ]] 2025-12-04T09:42:35.5975735Z + TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:42:35.5978687Z ++ awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo 2025-12-04T09:42:35.6000500Z + TOTAL_AVAILABLE_MEMORY_IN_GB='61.094 ' 2025-12-04T09:42:35.6000932Z + TOTAL_MEMORY_WITH_SWAP=64 2025-12-04T09:42:35.6001316Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *\s\3\9\0\x* ]] 2025-12-04T09:42:35.6001747Z + SHM_OPTS=--shm-size=2g 2025-12-04T09:42:35.6002060Z + JENKINS_USER='--user jenkins' 2025-12-04T09:42:35.6002371Z + DOCKER_SHELL_CMD= 2025-12-04T09:42:35.6003302Z + USED_IMAGE=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:42:35.6009881Z +++ nproc --ignore=2 2025-12-04T09:42:35.6042406Z ++ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e TD_DISTRIBUTED -e PR_LABELS -e MAX_JOBS=14 -e SCCACHE_BUCKET -e SCCACHE_REGION -e XLA_CUDA -e 
XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e SKIP_SCCACHE_INITIALIZATION=1 -e HUGGING_FACE_HUB_TOKEN -e VLLM_TEST_HUGGING_FACE_TOKEN -e SCRIBE_GRAPHQL_ACCESS_TOKEN -e DASHBOARD_TAG -e ARTIFACTS_FILE_SUFFIX --memory=61g --memory-swap=64g --env-file=/tmp/github_env_19922826259 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:42:43.9244569Z + container_name=764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T09:42:43.9245467Z + echo DOCKER_CONTAINER_ID=764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T09:42:43.9246987Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *\s\3\9\0\x* ]] 2025-12-04T09:42:43.9251539Z ++ echo dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl 2025-12-04T09:42:43.9254332Z + docker exec -t 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad sh -c 'python3 -m pip install dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl[opt-einsum] && .ci/pytorch/test.sh' 2025-12-04T09:42:44.4999894Z Processing ./dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl (from torch==2.10.0a0+gitffd9b0f) 2025-12-04T09:42:45.3792517Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.18.0) 2025-12-04T09:42:45.3796700Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (4.12.2) 2025-12-04T09:42:45.3801911Z Requirement already satisfied: sympy>=1.13.3 in 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.13.3) 2025-12-04T09:42:45.3807092Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2.8.8) 2025-12-04T09:42:45.3811244Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.1.6) 2025-12-04T09:42:45.3816729Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2025.10.0) 2025-12-04T09:42:45.3832292Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.3.0) 2025-12-04T09:42:45.4271393Z Requirement already satisfied: numpy>=1.7 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.22.4) 2025-12-04T09:42:45.4294710Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.3.0) 2025-12-04T09:42:45.4362796Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.0.3) 2025-12-04T09:42:45.8660017Z Installing collected packages: torch 2025-12-04T09:42:58.4856658Z Successfully installed torch-2.10.0a0+gitffd9b0f 2025-12-04T09:42:58.5780059Z + export TERM=vt100 2025-12-04T09:42:58.5780403Z + TERM=vt100 2025-12-04T09:42:58.5782655Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:42:58.5791310Z + source .ci/pytorch/common.sh 2025-12-04T09:42:58.5794913Z +++ dirname .ci/pytorch/common.sh 
2025-12-04T09:42:58.5802641Z ++ source .ci/pytorch/common_utils.sh 2025-12-04T09:42:58.5804358Z +++ declare -f -t trap_add 2025-12-04T09:42:58.5810381Z ++ set -ex -o pipefail 2025-12-04T09:42:58.5810754Z ++ [[ linux-jammy-cuda12.4-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:42:58.5811164Z ++ BUILD_TEST_LIBTORCH=0 2025-12-04T09:42:58.5815609Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:42:58.5823597Z + source .ci/pytorch/common-build.sh 2025-12-04T09:42:58.5825344Z ++ [[ linux-jammy-cuda12.4-py3.10-gcc11 != *win-* ]] 2025-12-04T09:42:58.5832079Z ++++ dirname .ci/pytorch/common-build.sh 2025-12-04T09:42:58.5840553Z +++ cd .ci/pytorch 2025-12-04T09:42:58.5840865Z +++ pwd -P 2025-12-04T09:42:58.5842867Z ++ script_dir=/var/lib/jenkins/workspace/.ci/pytorch 2025-12-04T09:42:58.5843471Z ++ [[ linux-jammy-cuda12.4-py3.10-gcc11 == *-pch* ]] 2025-12-04T09:42:58.5843880Z ++ which sccache 2025-12-04T09:42:58.5862248Z ++ [[ -z ossci-compiler-cache-circleci-v2 ]] 2025-12-04T09:42:58.5862862Z ++ sccache --stop-server 2025-12-04T09:42:58.5893382Z ++ true 2025-12-04T09:42:58.5893836Z ++ rm -f /var/lib/jenkins/sccache_error.log 2025-12-04T09:42:58.5904021Z ++ trap_add sccache_epilogue EXIT 2025-12-04T09:42:58.5904422Z ++ trap_add_cmd=sccache_epilogue 2025-12-04T09:42:58.5904748Z ++ shift 2025-12-04T09:42:58.5904995Z ++ for trap_add_name in "$@" 2025-12-04T09:42:58.5911151Z ++++ trap -p EXIT 2025-12-04T09:42:58.5913791Z +++ eval 'extract_trap_cmd ' 2025-12-04T09:42:58.5914098Z ++++ extract_trap_cmd 2025-12-04T09:42:58.5914392Z ++++ printf '%s\n' '' 2025-12-04T09:42:58.5914832Z +++ printf '%s\n' sccache_epilogue 2025-12-04T09:42:58.5916845Z ++ trap -- ' 2025-12-04T09:42:58.5917103Z sccache_epilogue' EXIT 2025-12-04T09:42:58.5917394Z ++ [[ -n 1 ]] 2025-12-04T09:42:58.5917847Z ++ echo 'Skipping sccache server initialization, setting environment variables' 2025-12-04T09:42:58.5918540Z Skipping sccache server initialization, setting environment variables 2025-12-04T09:42:58.5919078Z ++ 
export SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:42:58.5919435Z ++ SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:42:58.5919845Z ++ export SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:42:58.5920356Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:42:58.5929113Z ++ export RUST_LOG=sccache::server=error 2025-12-04T09:42:58.5929554Z ++ RUST_LOG=sccache::server=error 2025-12-04T09:42:58.5929898Z ++ sccache --zero-stats 2025-12-04T09:42:58.7150924Z Statistics zeroed. 2025-12-04T09:42:58.7156126Z ++ which ccache 2025-12-04T09:42:58.7180321Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *rocm* ]] 2025-12-04T09:42:58.7180891Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *s390x* ]] 2025-12-04T09:42:58.7181342Z + [[ -d /var/lib/jenkins/workspace ]] 2025-12-04T09:42:58.7183293Z ++ stat -c %u /var/lib/jenkins/workspace 2025-12-04T09:42:58.7197930Z + WORKSPACE_ORIGINAL_OWNER_ID=1000 2025-12-04T09:42:58.7198315Z + trap_add cleanup_workspace EXIT 2025-12-04T09:42:58.7198748Z + trap_add_cmd=cleanup_workspace 2025-12-04T09:42:58.7199174Z + shift 2025-12-04T09:42:58.7199614Z + for trap_add_name in "$@" 2025-12-04T09:42:58.7206745Z +++ trap -p EXIT 2025-12-04T09:42:58.7209850Z ++ eval 'extract_trap_cmd trap -- '\'' 2025-12-04T09:42:58.7210364Z sccache_epilogue'\'' EXIT' 2025-12-04T09:42:58.7210720Z +++ extract_trap_cmd trap -- ' 2025-12-04T09:42:58.7211056Z sccache_epilogue' EXIT 2025-12-04T09:42:58.7211340Z +++ printf '%s\n' ' 2025-12-04T09:42:58.7211623Z sccache_epilogue' 2025-12-04T09:42:58.7211922Z ++ printf '%s\n' cleanup_workspace 2025-12-04T09:42:58.7212828Z + trap -- ' 2025-12-04T09:42:58.7213168Z sccache_epilogue 2025-12-04T09:42:58.7213485Z cleanup_workspace' EXIT 2025-12-04T09:42:58.7213836Z + sudo chown -R jenkins /var/lib/jenkins/workspace 2025-12-04T09:42:59.4586627Z + git config --global --add safe.directory /var/lib/jenkins/workspace 2025-12-04T09:42:59.4606215Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:42:59.4609383Z ++ 
python -c 'import os;import numba.cuda; print(os.path.dirname(numba.cuda.__file__))' 2025-12-04T09:42:59.9614820Z + NUMBA_CUDA_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:42:59.9615578Z + '[' -n /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ']' 2025-12-04T09:42:59.9621095Z +++ realpath .ci/pytorch/test.sh 2025-12-04T09:42:59.9631140Z ++ dirname /var/lib/jenkins/workspace/.ci/pytorch/test.sh 2025-12-04T09:42:59.9649483Z + NUMBA_PATCH=/var/lib/jenkins/workspace/.ci/pytorch/numba-cuda-13.patch 2025-12-04T09:42:59.9650254Z + pushd /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:42:59.9651256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ~/workspace 2025-12-04T09:42:59.9651814Z + patch -p4 2025-12-04T09:42:59.9665244Z patching file cudadrv/driver.py 2025-12-04T09:42:59.9665617Z Hunk #1 succeeded at 357 (offset -8 lines). 2025-12-04T09:42:59.9674795Z + popd 2025-12-04T09:42:59.9675081Z ~/workspace 2025-12-04T09:42:59.9675401Z + echo 'Environment variables:' 2025-12-04T09:42:59.9675921Z Environment variables: 2025-12-04T09:42:59.9676206Z + env 2025-12-04T09:42:59.9686014Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:42:59.9686791Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:42:59.9687352Z BUILD_ENVIRONMENT=linux-jammy-cuda12.4-py3.10-gcc11 2025-12-04T09:42:59.9688037Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-12-04T09:42:59.9688383Z HOSTNAME=764ff984146f 2025-12-04T09:42:59.9689081Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_685e94f7-4594-411d-afae-acf4a383301b 2025-12-04T09:42:59.9689849Z GITHUB_ACTION=__run_3 2025-12-04T09:42:59.9690169Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:42:59.9690533Z GITHUB_RUN_NUMBER=19107 2025-12-04T09:42:59.9690849Z TEST_CONFIG=legacy_nvidia_driver 2025-12-04T09:42:59.9691195Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:42:59.9691586Z 
TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-12-04T09:42:59.9691962Z SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:42:59.9692454Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-12-04T09:42:59.9692818Z GITHUB_TRIGGERING_ACTOR=huydhn 2025-12-04T09:42:59.9693155Z GITHUB_REF_TYPE=branch 2025-12-04T09:42:59.9693502Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:59.9693887Z XLA_CUDA= 2025-12-04T09:42:59.9694153Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-12-04T09:42:59.9694630Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T09:42:59.9695128Z *** 2025-12-04T09:42:59.9695377Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:42:59.9695705Z GITHUB_ACTIONS=true 2025-12-04T09:42:59.9695998Z NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:59.9696465Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:42:59.9696937Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:59.9697386Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:59.9698002Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/periodic.yml@refs/heads/main 2025-12-04T09:42:59.9698573Z UCC_HOME=/usr 2025-12-04T09:42:59.9698846Z VERBOSE_TEST_LOGS=False 2025-12-04T09:42:59.9699150Z GITHUB_REF=refs/heads/main 2025-12-04T09:42:59.9699455Z SHARD_NUMBER=1 2025-12-04T09:42:59.9699729Z GITHUB_REF_PROTECTED=true 2025-12-04T09:42:59.9700028Z HOME=/var/lib/jenkins 2025-12-04T09:42:59.9700353Z GITHUB_API_URL=https://api.github.com 2025-12-04T09:42:59.9700742Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:42:59.9701135Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-12-04T09:42:59.9701537Z USE_SYSTEM_NCCL=1 2025-12-04T09:42:59.9701802Z NUM_TEST_SHARDS=5 2025-12-04T09:42:59.9702064Z UCX_HOME=/usr 2025-12-04T09:42:59.9702730Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_685e94f7-4594-411d-afae-acf4a383301b 2025-12-04T09:42:59.9703922Z JOB_NAME=linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, 
mem_leak_check, unstable) 2025-12-04T09:42:59.9705090Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_685e94f7-4594-411d-afae-acf4a383301b 2025-12-04T09:42:59.9706056Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:42:59.9706660Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:42:59.9706968Z DASHBOARD_TAG= 2025-12-04T09:42:59.9707236Z GITHUB_RUN_ID=19922826259 2025-12-04T09:42:59.9707529Z INSTALLED_OPENBLAS= 2025-12-04T09:42:59.9708267Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_685e94f7-4594-411d-afae-acf4a383301b 2025-12-04T09:42:59.9709068Z GITHUB_ACTOR=huydhn 2025-12-04T09:42:59.9709325Z PR_NUMBER= 2025-12-04T09:42:59.9709572Z DESIRED_CUDA=12.4 2025-12-04T09:42:59.9710074Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:42:59.9710369Z ANACONDA_PYTHON_VERSION=3.10 2025-12-04T09:42:59.9710767Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:42:59.9711185Z TERM=vt100 2025-12-04T09:42:59.9711424Z INSTALLED_VISION=yes 2025-12-04T09:42:59.9711706Z BRANCH=main 2025-12-04T09:42:59.9711968Z SCCACHE_REGION=us-east-1 2025-12-04T09:42:59.9712270Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:42:59.9712693Z BUILD_AOT_INDUCTOR_TEST= 2025-12-04T09:42:59.9712997Z CUDA_PATH=/usr/local/cuda 2025-12-04T09:42:59.9713610Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-12-04T09:42:59.9714286Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:42:59.9714702Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-12-04T09:42:59.9715101Z REENABLED_ISSUES= 2025-12-04T09:42:59.9715350Z DOCS= 2025-12-04T09:42:59.9715575Z SHLVL=1 2025-12-04T09:42:59.9715805Z MAX_JOBS=14 2025-12-04T09:42:59.9716042Z GITHUB_ACTOR_ID=475357 2025-12-04T09:42:59.9716441Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:59.9716897Z GITHUB_REF_NAME=main 2025-12-04T09:42:59.9717353Z 
XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:42:59.9717854Z GITHUB_JOB=test 2025-12-04T09:42:59.9718123Z NO_TEST_TIMEOUT=False 2025-12-04T09:42:59.9718415Z TD_DISTRIBUTED=False 2025-12-04T09:42:59.9718722Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:42:59.9719074Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:42:59.9719379Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:42:59.9719676Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:42:59.9720603Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:42:59.9721572Z GITHUB_BASE_REF= 2025-12-04T09:42:59.9721824Z INSTALLED_ACL= 2025-12-04T09:42:59.9722373Z ARTIFACTS_FILE_SUFFIX=test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248 2025-12-04T09:42:59.9722999Z CI=true 2025-12-04T09:42:59.9723246Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T09:42:59.9723628Z RUST_LOG=sccache::server=error 2025-12-04T09:42:59.9723947Z JOB_ID=57119749248 2025-12-04T09:42:59.9724211Z GITHUB_HEAD_REF= 2025-12-04T09:42:59.9724463Z GITHUB_ACTION_REF= 2025-12-04T09:42:59.9724795Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-12-04T09:42:59.9725207Z TEST_SHOWLOCALS=False 2025-12-04T09:42:59.9725495Z GITHUB_WORKFLOW=periodic 2025-12-04T09:42:59.9725811Z DEBIAN_FRONTEND=noninteractive 2025-12-04T09:42:59.9726567Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_685e94f7-4594-411d-afae-acf4a383301b 2025-12-04T09:42:59.9727311Z NO_TD=False 2025-12-04T09:42:59.9727584Z SKIP_SCCACHE_INITIALIZATION=1 2025-12-04T09:42:59.9727944Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-12-04T09:42:59.9728297Z _=/usr/bin/env 2025-12-04T09:42:59.9728718Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:42:59.9729345Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2025-12-04T09:42:59.9837988Z + 
TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch 2025-12-04T09:42:59.9838701Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-12-04T09:42:59.9839408Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib 2025-12-04T09:42:59.9840135Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test 2025-12-04T09:42:59.9840658Z + BUILD_DIR=build 2025-12-04T09:42:59.9840952Z + BUILD_RENAMED_DIR=build_renamed 2025-12-04T09:42:59.9841316Z + BUILD_BIN_DIR=build/bin 2025-12-04T09:42:59.9841619Z + SHARD_NUMBER=1 2025-12-04T09:42:59.9841876Z + NUM_TEST_SHARDS=5 2025-12-04T09:42:59.9842178Z + export TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:42:59.9842549Z + TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:42:59.9842861Z + export VALGRIND=ON 2025-12-04T09:42:59.9843137Z + VALGRIND=ON 2025-12-04T09:42:59.9843697Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *clang9* ]] 2025-12-04T09:42:59.9844156Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *xpu* ]] 2025-12-04T09:42:59.9844558Z + detect_cuda_arch 2025-12-04T09:42:59.9844885Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:42:59.9845281Z + command -v nvidia-smi 2025-12-04T09:42:59.9845580Z /usr/bin/nvidia-smi 2025-12-04T09:42:59.9849919Z ++ nvidia-smi --query-gpu=compute_cap --format=csv 2025-12-04T09:42:59.9850795Z ++ tail -n 1 2025-12-04T09:43:00.0081815Z + TORCH_CUDA_ARCH_LIST=7.5 2025-12-04T09:43:00.0082233Z + export TORCH_CUDA_ARCH_LIST 2025-12-04T09:43:00.0082629Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *s390x* ]] 2025-12-04T09:43:00.0083027Z + [[ 0 == \1 ]] 2025-12-04T09:43:00.0083288Z + [[ True == \1 ]] 2025-12-04T09:43:00.0083620Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *bazel* ]] 2025-12-04T09:43:00.0085760Z ++ realpath build/custom_test_artifacts 2025-12-04T09:43:00.0105830Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2025-12-04T09:43:00.0106434Z + [[ -n '' ]] 
2025-12-04T09:43:00.0106718Z + echo 'Environment variables' 2025-12-04T09:43:00.0107040Z Environment variables 2025-12-04T09:43:00.0107319Z + env 2025-12-04T09:43:00.0126233Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:43:00.0126841Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:43:00.0127361Z BUILD_ENVIRONMENT=linux-jammy-cuda12.4-py3.10-gcc11 2025-12-04T09:43:00.0128076Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-12-04T09:43:00.0128464Z HOSTNAME=764ff984146f 2025-12-04T09:43:00.0129155Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_685e94f7-4594-411d-afae-acf4a383301b 2025-12-04T09:43:00.0129904Z GITHUB_ACTION=__run_3 2025-12-04T09:43:00.0130219Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:43:00.0130576Z GITHUB_RUN_NUMBER=19107 2025-12-04T09:43:00.0130874Z TEST_CONFIG=legacy_nvidia_driver 2025-12-04T09:43:00.0131228Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:43:00.0131620Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-12-04T09:43:00.0131980Z SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:43:00.0132440Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-12-04T09:43:00.0132789Z GITHUB_TRIGGERING_ACTOR=huydhn 2025-12-04T09:43:00.0133159Z GITHUB_REF_TYPE=branch 2025-12-04T09:43:00.0133479Z TORCH_CUDA_ARCH_LIST=7.5 2025-12-04T09:43:00.0133925Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:43:00.0134486Z XLA_CUDA= 2025-12-04T09:43:00.0134800Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-12-04T09:43:00.0135509Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T09:43:00.0135950Z *** 2025-12-04T09:43:00.0136199Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:43:00.0136702Z GITHUB_ACTIONS=true 2025-12-04T09:43:00.0136997Z NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:43:00.0137395Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:43:00.0137846Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:43:00.0138297Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:43:00.0138938Z 
GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/periodic.yml@refs/heads/main 2025-12-04T09:43:00.0139502Z UCC_HOME=/usr 2025-12-04T09:43:00.0139774Z TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:43:00.0140097Z VERBOSE_TEST_LOGS=False 2025-12-04T09:43:00.0140399Z GITHUB_REF=refs/heads/main 2025-12-04T09:43:00.0140689Z SHARD_NUMBER=1 2025-12-04T09:43:00.0140961Z GITHUB_REF_PROTECTED=true 2025-12-04T09:43:00.0141275Z HOME=/var/lib/jenkins 2025-12-04T09:43:00.0141585Z GITHUB_API_URL=https://api.github.com 2025-12-04T09:43:00.0141970Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:43:00.0142374Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-12-04T09:43:00.0142762Z USE_SYSTEM_NCCL=1 2025-12-04T09:43:00.0143033Z NUM_TEST_SHARDS=5 2025-12-04T09:43:00.0143295Z UCX_HOME=/usr 2025-12-04T09:43:00.0143954Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_685e94f7-4594-411d-afae-acf4a383301b 2025-12-04T09:43:00.0145426Z JOB_NAME=linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T09:43:00.0146598Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_685e94f7-4594-411d-afae-acf4a383301b 2025-12-04T09:43:00.0147560Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:43:00.0148146Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:43:00.0148567Z DASHBOARD_TAG= 2025-12-04T09:43:00.0148835Z GITHUB_RUN_ID=19922826259 2025-12-04T09:43:00.0149125Z INSTALLED_OPENBLAS= 2025-12-04T09:43:00.0149858Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_685e94f7-4594-411d-afae-acf4a383301b 2025-12-04T09:43:00.0150657Z GITHUB_ACTOR=huydhn 2025-12-04T09:43:00.0150916Z PR_NUMBER= 2025-12-04T09:43:00.0151165Z DESIRED_CUDA=12.4 2025-12-04T09:43:00.0151438Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:43:00.0151721Z VALGRIND=ON 2025-12-04T09:43:00.0151978Z 
ANACONDA_PYTHON_VERSION=3.10 2025-12-04T09:43:00.0152380Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:43:00.0152794Z TERM=vt100 2025-12-04T09:43:00.0153030Z INSTALLED_VISION=yes 2025-12-04T09:43:00.0153313Z BRANCH=main 2025-12-04T09:43:00.0153579Z SCCACHE_REGION=us-east-1 2025-12-04T09:43:00.0153888Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:43:00.0154231Z BUILD_AOT_INDUCTOR_TEST= 2025-12-04T09:43:00.0154540Z CUDA_PATH=/usr/local/cuda 2025-12-04T09:43:00.0155155Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-12-04T09:43:00.0155863Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:43:00.0156287Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-12-04T09:43:00.0156681Z REENABLED_ISSUES= 2025-12-04T09:43:00.0156947Z DOCS= 2025-12-04T09:43:00.0157176Z SHLVL=1 2025-12-04T09:43:00.0157398Z MAX_JOBS=14 2025-12-04T09:43:00.0157649Z GITHUB_ACTOR_ID=475357 2025-12-04T09:43:00.0158044Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:43:00.0158512Z GITHUB_REF_NAME=main 2025-12-04T09:43:00.0158946Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:43:00.0159446Z GITHUB_JOB=test 2025-12-04T09:43:00.0159710Z NO_TEST_TIMEOUT=False 2025-12-04T09:43:00.0159985Z TD_DISTRIBUTED=False 2025-12-04T09:43:00.0160288Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:43:00.0160641Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:43:00.0160942Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:43:00.0161255Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:43:00.0162184Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:43:00.0163138Z GITHUB_BASE_REF= 2025-12-04T09:43:00.0163405Z INSTALLED_ACL= 2025-12-04T09:43:00.0163945Z ARTIFACTS_FILE_SUFFIX=test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248 2025-12-04T09:43:00.0164569Z 
CI=true 2025-12-04T09:43:00.0164814Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T09:43:00.0165202Z RUST_LOG=sccache::server=error 2025-12-04T09:43:00.0165522Z JOB_ID=57119749248 2025-12-04T09:43:00.0165781Z GITHUB_HEAD_REF= 2025-12-04T09:43:00.0166047Z GITHUB_ACTION_REF= 2025-12-04T09:43:00.0166381Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-12-04T09:43:00.0166779Z TEST_SHOWLOCALS=False 2025-12-04T09:43:00.0167074Z GITHUB_WORKFLOW=periodic 2025-12-04T09:43:00.0167392Z DEBIAN_FRONTEND=noninteractive 2025-12-04T09:43:00.0168125Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_685e94f7-4594-411d-afae-acf4a383301b 2025-12-04T09:43:00.0168887Z NO_TD=False 2025-12-04T09:43:00.0169154Z SKIP_SCCACHE_INITIALIZATION=1 2025-12-04T09:43:00.0169501Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-12-04T09:43:00.0170032Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:43:00.0170535Z _=/usr/bin/env 2025-12-04T09:43:00.0170828Z + echo 'Testing pytorch' 2025-12-04T09:43:00.0171381Z Testing pytorch 2025-12-04T09:43:00.0171833Z + export LANG=C.UTF-8 2025-12-04T09:43:00.0172133Z + LANG=C.UTF-8 2025-12-04T09:43:00.0172378Z + PR_NUMBER= 2025-12-04T09:43:00.0172671Z + [[ legacy_nvidia_driver == \d\e\f\a\u\l\t ]] 2025-12-04T09:43:00.0173111Z + [[ legacy_nvidia_driver == \d\i\s\t\r\i\b\u\t\e\d ]] 2025-12-04T09:43:00.0173522Z + [[ legacy_nvidia_driver == \s\l\o\w ]] 2025-12-04T09:43:00.0173990Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *slow-gradcheck* ]] 2025-12-04T09:43:00.0174611Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:43:00.0175066Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:43:00.0175469Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:43:00.0175848Z + [[ legacy_nvidia_driver == *crossref* ]] 2025-12-04T09:43:00.0176380Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:43:00.0176822Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *xpu* ]] 
2025-12-04T09:43:00.0177291Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *-bazel-* ]] 2025-12-04T09:43:00.0177713Z + pip_install ninja==1.10.2 2025-12-04T09:43:00.0178141Z + pip_install_pkg='python3 -m pip install --progress-bar off' 2025-12-04T09:43:00.0178695Z + python3 -m pip install --progress-bar off ninja==1.10.2 2025-12-04T09:43:00.5129125Z Collecting ninja==1.10.2 2025-12-04T09:43:00.5481268Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB) 2025-12-04T09:43:00.5598120Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2025-12-04T09:43:00.9938755Z Installing collected packages: ninja 2025-12-04T09:43:00.9939416Z Attempting uninstall: ninja 2025-12-04T09:43:00.9948456Z Found existing installation: ninja 1.11.1.4 2025-12-04T09:43:00.9973054Z Uninstalling ninja-1.11.1.4: 2025-12-04T09:43:01.0040520Z Successfully uninstalled ninja-1.11.1.4 2025-12-04T09:43:01.0425193Z Successfully installed ninja-1.10.2 2025-12-04T09:43:01.1171965Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:43:01.1173914Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:43:01.1175178Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *aarch64* ]] 2025-12-04T09:43:01.1175662Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *asan* ]] 2025-12-04T09:43:01.1176212Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *-debug* ]] 2025-12-04T09:43:01.1176757Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *-bazel-* ]] 2025-12-04T09:43:01.1177409Z + echo 'We are not in debug mode: linux-jammy-cuda12.4-py3.10-gcc11. 
Expect the assertion to pass' 2025-12-04T09:43:01.1178229Z We are not in debug mode: linux-jammy-cuda12.4-py3.10-gcc11. Expect the assertion to pass 2025-12-04T09:43:01.1178796Z + cd test 2025-12-04T09:43:01.1179225Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)' 2025-12-04T09:43:02.9360036Z + [[ legacy_nvidia_driver == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2025-12-04T09:43:02.9360912Z + [[ legacy_nvidia_driver == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2025-12-04T09:43:02.9361791Z + [[ legacy_nvidia_driver == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]] 2025-12-04T09:43:02.9362647Z + cd test 2025-12-04T09:43:02.9363270Z + python -c 'import torch; torch.rand(2, 2, device='\''cuda'\'')' 2025-12-04T09:43:07.9735167Z + export USE_LEGACY_DRIVER=1 2025-12-04T09:43:07.9735567Z + USE_LEGACY_DRIVER=1 2025-12-04T09:43:07.9741445Z + DYNAMO_BENCHMARK_FLAGS=() 2025-12-04T09:43:07.9742377Z + [[ legacy_nvidia_driver == *pr_time_benchmarks* ]] 2025-12-04T09:43:07.9742838Z + [[ legacy_nvidia_driver == *dynamo_eager* ]] 2025-12-04T09:43:07.9743252Z + [[ legacy_nvidia_driver == *aot_eager* ]] 2025-12-04T09:43:07.9743655Z + [[ legacy_nvidia_driver == *aot_inductor* ]] 2025-12-04T09:43:07.9744092Z + [[ legacy_nvidia_driver == *max_autotune_inductor* ]] 2025-12-04T09:43:07.9744854Z + [[ legacy_nvidia_driver == *inductor* ]] 2025-12-04T09:43:07.9745246Z + [[ legacy_nvidia_driver == *dynamic* ]] 2025-12-04T09:43:07.9745634Z + [[ legacy_nvidia_driver == *cpu* ]] 2025-12-04T09:43:07.9745991Z + [[ legacy_nvidia_driver == *xpu* ]] 2025-12-04T09:43:07.9746381Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda) 2025-12-04T09:43:07.9779713Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *libtorch* ]] 2025-12-04T09:43:07.9780400Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *-bazel-* ]] 2025-12-04T09:43:07.9783248Z + cd test 2025-12-04T09:43:07.9783976Z + python -c 'import torch; print(torch.__config__.show())' 2025-12-04T09:43:10.7580655Z PyTorch built with: 2025-12-04T09:43:10.7581020Z - GCC 11.4 
2025-12-04T09:43:10.7581269Z - C++ Version: 201703 2025-12-04T09:43:10.7581955Z - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:43:10.7582815Z - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:43:10.7583368Z - OpenMP 201511 (a.k.a. OpenMP 4.5) 2025-12-04T09:43:10.7583763Z - LAPACK is enabled (usually provided by MKL) 2025-12-04T09:43:10.7584162Z - NNPACK is enabled 2025-12-04T09:43:10.7584473Z - CPU capability usage: AVX512 2025-12-04T09:43:10.7584804Z - CUDA Runtime 12.4 2025-12-04T09:43:10.7585213Z - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75 2025-12-04T09:43:10.7585686Z - CuDNN 90.1 2025-12-04T09:43:10.7591398Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Werror -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=ON, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 2025-12-04T09:43:10.7597396Z 
2025-12-04T09:43:11.2179114Z + cd test 2025-12-04T09:43:11.2179588Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2025-12-04T09:43:12.7055366Z ATen/Parallel: 2025-12-04T09:43:12.7055741Z at::get_num_threads() : 8 2025-12-04T09:43:12.7056100Z at::get_num_interop_threads() : 8 2025-12-04T09:43:12.7056543Z OpenMP 201511 (a.k.a. OpenMP 4.5) 2025-12-04T09:43:12.7056908Z omp_get_max_threads() : 8 2025-12-04T09:43:12.7057572Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:43:12.7058288Z mkl_get_max_threads() : 8 2025-12-04T09:43:12.7058731Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:43:12.7059271Z std::thread::hardware_concurrency() : 16 2025-12-04T09:43:12.7059659Z Environment variables: 2025-12-04T09:43:12.7059962Z OMP_NUM_THREADS : [not set] 2025-12-04T09:43:12.7060272Z MKL_NUM_THREADS : [not set] 2025-12-04T09:43:12.7060600Z ATen parallel backend: OpenMP 2025-12-04T09:43:12.7060814Z 2025-12-04T09:43:13.0284302Z + [[ legacy_nvidia_driver == *numpy_2* ]] 2025-12-04T09:43:13.0285238Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *aarch64* ]] 2025-12-04T09:43:13.0286081Z + [[ legacy_nvidia_driver == *backward* ]] 2025-12-04T09:43:13.0286645Z + [[ legacy_nvidia_driver == *libtorch_agnostic_targetting* ]] 2025-12-04T09:43:13.0287411Z + [[ legacy_nvidia_driver == *xla* ]] 2025-12-04T09:43:13.0287770Z + [[ legacy_nvidia_driver == *vllm* ]] 2025-12-04T09:43:13.0288153Z + [[ legacy_nvidia_driver == *executorch* ]] 2025-12-04T09:43:13.0288577Z + [[ legacy_nvidia_driver == \j\i\t\_\l\e\g\a\c\y ]] 2025-12-04T09:43:13.0289016Z + [[ legacy_nvidia_driver == \q\u\a\n\t\i\z\a\t\i\o\n ]] 2025-12-04T09:43:13.0289488Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *libtorch* ]] 2025-12-04T09:43:13.0290072Z + [[ legacy_nvidia_driver == distributed ]] 2025-12-04T09:43:13.0290501Z + [[ legacy_nvidia_driver == *operator_benchmark* ]] 
2025-12-04T09:43:13.0290973Z + [[ legacy_nvidia_driver == *operator_microbenchmark* ]] 2025-12-04T09:43:13.0291475Z + [[ legacy_nvidia_driver == *attention_microbenchmark* ]] 2025-12-04T09:43:13.0291964Z + [[ legacy_nvidia_driver == *inductor_distributed* ]] 2025-12-04T09:43:13.0292405Z + [[ legacy_nvidia_driver == *inductor-halide* ]] 2025-12-04T09:43:13.0292848Z + [[ legacy_nvidia_driver == *inductor-pallas* ]] 2025-12-04T09:43:13.0293307Z + [[ legacy_nvidia_driver == *inductor-triton-cpu* ]] 2025-12-04T09:43:13.0293783Z + [[ legacy_nvidia_driver == *inductor-micro-benchmark* ]] 2025-12-04T09:43:13.0294312Z + [[ legacy_nvidia_driver == *aoti_cross_compile_for_windows* ]] 2025-12-04T09:43:13.0294796Z + [[ legacy_nvidia_driver == *huggingface* ]] 2025-12-04T09:43:13.0295186Z + [[ legacy_nvidia_driver == *timm* ]] 2025-12-04T09:43:13.0295558Z + [[ legacy_nvidia_driver == cachebench ]] 2025-12-04T09:43:13.0295964Z + [[ legacy_nvidia_driver == verify_cachebench ]] 2025-12-04T09:43:13.0296466Z + [[ legacy_nvidia_driver == *torchbench* ]] 2025-12-04T09:43:13.0296891Z + [[ legacy_nvidia_driver == *inductor_cpp_wrapper* ]] 2025-12-04T09:43:13.0297339Z + [[ legacy_nvidia_driver == *inductor_core* ]] 2025-12-04T09:43:13.0297757Z + [[ legacy_nvidia_driver == *inductor* ]] 2025-12-04T09:43:13.0298130Z + [[ legacy_nvidia_driver == *einops* ]] 2025-12-04T09:43:13.0298523Z + [[ legacy_nvidia_driver == *dynamo_core* ]] 2025-12-04T09:43:13.0298938Z + [[ legacy_nvidia_driver == *dynamo_wrapped* ]] 2025-12-04T09:43:13.0299377Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:43:13.0299753Z + [[ 1 == 1 ]] 2025-12-04T09:43:13.0300008Z + [[ 5 -gt 1 ]] 2025-12-04T09:43:13.0300306Z + test_lazy_tensor_meta_reference_disabled 2025-12-04T09:43:13.0300770Z + export TORCH_DISABLE_FUNCTIONALIZATION_META_REFERENCE=1 2025-12-04T09:43:13.0301280Z + TORCH_DISABLE_FUNCTIONALIZATION_META_REFERENCE=1 2025-12-04T09:43:13.0301796Z + echo 'Testing lazy tensor operations without meta 
reference' 2025-12-04T09:43:13.0302314Z Testing lazy tensor operations without meta reference 2025-12-04T09:43:13.0302879Z + python test/run_test.py --include lazy/test_ts_opinfo.py --verbose 2025-12-04T09:43:20.0796294Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json 2025-12-04T09:43:20.1337892Z Ignoring disabled issues: [''] 2025-12-04T09:43:20.1456221Z Found test times from artifacts 2025-12-04T09:43:20.1916896Z Found test times from artifacts 2025-12-04T09:43:20.1932548Z Running all tests 2025-12-04T09:43:20.1936032Z Running parallel tests on 1 processes 2025-12-04T09:43:20.1936887Z Name: tests to run (est. time: 0.01min) 2025-12-04T09:43:20.1937293Z Serial tests (1): 2025-12-04T09:43:20.1937613Z lazy/test_ts_opinfo 1/1 2025-12-04T09:43:20.1937922Z Parallel tests (0): 2025-12-04T09:43:20.1938266Z Name: excluded (est. time: 0.0min) 2025-12-04T09:43:20.1938606Z Serial tests (0): 2025-12-04T09:43:20.1938869Z Parallel tests (0): 2025-12-04T09:43:20.1939474Z Running lazy/test_ts_opinfo 1/1 ... [2025-12-04 09:43:20.193787][1828.576685955] 2025-12-04T09:43:20.1940011Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:43:20.1945360Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_ts_opinfo.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:43:20.194284] 2025-12-04T09:43:27.1191829Z 2025-12-04T09:43:27.1192939Z lazy/test_ts_opinfo 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_ts_opinfo_1.1_4d268f7078430bdf_.log 2025-12-04T09:43:27.1195547Z Running 5 items in this shard: test/lazy/test_ts_opinfo.py::TestLazyTensor::testConvolutionBackward, test/lazy/test_ts_opinfo.py::TestLazyTensor::test_tensor_ctr, test/lazy/test_ts_opinfo.py::TestLazyTensor::test_view_mark_step_preserved, test/lazy/test_ts_opinfo.py::TestLazyDynamicOps::test_adaptiveavgpool3d_dynamic, test/lazy/test_ts_opinfo.py::TestLazyDynamicOps::test_nonzero_dynamic 2025-12-04T09:43:27.1197851Z 2025-12-04T09:43:27.1198185Z Finished lazy/test_ts_opinfo 1/1 ... [2025-12-04 09:43:27.119110][1835.502006988], took 0.12min 2025-12-04T09:43:27.1199894Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/lazy.test_ts_opinfo/lazy.test_ts_opinfo-8eadd60536af3632.xml 2025-12-04T09:43:27.5569574Z Uploading artifacts took 0.12 seconds 2025-12-04T09:43:34.8761968Z Running test batch 'tests to run' cost 14.68 seconds 2025-12-04T09:43:35.7681941Z 2025-12-04T09:43:35.7682611Z real 0m22.739s 2025-12-04T09:43:35.7682923Z user 0m21.851s 2025-12-04T09:43:35.7683181Z sys 0m7.066s 2025-12-04T09:43:35.7683569Z + export -n TORCH_DISABLE_FUNCTIONALIZATION_META_REFERENCE 2025-12-04T09:43:35.7684019Z + test_without_numpy 2025-12-04T09:43:35.7686931Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:43:35.7699930Z + pushd .ci/pytorch 2025-12-04T09:43:35.7700328Z ~/workspace/.ci/pytorch ~/workspace 2025-12-04T09:43:35.7701314Z + python -c 'import sys;sys.path.insert(0, '\''fake_numpy'\'');from unittest import TestCase;import torch;x=torch.randn(3,3);TestCase().assertRaises(RuntimeError, lambda: x.numpy())' 2025-12-04T09:43:36.6233326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:283: UserWarning: Failed to initialize NumPy: Sorry 
PyTorch, but our NumPy is in the other folder (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/utils/tensor_numpy.cpp:84.) 2025-12-04T09:43:36.6235039Z cpu = _conversion_method_template(device=torch.device("cpu")) 2025-12-04T09:43:37.4980412Z + python -c 'import sys;sys.path.insert(0, '\''fake_numpy'\'');import torch;print(torch.tensor([torch.tensor(0.), torch.tensor(1.)]))' 2025-12-04T09:43:38.3685354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:283: UserWarning: Failed to initialize NumPy: Sorry PyTorch, but our NumPy is in the other folder (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/utils/tensor_numpy.cpp:84.) 2025-12-04T09:43:38.3687111Z cpu = _conversion_method_template(device=torch.device("cpu")) 2025-12-04T09:43:38.8912295Z tensor([0., 1.]) 2025-12-04T09:43:39.1956603Z + [[ legacy_nvidia_driver == *dynamo_wrapped* ]] 2025-12-04T09:43:39.1957266Z + python -c 'import sys;sys.path.insert(0, '\''fake_numpy'\'');import torch; import torch.onnx' 2025-12-04T09:43:40.0496488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:283: UserWarning: Failed to initialize NumPy: Sorry PyTorch, but our NumPy is in the other folder (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/utils/tensor_numpy.cpp:84.) 
2025-12-04T09:43:40.0498192Z cpu = _conversion_method_template(device=torch.device("cpu")) 2025-12-04T09:43:40.9395881Z + popd 2025-12-04T09:43:40.9396204Z ~/workspace 2025-12-04T09:43:40.9396471Z + install_torchvision 2025-12-04T09:43:40.9396797Z + local orig_preload 2025-12-04T09:43:40.9397168Z + local commit 2025-12-04T09:43:40.9400756Z ++ get_pinned_commit vision 2025-12-04T09:43:40.9401165Z ++ cat .github/ci_commit_pins/vision.txt 2025-12-04T09:43:40.9416620Z + commit=617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:43:40.9417032Z + orig_preload= 2025-12-04T09:43:40.9417315Z + '[' -n '' ']' 2025-12-04T09:43:40.9417628Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:43:40.9418048Z + export FORCE_CUDA=1 2025-12-04T09:43:40.9419389Z + FORCE_CUDA=1 2025-12-04T09:43:40.9419682Z + export WITH_CUDA=1 2025-12-04T09:43:40.9419948Z + WITH_CUDA=1 2025-12-04T09:43:40.9420631Z + pip_build_and_install git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e dist/vision 2025-12-04T09:43:40.9421691Z + local build_target=git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:43:40.9422495Z + local wheel_dir=dist/vision 2025-12-04T09:43:40.9422826Z + local found_whl=0 2025-12-04T09:43:40.9423118Z + for file in "${wheel_dir}"/*.whl 2025-12-04T09:43:40.9423454Z + [[ -f dist/vision/*.whl ]] 2025-12-04T09:43:40.9423766Z + '[' 0 == 0 ']' 2025-12-04T09:43:40.9424557Z + python3 -m pip wheel --no-build-isolation --no-deps -w dist/vision git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:43:41.3131189Z Collecting git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:43:41.3137280Z Cloning https://github.com/pytorch/vision.git (to revision 617079d944b0e72632311c30ae2bbdf1168b901e) to /tmp/pip-req-build-rqa6hlff 2025-12-04T09:43:41.3319124Z Running command git clone --filter=blob:none --quiet 
https://github.com/pytorch/vision.git /tmp/pip-req-build-rqa6hlff 2025-12-04T09:43:43.0192050Z Running command git rev-parse -q --verify 'sha^617079d944b0e72632311c30ae2bbdf1168b901e' 2025-12-04T09:43:43.0215382Z Running command git fetch -q https://github.com/pytorch/vision.git 617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:43:43.1436052Z Resolved https://github.com/pytorch/vision.git to commit 617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:43:46.8213661Z Preparing metadata (pyproject.toml) ... done 2025-12-04T09:43:46.8253188Z Building wheels for collected packages: torchvision 2025-12-04T09:45:19.2915843Z Building wheel for torchvision (pyproject.toml) ... done 2025-12-04T09:45:19.2982303Z Created wheel for torchvision: filename=torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl size=1821704 sha256=219a7f84513fcaa1896571ae4e982082b0713184231b39ddfb0215ffbe02c5c6 2025-12-04T09:45:19.2984543Z Stored in directory: /var/lib/jenkins/.cache/pip/wheels/12/b2/29/1f82685c5b5173629e1f36a9b93989ce92ce563e5fb91d27ac 2025-12-04T09:45:19.3027148Z Successfully built torchvision 2025-12-04T09:45:19.3976411Z + for file in "${wheel_dir}"/*.whl 2025-12-04T09:45:19.3977081Z + pip_install_whl dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:45:19.3977879Z + args=('dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl') 2025-12-04T09:45:19.3978410Z + local args 2025-12-04T09:45:19.3978877Z + [[ dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl == *\ * ]] 2025-12-04T09:45:19.3979497Z + for path in "${args[@]}" 2025-12-04T09:45:19.3980048Z + echo 'Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl' 2025-12-04T09:45:19.3980867Z Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:45:19.3981807Z + python3 -mpip
install --no-index --no-deps dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:45:19.7764321Z Processing ./dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:45:19.7906948Z Installing collected packages: torchvision 2025-12-04T09:45:20.3296105Z Successfully installed torchvision-0.25.0a0+617079d 2025-12-04T09:45:20.3989952Z + '[' -n '' ']' 2025-12-04T09:45:20.3990322Z + test_python_shard 1 2025-12-04T09:45:20.3990613Z + [[ -z 5 ]] 2025-12-04T09:45:20.3991580Z + python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --exclude-quantization-tests --shard 1 5 --verbose --upload-artifacts-while-running 2025-12-04T09:45:27.6312028Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json 2025-12-04T09:45:27.6426136Z Found test times from artifacts 2025-12-04T09:45:27.6872583Z Found test times from artifacts 2025-12-04T09:45:27.6888120Z Running all tests 2025-12-04T09:45:27.7780345Z Running parallel tests on 1 processes 2025-12-04T09:45:27.7792908Z Name: tests to run (est. 
time: 288.55min) 2025-12-04T09:45:27.7794086Z Serial tests (146): 2025-12-04T09:45:27.7794403Z inductor/test_aot_inductor 1/6 2025-12-04T09:45:27.7794761Z inductor/test_aot_inductor 6/6 2025-12-04T09:45:27.7795201Z inductor/test_torchinductor_codegen_dynamic_shapes 2/4 2025-12-04T09:45:27.7795735Z inductor/test_torchinductor_opinfo 2/17 2025-12-04T09:45:27.7796141Z inductor/test_torchinductor_opinfo 7/17 2025-12-04T09:45:27.7796537Z inductor/test_torchinductor_opinfo 12/17 2025-12-04T09:45:27.7796946Z inductor/test_torchinductor_opinfo 17/17 2025-12-04T09:45:27.7797366Z inductor/test_cuda_select_algorithm 3/5 2025-12-04T09:45:27.7797749Z inductor/test_compile_subprocess 3/3 2025-12-04T09:45:27.7798137Z inductor/test_flex_decoding 1/1 2025-12-04T09:45:27.7798500Z inductor/test_deterministic 5/8 2025-12-04T09:45:27.7798853Z inductor/test_fp8 1/1 2025-12-04T09:45:27.7799158Z dynamo/test_model_output 1/1 2025-12-04T09:45:27.7799502Z inductor/test_triton_kernels 1/1 2025-12-04T09:45:27.7799870Z inductor/test_loop_ordering 1/1 2025-12-04T09:45:27.7800207Z export/test_serdes 1/1 2025-12-04T09:45:27.7800522Z dynamo/test_backends 1/1 2025-12-04T09:45:27.7811874Z inductor/test_aot_inductor_package 1/1 2025-12-04T09:45:27.7812630Z inductor/test_padding 1/1 2025-12-04T09:45:27.7813240Z dynamo/test_aot_compile 1/1 2025-12-04T09:45:27.7813577Z dynamo/test_sets 1/1 2025-12-04T09:45:27.7813921Z dynamo/test_wrap_inductor_compiled_regions 1/1 2025-12-04T09:45:27.7814330Z test_sparse 2/2 2025-12-04T09:45:27.7814606Z test_decomp 3/17 2025-12-04T09:45:27.7814905Z test_decomp 8/17 2025-12-04T09:45:27.7815174Z test_decomp 13/17 2025-12-04T09:45:27.7815476Z test_ops_fwd_gradients 1/2 2025-12-04T09:45:27.7815796Z test_meta 2/5 2025-12-04T09:45:27.7816050Z test_ops_jit 2/2 2025-12-04T09:45:27.7816422Z test_nestedtensor 3/4 2025-12-04T09:45:27.7816726Z test_ops 2/11 2025-12-04T09:45:27.7816976Z test_ops 7/11 2025-12-04T09:45:27.7817271Z functorch/test_dims 1/1 2025-12-04T09:45:27.7817597Z 
functorch/test_ops 1/7 2025-12-04T09:45:27.7817899Z functorch/test_ops 6/7 2025-12-04T09:45:27.7818229Z inductor/test_select_algorithm 1/1 2025-12-04T09:45:27.7818599Z inductor/test_cpu_repro 1/3 2025-12-04T09:45:27.7818935Z inductor/test_custom_lowering 1/1 2025-12-04T09:45:27.7819295Z inductor/test_perf 1/1 2025-12-04T09:45:27.7819620Z inductor/test_binary_folding 1/1 2025-12-04T09:45:27.7820045Z inductor/test_mkldnn_pattern_matcher 3/3 2025-12-04T09:45:27.7820444Z inductor/test_cutlass_backend 1/1 2025-12-04T09:45:27.7820809Z inductor/test_ck_backend 1/1 2025-12-04T09:45:27.7821165Z inductor/test_gpu_cpp_wrapper 1/1 2025-12-04T09:45:27.7821528Z inductor/test_cutedsl_template 1/1 2025-12-04T09:45:27.7821906Z inductor/test_benchmark_fusion 1/1 2025-12-04T09:45:27.7822267Z dynamo/test_modules 1/1 2025-12-04T09:45:27.7822580Z dynamo/test_recompiles 1/1 2025-12-04T09:45:27.7822921Z export/test_tree_utils 1/1 2025-12-04T09:45:27.7823263Z inductor/test_triton_wrapper 1/1 2025-12-04T09:45:27.7823629Z inductor/test_static_cuda_launcher 1/1 2025-12-04T09:45:27.7824017Z export/test_dynamic_shapes 1/1 2025-12-04T09:45:27.7824368Z dynamo/test_sdpa 1/1 2025-12-04T09:45:27.7824661Z dynamo/test_utils 1/1 2025-12-04T09:45:27.7824986Z inductor/test_codegen_triton 1/1 2025-12-04T09:45:27.7825343Z dynamo/test_frame_init 1/1 2025-12-04T09:45:27.7825668Z inductor/test_device_assert 1/1 2025-12-04T09:45:27.7826027Z dynamo/test_skip_non_tensor 1/1 2025-12-04T09:45:27.7826603Z dynamo/test_skip_guard_eval_unsafe 1/1 2025-12-04T09:45:27.7826991Z inductor/test_control_deps 1/1 2025-12-04T09:45:27.7827335Z inductor/test_benchmarking 1/1 2025-12-04T09:45:27.7827697Z inductor/test_helion_kernels 1/1 2025-12-04T09:45:27.7828059Z inductor/test_quantization 1/1 2025-12-04T09:45:27.7828392Z export/test_tools 1/1 2025-12-04T09:45:27.7828817Z inductor/test_compiled_optimizers 1/3 2025-12-04T09:45:27.7829211Z inductor/test_aot_inductor_utils 1/1 2025-12-04T09:45:27.7829576Z 
inductor/test_control_flow 3/4 2025-12-04T09:45:27.7829938Z inductor/test_minifier_isolate 1/1 2025-12-04T09:45:27.7830307Z dynamo/test_error_messages 1/1 2025-12-04T09:45:27.7830653Z dynamo/test_fake_distributed 1/1 2025-12-04T09:45:27.7831011Z dynamo/test_tree_map 1/1 2025-12-04T09:45:27.7831341Z dynamo/test_minifier 1/1 2025-12-04T09:45:27.7831656Z dynamo/test_guard_manager 1/1 2025-12-04T09:45:27.7832000Z export/test_schema 1/1 2025-12-04T09:45:27.7832325Z export/test_pass_infra 1/1 2025-12-04T09:45:27.7832670Z dynamo/test_recompile_ux 1/1 2025-12-04T09:45:27.7833003Z export/test_experimental 1/1 2025-12-04T09:45:27.7833348Z export/test_converter 1/1 2025-12-04T09:45:27.7833683Z dynamo/test_reorder_logs 1/1 2025-12-04T09:45:27.7834010Z dynamo/test_subclasses 1/1 2025-12-04T09:45:27.7834350Z dynamo/test_python_autograd 1/1 2025-12-04T09:45:27.7834717Z export/test_draft_export 1/1 2025-12-04T09:45:27.7835028Z test_package 1/1 2025-12-04T09:45:27.7835313Z test_mkl_verbose 1/1 2025-12-04T09:45:27.7835625Z test_comparison_utils 1/1 2025-12-04T09:45:27.7835947Z functorch/test_ac_logging 1/1 2025-12-04T09:45:27.7836289Z test_mkldnn_verbose 1/1 2025-12-04T09:45:27.7836609Z test_cpp_api_parity 1/1 2025-12-04T09:45:27.7836906Z test_autoload 1/1 2025-12-04T09:45:27.7837216Z nn/attention/test_open_registry 1/1 2025-12-04T09:45:27.7837578Z test_as_strided 1/1 2025-12-04T09:45:27.7837856Z test_foreach 1/1 2025-12-04T09:45:27.7838139Z xpu/test_gemm 1/1 2025-12-04T09:45:27.7838433Z test_numpy_interop 1/1 2025-12-04T09:45:27.7838763Z profiler/test_cpp_thread 1/1 2025-12-04T09:45:27.7839079Z test_hub 1/1 2025-12-04T09:45:27.7839360Z test_segment_reductions 1/1 2025-12-04T09:45:27.7839699Z test_autograd_fallback 1/1 2025-12-04T09:45:27.7840010Z test_type_hints 1/1 2025-12-04T09:45:27.7840362Z functorch/test_aot_joint_with_descriptors 1/1 2025-12-04T09:45:27.7840767Z test_fx_reinplace_pass 1/1 2025-12-04T09:45:27.7841095Z functorch/test_control_flow 2/2 
2025-12-04T09:45:27.7841449Z test_subclass 1/1 2025-12-04T09:45:27.7841761Z functorch/test_vmap_registrations 1/1 2025-12-04T09:45:27.7842137Z nn/test_parametrization 1/1 2025-12-04T09:45:27.7842475Z test_dynamic_shapes 1/1 2025-12-04T09:45:27.7842789Z test_dispatch 1/1 2025-12-04T09:45:27.7843072Z test_numba_integration 1/1 2025-12-04T09:45:27.7843413Z test_functional_optim 1/1 2025-12-04T09:45:27.7843743Z test_maskedtensor 1/1 2025-12-04T09:45:27.7844071Z benchmark_utils/test_benchmark_utils 1/1 2025-12-04T09:45:27.7844447Z test_scaled_matmul_cuda 1/1 2025-12-04T09:45:27.7844817Z torch_np/numpy_tests/core/test_shape_base 1/1 2025-12-04T09:45:27.7845208Z test_vulkan 1/1 2025-12-04T09:45:27.7845491Z lazy/test_generator 1/1 2025-12-04T09:45:27.7845836Z torch_np/numpy_tests/linalg/test_linalg 1/1 2025-12-04T09:45:27.7846255Z torch_np/numpy_tests/core/test_dtype 1/1 2025-12-04T09:45:27.7846638Z lazy/test_debug_util 1/1 2025-12-04T09:45:27.7846949Z nn/test_load_state_dict 1/1 2025-12-04T09:45:27.7847273Z test_shape_ops 1/1 2025-12-04T09:45:27.7847613Z nn/test_module_hooks 1/1 2025-12-04T09:45:27.7847962Z torch_np/numpy_tests/lib/test_twodim_base 1/1 2025-12-04T09:45:27.7848381Z profiler/test_memory_profiler 1/1 2025-12-04T09:45:27.7848796Z test_jit_llga_fuser 1/1 2025-12-04T09:45:27.7849096Z optim/test_optim 1/1 2025-12-04T09:45:27.7849535Z torch_np/numpy_tests/core/test_getlimits 1/1 2025-12-04T09:45:27.7849958Z torch_np/test_ndarray_methods 1/1 2025-12-04T09:45:27.7850296Z test_view_ops 1/1 2025-12-04T09:45:27.7850588Z test_type_info 1/1 2025-12-04T09:45:27.7850898Z functorch/test_aotdispatch 1/1 2025-12-04T09:45:27.7851257Z test_scatter_gather_ops 1/1 2025-12-04T09:45:27.7851572Z test_cuda_multigpu 1/1 2025-12-04T09:45:27.7852000Z torch_np/numpy_tests/lib/test_index_tricks 1/1 2025-12-04T09:45:27.7852401Z test_jit_autocast 1/1 2025-12-04T09:45:27.7852690Z nn/test_pooling 1/1 2025-12-04T09:45:27.7852987Z nn/test_embedding 1/1 2025-12-04T09:45:27.7853303Z 
test_xnnpack_integration 1/1 2025-12-04T09:45:27.7853622Z test_cuda_trace 1/1 2025-12-04T09:45:27.7853926Z torch_np/test_reductions 1/1 2025-12-04T09:45:27.7854309Z torch_np/numpy_tests/core/test_scalar_ctors 1/1 2025-12-04T09:45:27.7854734Z torch_np/numpy_tests/lib/test_arraypad 1/1 2025-12-04T09:45:27.7855113Z test_prims 1/1 2025-12-04T09:45:27.7855388Z test_spectral_ops 1/1 2025-12-04T09:45:27.7855693Z test_autoload_disable 1/1 2025-12-04T09:45:27.7856037Z test_cpp_extensions_aot_ninja 1/1 2025-12-04T09:45:27.7856508Z test_cpp_extensions_aot_no_ninja 1/1 2025-12-04T09:45:27.7856880Z Parallel tests (0): 2025-12-04T09:45:27.7857174Z Name: excluded (est. time: 0.0min) 2025-12-04T09:45:27.7857519Z Serial tests (0): 2025-12-04T09:45:27.7857796Z Parallel tests (0): 2025-12-04T09:45:27.7858283Z Running inductor/test_aot_inductor 1/6 ... [2025-12-04 09:45:27.780266][1956.163164587] 2025-12-04T09:45:27.7858857Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:45:27.7860116Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor.py', '--shard-id=1', '--num-shards=6', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:45:27.780728] 2025-12-04T09:54:33.0179084Z 2025-12-04T09:54:33.0180332Z PRINTING LOG FILE of inductor/test_aot_inductor 1/6 (test/test-reports/inductor.test_aot_inductor_1.6_cf1c969272c5d084_.log) 2025-12-04T09:54:33.0181602Z W1204 09:45:41.040000 1815 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T09:54:33.0183041Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-f2c58a9dfc31919e.xml 2025-12-04T09:54:33.0183959Z ============================= test session starts ============================== 2025-12-04T09:54:33.0184650Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:54:33.0185253Z cachedir: .pytest_cache 2025-12-04T09:54:33.0185975Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:54:33.0186774Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:54:33.0187124Z configfile: pytest.ini 2025-12-04T09:54:33.0187873Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:54:33.0188754Z collecting ... 
collected 934 items 2025-12-04T09:54:33.0189180Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:54:33.0277144Z Running 154 items in this shard: test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_add_complex_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_addmm_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_constant_tensor_name_collision_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_cpp_kernel_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_boolean_indexing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_mismatched_branch_output_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_non_tensor_predicates_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_predicate_on_cpu_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_simple_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_symint_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_replace_view_ops_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_device_moved_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_duplicated_params_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fake_tensor_device_validation_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_free_inactive_buffer_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_int_list_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_mmaped_weights_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_weight_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_masked_select_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misaligned_input_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pad_non_zero_memory_leak_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_poi_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_squeeze_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pytree_inputs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quanatized_int8_linear_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quantized_linear_bias_none_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_replace_unbacked_symbol_with_backed_expr_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_reuse_kernel_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_multi_arch_embed_kernel_binary_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_stft_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_reinterpret_view_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_with_none_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_mutated_autotuning_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbounded_expr_substitutions_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_update_constant_buffer_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_simple_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_conv_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_outer_code_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_unbacked_symint_closure_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_cudagraphs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_add_complex_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_addmm_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_amp_fallback_random_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_constant_tensor_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_codegen_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_async_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_3_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_nested_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_non_tensor_predicates_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_use_buffers_from_outer_scope_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_original_fqn_and_dtype_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_device_moved_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_smem_above_default_limit_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fill__fallback_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fqn_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_free_inactive_buffer_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_linear_dynamic_maxautotune_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_default_gpu_device_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_on_gpu_device1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_permute_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_pytree_inputs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_complex_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_symint_item_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_extern_kernel_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_reinterpret_view_mem_leak_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_sympy_expr_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_simple_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_sym_expr_cond_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_grid_with_backed_symbols_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_size_weight_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_amp_fallback_random_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aot_inductor_consts_cpp_build_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_codegen_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_fp8_dtype_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_user_defined_triton_kernel_profiling_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotuning_args_reuse_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_4_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_codegen_int_array_var_fix_memory_leak_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_cpu_predicate_cuda_operands_max_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_nested_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_multiple_outputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_replace_view_ops_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_original_fqn_and_dtype_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_type_propagation_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_smem_above_default_limit_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fallback_mem_leak_fix_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_foreach_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fp8_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fqn_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_grid_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_masked_select_dynamic_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_missing_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_mixed_device_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_nested_tensor_from_jagged_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_non_contiguous_output_alias_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_none_args_aot_codegen_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_misaligned_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_pad_non_zero_memory_leak_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_proxy_executor_permute_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_pytree_inputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_replace_unbacked_symbol_with_backed_expr_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_reuse_kernel_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_fp8_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_embed_kernel_binary_False_max_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_small_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_stride_with_unbacked_expr_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sympy_cpp_printer_min_max_minmax1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_autotuning_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_with_none_input_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_next_power_of_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_2_use_static_size_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbounded_expr_substitutions_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_using_model_name_for_files_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_mixed_device_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_mixed_device_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_outer_buffers_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_parameters_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_no_triton_profiler_mps 2025-12-04T09:54:33.0364232Z 2025-12-04T09:54:33.0365106Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_cpu SKIPPED [0.0041s] (requires Intel GPU) [ 0%] 2025-12-04T09:54:33.0366762Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_add_complex_cpu <- test/inductor/test_torchinductor.py PASSED [14.4244s] [ 1%] 2025-12-04T09:54:33.0368280Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_addmm_cpu <- test/inductor/test_torchinductor.py PASSED [7.2112s] [ 1%] 2025-12-04T09:54:33.0369806Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_constant_tensor_name_collision_cpu <- test/inductor/test_torchinductor.py PASSED [6.6210s] [ 2%] 2025-12-04T09:54:33.0371641Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_cpp_kernel_cpu <- test/inductor/test_torchinductor.py PASSED [5.2311s] [ 3%] 2025-12-04T09:54:33.0373207Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_boolean_indexing_cpu <- test/inductor/test_torchinductor.py PASSED [6.2206s] [ 3%] 2025-12-04T09:54:33.0374714Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_1_cpu <- test/inductor/test_torchinductor.py PASSED [5.2596s] [ 4%] 2025-12-04T09:54:33.0376860Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_mismatched_branch_output_dynamic_True_cpu W1204 09:46:28.042000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:54:33.0378899Z W1204 09:46:28.042000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:54:33.0380358Z W1204 09:46:28.043000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. 
Please use Dim.STATIC instead 2025-12-04T09:54:33.0381258Z PASSED [6.1420s] [ 5%] 2025-12-04T09:54:33.0382052Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_non_tensor_predicates_dynamic_False_cpu PASSED [5.2505s] [ 5%] 2025-12-04T09:54:33.0383488Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_predicate_on_cpu_cpu <- test/inductor/test_torchinductor.py PASSED [5.8065s] [ 6%] 2025-12-04T09:54:33.0385631Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_simple_cpu <- test/inductor/test_torchinductor.py W1204 09:46:45.223000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:54:33.0387712Z W1204 09:46:45.224000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:54:33.0388615Z PASSED [5.5656s] [ 7%] 2025-12-04T09:54:33.0389475Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_symint_input_cpu <- test/inductor/test_torchinductor.py PASSED [5.5206s] [ 7%] 2025-12-04T09:54:33.0391092Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_replace_view_ops_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (requires GPU) [ 8%] 2025-12-04T09:54:33.0392737Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_device_moved_constant_cpu <- test/inductor/test_torchinductor.py PASSED [10.8606s] [ 9%] 2025-12-04T09:54:33.0394265Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_duplicated_params_cpu <- test/inductor/test_torchinductor.py PASSED [5.3319s] [ 9%] 2025-12-04T09:54:33.0395916Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fake_tensor_device_validation_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (requires GPU) [ 10%] 2025-12-04T09:54:33.0397798Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_free_inactive_buffer_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0029s] (requires GPU) [ 11%] 2025-12-04T09:54:33.0399388Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_int_list_input_cpu <- test/inductor/test_torchinductor.py PASSED [5.1502s] [ 11%] 2025-12-04T09:54:33.0400903Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_mmaped_weights_cpu <- test/inductor/test_torchinductor.py PASSED [13.8336s] [ 12%] 2025-12-04T09:54:33.0402721Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_weight_cpu SKIPPED [0.0003s] (install_free_tensors leads to OOM - https://github.com/pytorch/pytorch/issues/164062) [ 12%] 2025-12-04T09:54:33.0404444Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_masked_select_dynamic_cpu <- test/inductor/test_torchinductor.py PASSED [5.6881s] [ 13%] 2025-12-04T09:54:33.0406081Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misaligned_input_1_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0037s] (CUDA/XPU test only) [ 14%] 2025-12-04T09:54:33.0408013Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pad_non_zero_memory_leak_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (test is only for GPU_TYPE) [ 14%] 2025-12-04T09:54:33.0409693Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_poi_multiple_dynamic_cpu <- test/inductor/test_torchinductor.py PASSED [5.3685s] [ 15%] 2025-12-04T09:54:33.0411249Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_squeeze_cpu <- test/inductor/test_torchinductor.py PASSED [5.2204s] [ 16%] 2025-12-04T09:54:33.0412752Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pytree_inputs_cpu <- test/inductor/test_torchinductor.py PASSED [5.3006s] [ 16%] 2025-12-04T09:54:33.0414088Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quanatized_int8_linear_cpu PASSED [5.3603s] [ 17%] 2025-12-04T09:54:33.0415957Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quantized_linear_bias_none_cpu [W1204 09:47:58.896568605 QuantizedLinear.cpp:379] Warning: fbgemm_pack_gemm_matrix_fp16 is deprecated and will be removed in a future PyTorch release. (function operator()) 2025-12-04T09:54:33.0418123Z [W1204 09:48:03.075802721 QuantizedLinear.cpp:415] Warning: fbgemm_linear_fp16_weight_fp32_activation is deprecated and will be removed in a future PyTorch release. (function operator()) 2025-12-04T09:54:33.0419140Z PASSED [5.3023s] [ 18%] 2025-12-04T09:54:33.0420096Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_cpu SKIPPED [0.0032s] (requires GPU) [ 18%] 2025-12-04T09:54:33.0421699Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_replace_unbacked_symbol_with_backed_expr_cpu SKIPPED [0.0030s] (requires triton) [ 19%] 2025-12-04T09:54:33.0423242Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_reuse_kernel_dynamic_cpu <- test/inductor/test_torchinductor.py PASSED [6.8573s] [ 20%] 2025-12-04T09:54:33.0424564Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_cpu PASSED [11.2020s] [ 20%] 2025-12-04T09:54:33.0425967Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_multi_arch_embed_kernel_binary_False_cpu SKIPPED [0.0003s] (Test is only supported on CUDA 12.8+) [ 21%] 2025-12-04T09:54:33.0427522Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_stft_cpu <- test/inductor/test_torchinductor.py PASSED [5.7993s] [ 22%] 2025-12-04T09:54:33.0428980Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cpu SKIPPED [0.0033s] (requires GPU) [ 22%] 
2025-12-04T09:54:33.0430673Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_True_cpu SKIPPED [0.0031s] (requires GPU) [ 23%] 2025-12-04T09:54:33.0432369Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_cpu SKIPPED [0.0034s] (requires GPU) [ 24%] 2025-12-04T09:54:33.0434066Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cpu SKIPPED [0.0031s] (requires GPU) [ 24%] 2025-12-04T09:54:33.0435833Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_cpu SKIPPED [0.0029s] (requires GPU) [ 25%] 2025-12-04T09:54:33.0437530Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_False_cpu SKIPPED [0.0029s] (requires GPU) [ 25%] 2025-12-04T09:54:33.0439205Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_cpu SKIPPED [0.0029s] (requires GPU) [ 26%] 2025-12-04T09:54:33.0440845Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_cpu SKIPPED [0.0029s] (requires GPU) [ 27%] 2025-12-04T09:54:33.0442564Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_reinterpret_view_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (requires GPU) [ 27%] 2025-12-04T09:54:33.0444302Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cpu SKIPPED [0.0029s] (requires GPU) [ 28%] 2025-12-04T09:54:33.0445971Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_old_cpu SKIPPED [0.0033s] (requires GPU) [ 29%] 2025-12-04T09:54:33.0447688Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_with_none_input_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0029s] (requires GPU) [ 29%] 2025-12-04T09:54:33.0449261Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_mutated_autotuning_cpu SKIPPED [0.0029s] (requires GPU) [ 30%] 2025-12-04T09:54:33.0450899Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_True_cpu SKIPPED [0.0030s] (Need triton for user-defined triton kernel) [ 31%] 2025-12-04T09:54:33.0453035Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbounded_expr_substitutions_cpu <- test/inductor/test_torchinductor.py W1204 09:48:33.031000 1815 site-packages/torch/_export/__init__.py:71] +============================+ 2025-12-04T09:54:33.0454621Z W1204 09:48:33.031000 1815 site-packages/torch/_export/__init__.py:72] | !!! WARNING !!! | 2025-12-04T09:54:33.0455454Z W1204 09:48:33.031000 1815 site-packages/torch/_export/__init__.py:73] +============================+ 2025-12-04T09:54:33.0457232Z W1204 09:48:33.031000 1815 site-packages/torch/_export/__init__.py:74] torch._export.aot_compile()/torch._export.aot_load() is being deprecated, please switch to directly calling torch._inductor.aoti_compile_and_package(torch.export.export())/torch._inductor.aoti_load_package() instead. 
2025-12-04T09:54:33.0458694Z PASSED [5.4016s] [ 31%] 2025-12-04T09:54:33.0459615Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_update_constant_buffer_cpu <- test/inductor/test_torchinductor.py PASSED [5.2432s] [ 32%] 2025-12-04T09:54:33.0461800Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_simple_cpu <- test/inductor/test_torchinductor.py W1204 09:48:38.463000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:54:33.0463902Z W1204 09:48:38.463000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:54:33.0464899Z PASSED [5.8470s] [ 33%] 2025-12-04T09:54:33.0465670Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_conv_dynamic_False_cpu PASSED [6.0207s] [ 33%] 2025-12-04T09:54:33.0467748Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_outer_code_cpu <- test/inductor/test_torchinductor.py W1204 09:48:50.334000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:54:33.0469941Z W1204 09:48:50.334000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:54:33.0470838Z PASSED [5.9070s] [ 34%] 2025-12-04T09:54:33.0472514Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_unbacked_symint_closure_dynamic_True_cpu W1204 09:48:56.545000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. 
Please use Dim.STATIC instead 2025-12-04T09:54:33.0474592Z W1204 09:48:56.545000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:54:33.0475492Z PASSED [6.2509s] [ 35%] 2025-12-04T09:54:33.0476509Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_cudagraphs_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (requires CUDA) [ 35%] 2025-12-04T09:54:33.0478081Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_add_complex_cuda <- test/inductor/test_torchinductor.py PASSED [11.3485s] [ 36%] 2025-12-04T09:54:33.0479520Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_addmm_cuda <- test/inductor/test_torchinductor.py PASSED [7.1668s] [ 37%] 2025-12-04T09:54:33.0481433Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_amp_fallback_random_cuda <- test/inductor/test_torchinductor.py W1204 09:49:21.138000 1815 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T09:54:33.0482798Z PASSED [6.3023s] [ 37%] 2025-12-04T09:54:33.0483693Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_constant_tensor_cuda <- test/inductor/test_torchinductor.py PASSED [5.4165s] [ 38%] 2025-12-04T09:54:33.0485269Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_codegen_cuda <- test/inductor/test_torchinductor.py PASSED [11.8141s] [ 38%] 2025-12-04T09:54:33.0486710Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_async_cuda PASSED [5.6555s] [ 39%] 2025-12-04T09:54:33.0487824Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_3_cuda PASSED [12.6210s] [ 40%] 2025-12-04T09:54:33.0489777Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_nested_cuda <- test/inductor/test_torchinductor.py W1204 
09:50:02.858000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:54:33.0491872Z W1204 09:50:02.858000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:54:33.0493311Z W1204 09:50:02.859000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:54:33.0494212Z PASSED [8.7682s] [ 40%] 2025-12-04T09:54:33.0494997Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_non_tensor_predicates_dynamic_False_cuda PASSED [6.0350s] [ 41%] 2025-12-04T09:54:33.0497341Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_use_buffers_from_outer_scope_cuda <- test/inductor/test_torchinductor.py W1204 09:50:17.550000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:54:33.0499540Z W1204 09:50:17.550000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:54:33.0500992Z W1204 09:50:17.551000 1815 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. 
Please use Dim.STATIC instead 2025-12-04T09:54:33.0501958Z PASSED [6.6609s] [ 42%] 2025-12-04T09:54:33.0502908Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_original_fqn_and_dtype_cuda <- test/inductor/test_torchinductor.py PASSED [5.9463s] [ 42%] 2025-12-04T09:54:33.0504513Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_device_moved_constant_cuda <- test/inductor/test_torchinductor.py PASSED [10.5693s] [ 43%] 2025-12-04T09:54:33.0506255Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_smem_above_default_limit_cuda SKIPPED [0.0004s] (Skipping triton backend only since not big GPU (not enough SM)) [ 44%] 2025-12-04T09:54:33.0507950Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fill__fallback_cuda <- test/inductor/test_torchinductor.py PASSED [5.7898s] [ 44%] 2025-12-04T09:54:33.0509368Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fqn_cuda <- test/inductor/test_torchinductor.py PASSED [5.9852s] [ 45%] 2025-12-04T09:54:33.0510844Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_free_inactive_buffer_cuda <- test/inductor/test_torchinductor.py PASSED [5.8639s] [ 46%] 2025-12-04T09:54:33.0512329Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_freezing_cuda <- test/inductor/test_torchinductor.py PASSED [5.6392s] [ 46%] 2025-12-04T09:54:33.0513969Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_linear_dynamic_maxautotune_cuda SKIPPED [0.0004s] (Skipping triton backend only since not big GPU (not enough SM)) [ 47%] 2025-12-04T09:54:33.0515639Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_default_gpu_device_cuda SKIPPED [0.0002s] (requires multiple cuda devices) [ 48%] 2025-12-04T09:54:33.0517103Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_on_gpu_device1_cuda SKIPPED [0.0002s] (requires multiple cuda devices) [ 48%] 
2025-12-04T09:54:33.0518595Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_permute_cuda <- test/inductor/test_torchinductor.py PASSED [5.4209s] [ 49%] 2025-12-04T09:54:33.0520124Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_pytree_inputs_cuda <- test/inductor/test_torchinductor.py PASSED [5.8990s] [ 50%] 2025-12-04T09:54:33.0521639Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_complex_cuda <- test/inductor/test_torchinductor.py PASSED [5.7714s] [ 50%] 2025-12-04T09:54:33.0522972Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_cuda PASSED [11.4735s] [ 51%] 2025-12-04T09:54:33.0524260Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda ('RERUN', {'yellow': True}) [1.1028s] [ 51%] 2025-12-04T09:54:33.0525722Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda ('RERUN', {'yellow': True}) [0.6033s] [ 51%] 2025-12-04T09:54:33.0527098Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda FAILED [0.6080s] [ 51%] 2025-12-04T09:54:33.0527817Z 2025-12-04T09:54:33.0527963Z ==================================== RERUNS ==================================== 2025-12-04T09:54:33.0528679Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _ 2025-12-04T09:54:33.0529292Z Traceback (most recent call last): 2025-12-04T09:54:33.0530165Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr 2025-12-04T09:54:33.0531068Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T09:54:33.0531862Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:54:33.0532559Z actual = AOTIRunnerUtil.run( 
2025-12-04T09:54:33.0533164Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:54:33.0533902Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:54:33.0534581Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:54:33.0535336Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:54:33.0536199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:54:33.0537070Z return aot_inductor_minifier_wrapper( 2025-12-04T09:54:33.0537884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:54:33.0538648Z raise e 2025-12-04T09:54:33.0539339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T09:54:33.0540113Z return func( 2025-12-04T09:54:33.0541080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:54:33.0541999Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:54:33.0542841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:54:33.0543557Z return compile_fx_aot( 2025-12-04T09:54:33.0544257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:54:33.0545004Z compiled_artifacts = compile_fx( 2025-12-04T09:54:33.0545732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T09:54:33.0546448Z return compile_fx( 2025-12-04T09:54:33.0547090Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:54:33.0547845Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:54:33.0548689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:54:33.0549516Z return _compile_fx_main( 2025-12-04T09:54:33.0550228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T09:54:33.0551076Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:54:33.0551938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:54:33.0552761Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.0553543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:54:33.0554314Z return compile_fx_forward( 2025-12-04T09:54:33.0555057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:54:33.0555830Z return inner_compile( 2025-12-04T09:54:33.0556317Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:54:33.0556858Z return func(*args, **kwds) 2025-12-04T09:54:33.0557574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:54:33.0558476Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:54:33.0559486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:54:33.0560307Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.0561114Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:54:33.0561960Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:54:33.0562870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:54:33.0563669Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:54:33.0564473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T09:54:33.0565477Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:54:33.0566470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:54:33.0567262Z _check_triton_bf16_support(graph) 2025-12-04T09:54:33.0568050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T09:54:33.0568863Z warn_and_skip(node.get_device()) 2025-12-04T09:54:33.0569595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:54:33.0570354Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:54:33.0570876Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:54:33.0571430Z 2025-12-04T09:54:33.0571650Z To execute this test, run the following from the base repo dir: 2025-12-04T09:54:33.0572708Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda 2025-12-04T09:54:33.0573542Z 2025-12-04T09:54:33.0573816Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:54:33.0574465Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T09:54:33.0574936Z unimplemented [] 2025-12-04T09:54:33.0575274Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.0575849Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.0576408Z graph_break [] 2025-12-04T09:54:33.0576792Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.0577960Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T09:54:33.0579028Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.0579993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.0580960Z warnings.warn( 2025-12-04T09:54:33.0581463Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _ 2025-12-04T09:54:33.0582075Z Traceback (most recent call last): 2025-12-04T09:54:33.0582866Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr 2025-12-04T09:54:33.0583781Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T09:54:33.0584562Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:54:33.0585260Z actual = AOTIRunnerUtil.run( 2025-12-04T09:54:33.0585876Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:54:33.0586535Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:54:33.0587213Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:54:33.0588134Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:54:33.0589006Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:54:33.0589793Z return aot_inductor_minifier_wrapper( 2025-12-04T09:54:33.0590608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:54:33.0591483Z raise e 2025-12-04T09:54:33.0592159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T09:54:33.0592944Z return func( 2025-12-04T09:54:33.0593664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:54:33.0594595Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:54:33.0595424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:54:33.0596141Z return compile_fx_aot( 2025-12-04T09:54:33.0596844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:54:33.0597592Z compiled_artifacts = compile_fx( 2025-12-04T09:54:33.0598316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T09:54:33.0599038Z return compile_fx( 2025-12-04T09:54:33.0599697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:54:33.0600434Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:54:33.0601278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:54:33.0602110Z return _compile_fx_main( 2025-12-04T09:54:33.0602829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 2788, in _compile_fx_main 2025-12-04T09:54:33.0603667Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:54:33.0604529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:54:33.0605351Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.0606130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:54:33.0606896Z return compile_fx_forward( 2025-12-04T09:54:33.0607636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:54:33.0608407Z return inner_compile( 2025-12-04T09:54:33.0608877Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:54:33.0609418Z return func(*args, **kwds) 2025-12-04T09:54:33.0610136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:54:33.0611031Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:54:33.0611939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:54:33.0612756Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.0613573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:54:33.0614401Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:54:33.0615235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:54:33.0616031Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:54:33.0617022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in 
fx_codegen_and_compile 2025-12-04T09:54:33.0618009Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:54:33.0618994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:54:33.0619856Z _check_triton_bf16_support(graph) 2025-12-04T09:54:33.0620642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T09:54:33.0621456Z warn_and_skip(node.get_device()) 2025-12-04T09:54:33.0622184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:54:33.0622975Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:54:33.0623488Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:54:33.0623894Z 2025-12-04T09:54:33.0624113Z To execute this test, run the following from the base repo dir: 2025-12-04T09:54:33.0625170Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda 2025-12-04T09:54:33.0626008Z 2025-12-04T09:54:33.0626293Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:54:33.0626924Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.0627401Z unimplemented [] 2025-12-04T09:54:33.0627740Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.0628277Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.0628765Z graph_break [] 2025-12-04T09:54:33.0629149Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.0630338Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, 
TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T09:54:33.0631393Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.0632358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.0633331Z warnings.warn( 2025-12-04T09:54:33.0633720Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.0634179Z unimplemented [] 2025-12-04T09:54:33.0634516Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.0635058Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.0635535Z graph_break [] 2025-12-04T09:54:33.0635907Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.0637088Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:54:33.0638145Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.0639081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.0640047Z warnings.warn( 2025-12-04T09:54:33.0640369Z =================================== FAILURES =================================== 2025-12-04T09:54:33.0641002Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _ 2025-12-04T09:54:33.0641614Z Traceback (most recent call last): 2025-12-04T09:54:33.0642409Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr 2025-12-04T09:54:33.0643323Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T09:54:33.0644178Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:54:33.0644879Z actual = AOTIRunnerUtil.run( 2025-12-04T09:54:33.0645493Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:54:33.0646154Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:54:33.0646835Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:54:33.0647660Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:54:33.0648527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:54:33.0649313Z return aot_inductor_minifier_wrapper( 2025-12-04T09:54:33.0650120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:54:33.0650901Z raise e 2025-12-04T09:54:33.0651577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 
2025-12-04T09:54:33.0652359Z return func( 2025-12-04T09:54:33.0653071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:54:33.0653991Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:54:33.0654818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:54:33.0655536Z return compile_fx_aot( 2025-12-04T09:54:33.0656234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:54:33.0657071Z compiled_artifacts = compile_fx( 2025-12-04T09:54:33.0657789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T09:54:33.0658514Z return compile_fx( 2025-12-04T09:54:33.0659175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:54:33.0659911Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:54:33.0660754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:54:33.0661590Z return _compile_fx_main( 2025-12-04T09:54:33.0662309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T09:54:33.0663155Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:54:33.0664015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:54:33.0664828Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.0665609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:54:33.0666372Z return 
compile_fx_forward( 2025-12-04T09:54:33.0667109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:54:33.0667880Z return inner_compile( 2025-12-04T09:54:33.0668349Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:54:33.0668893Z return func(*args, **kwds) 2025-12-04T09:54:33.0669610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:54:33.0670509Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:54:33.0671584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:54:33.0672399Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.0673350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:54:33.0674178Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:54:33.0675012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:54:33.0675804Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:54:33.0676739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T09:54:33.0677723Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:54:33.0678716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:54:33.0679511Z _check_triton_bf16_support(graph) 2025-12-04T09:54:33.0680318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 
2025-12-04T09:54:33.0681119Z warn_and_skip(node.get_device()) 2025-12-04T09:54:33.0681850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:54:33.0682619Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:54:33.0683135Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:54:33.0683538Z 2025-12-04T09:54:33.0683756Z To execute this test, run the following from the base repo dir: 2025-12-04T09:54:33.0684808Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda 2025-12-04T09:54:33.0685643Z 2025-12-04T09:54:33.0685929Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:54:33.0686555Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.0687033Z unimplemented [] 2025-12-04T09:54:33.0687368Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.0687913Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.0688384Z graph_break [] 2025-12-04T09:54:33.0688762Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.0689944Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:54:33.0690997Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.0691949Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.0692917Z warnings.warn( 2025-12-04T09:54:33.0693303Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.0693762Z unimplemented [] 2025-12-04T09:54:33.0694099Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.0694645Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.0695125Z graph_break [] 2025-12-04T09:54:33.0695503Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.0696774Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:54:33.0697850Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.0698790Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.0699763Z warnings.warn( 2025-12-04T09:54:33.0700154Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.0700691Z unimplemented [] 2025-12-04T09:54:33.0701029Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.0701575Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.0702064Z graph_break [] 2025-12-04T09:54:33.0702431Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.0703616Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:54:33.0704826Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.0705767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.0706739Z warnings.warn( 2025-12-04T09:54:33.0707662Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-f2c58a9dfc31919e.xml - 2025-12-04T09:54:33.0708716Z =========================== short test summary info ============================ 2025-12-04T09:54:33.0709916Z FAILED [0.6080s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:54:33.0710930Z 2025-12-04T09:54:33.0711146Z To execute this test, run the following from the base repo dir: 2025-12-04T09:54:33.0712209Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda 2025-12-04T09:54:33.0713043Z 2025-12-04T09:54:33.0713323Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:54:33.0713906Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:54:33.0714467Z ======== 1 failed, 50 passed, 29 skipped, 2 rerun in 351.97s (0:05:51) ========= 2025-12-04T09:54:33.0714940Z Got exit code 1 2025-12-04T09:54:33.0715215Z Retrying single test... 
2025-12-04T09:54:33.0715841Z W1204 09:51:47.151000 8209 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T09:54:33.0716987Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-a793ea186f6e0edb.xml 2025-12-04T09:54:33.0717865Z ============================= test session starts ============================== 2025-12-04T09:54:33.0718516Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:54:33.0719120Z cachedir: .pytest_cache 2025-12-04T09:54:33.0719834Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:54:33.0720619Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:54:33.0720963Z configfile: pytest.ini 2025-12-04T09:54:33.0721697Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:54:33.0722605Z collecting ... collected 934 items / 153 deselected / 781 selected 2025-12-04T09:54:33.0723755Z stepcurrent: skipping 79 already run items. 
Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda 2025-12-04T09:54:33.0724777Z Running 1 items in this shard 2025-12-04T09:54:33.0724999Z 2025-12-04T09:54:33.0725660Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda ('RERUN', {'yellow': True}) [4.1418s] [100%] 2025-12-04T09:54:33.0727128Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda ('RERUN', {'yellow': True}) [0.5839s] [100%] 2025-12-04T09:54:33.0728605Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda FAILED [0.5984s] [100%] 2025-12-04T09:54:33.0729322Z 2025-12-04T09:54:33.0729468Z ==================================== RERUNS ==================================== 2025-12-04T09:54:33.0730111Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _ 2025-12-04T09:54:33.0730726Z Traceback (most recent call last): 2025-12-04T09:54:33.0731525Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr 2025-12-04T09:54:33.0732481Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T09:54:33.0733282Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:54:33.0733980Z actual = AOTIRunnerUtil.run( 2025-12-04T09:54:33.0734588Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:54:33.0735264Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:54:33.0735953Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:54:33.0736793Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:54:33.0737652Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:54:33.0738455Z return aot_inductor_minifier_wrapper( 2025-12-04T09:54:33.0739271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:54:33.0740041Z raise e 2025-12-04T09:54:33.0740725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T09:54:33.0741509Z return func( 2025-12-04T09:54:33.0742223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:54:33.0743135Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:54:33.0743968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:54:33.0744685Z return compile_fx_aot( 2025-12-04T09:54:33.0745374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:54:33.0746141Z compiled_artifacts = compile_fx( 2025-12-04T09:54:33.0746864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T09:54:33.0747588Z return compile_fx( 2025-12-04T09:54:33.0748234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:54:33.0748980Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:54:33.0749826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:54:33.0750656Z return _compile_fx_main( 2025-12-04T09:54:33.0751359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 2788, in _compile_fx_main 2025-12-04T09:54:33.0752209Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:54:33.0753066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:54:33.0753869Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.0754661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:54:33.0755427Z return compile_fx_forward( 2025-12-04T09:54:33.0756165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:54:33.0756921Z return inner_compile( 2025-12-04T09:54:33.0757494Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:54:33.0758039Z return func(*args, **kwds) 2025-12-04T09:54:33.0758740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:54:33.0759650Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:54:33.0760622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:54:33.0761433Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.0762234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:54:33.0763076Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:54:33.0763914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:54:33.0764706Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:54:33.0765516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in 
fx_codegen_and_compile 2025-12-04T09:54:33.0766526Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:54:33.0767518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:54:33.0768313Z _check_triton_bf16_support(graph) 2025-12-04T09:54:33.0769101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T09:54:33.0769917Z warn_and_skip(node.get_device()) 2025-12-04T09:54:33.0770652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:54:33.0771610Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:54:33.0772143Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:54:33.0772543Z 2025-12-04T09:54:33.0772763Z To execute this test, run the following from the base repo dir: 2025-12-04T09:54:33.0773826Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda 2025-12-04T09:54:33.0774673Z 2025-12-04T09:54:33.0774940Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:54:33.0775587Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.0776061Z unimplemented [] 2025-12-04T09:54:33.0776474Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.0777014Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.0777502Z graph_break [] 2025-12-04T09:54:33.0777888Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.0779058Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, 
TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T09:54:33.0780125Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.0781086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.0782064Z warnings.warn( 2025-12-04T09:54:33.0782569Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _ 2025-12-04T09:54:33.0783179Z Traceback (most recent call last): 2025-12-04T09:54:33.0783971Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr 2025-12-04T09:54:33.0784867Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T09:54:33.0785792Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:54:33.0786500Z actual = AOTIRunnerUtil.run( 2025-12-04T09:54:33.0787117Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:54:33.0787777Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:54:33.0788456Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:54:33.0789321Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:54:33.0790171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:54:33.0790971Z return aot_inductor_minifier_wrapper( 2025-12-04T09:54:33.0791776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:54:33.0792560Z raise e 2025-12-04T09:54:33.0793239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T09:54:33.0794021Z return func( 
2025-12-04T09:54:33.0794734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:54:33.0795656Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:54:33.0796482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:54:33.0797195Z return compile_fx_aot( 2025-12-04T09:54:33.0797898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:54:33.0798648Z compiled_artifacts = compile_fx( 2025-12-04T09:54:33.0799366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T09:54:33.0800088Z return compile_fx( 2025-12-04T09:54:33.0800744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:54:33.0801482Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:54:33.0802327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:54:33.0803160Z return _compile_fx_main( 2025-12-04T09:54:33.0803865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T09:54:33.0804719Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:54:33.0805578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:54:33.0806396Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.0807181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:54:33.0807941Z return compile_fx_forward( 2025-12-04T09:54:33.0808674Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:54:33.0809439Z return inner_compile( 2025-12-04T09:54:33.0809909Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:54:33.0810453Z return func(*args, **kwds) 2025-12-04T09:54:33.0811168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:54:33.0812071Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:54:33.0812974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:54:33.0813792Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.0814680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:54:33.0815511Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:54:33.0816420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:54:33.0817301Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:54:33.0818110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T09:54:33.0819109Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:54:33.0820101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:54:33.0820893Z _check_triton_bf16_support(graph) 2025-12-04T09:54:33.0821689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T09:54:33.0822502Z warn_and_skip(node.get_device()) 
2025-12-04T09:54:33.0823230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:54:33.0823998Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:54:33.0824513Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:54:33.0824910Z 2025-12-04T09:54:33.0825130Z To execute this test, run the following from the base repo dir: 2025-12-04T09:54:33.0826181Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda 2025-12-04T09:54:33.0827011Z 2025-12-04T09:54:33.0827289Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:54:33.0827919Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.0828392Z unimplemented [] 2025-12-04T09:54:33.0828732Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.0829265Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.0829750Z graph_break [] 2025-12-04T09:54:33.0830131Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.0831315Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:54:33.0832361Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.0833323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.0834294Z warnings.warn( 2025-12-04T09:54:33.0834670Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.0835138Z unimplemented [] 2025-12-04T09:54:33.0835470Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.0836012Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.0836482Z graph_break [] 2025-12-04T09:54:33.0850149Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.0851392Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:54:33.0852465Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.0853419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.0854403Z warnings.warn( 2025-12-04T09:54:33.0854729Z =================================== FAILURES =================================== 2025-12-04T09:54:33.0855510Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _ 2025-12-04T09:54:33.0856137Z Traceback (most recent call last): 2025-12-04T09:54:33.0857018Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr 2025-12-04T09:54:33.0857944Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T09:54:33.0858808Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:54:33.0859510Z actual = AOTIRunnerUtil.run( 2025-12-04T09:54:33.0860132Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:54:33.0860800Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:54:33.0861491Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:54:33.0862256Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:54:33.0863133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:54:33.0863920Z return aot_inductor_minifier_wrapper( 2025-12-04T09:54:33.0864735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:54:33.0865550Z raise e 2025-12-04T09:54:33.0866236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 
2025-12-04T09:54:33.0867010Z return func( 2025-12-04T09:54:33.0867729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:54:33.0868661Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:54:33.0869498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:54:33.0870220Z return compile_fx_aot( 2025-12-04T09:54:33.0870927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:54:33.0871882Z compiled_artifacts = compile_fx( 2025-12-04T09:54:33.0872592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T09:54:33.0873331Z return compile_fx( 2025-12-04T09:54:33.0874000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:54:33.0874743Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:54:33.0875590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:54:33.0876426Z return _compile_fx_main( 2025-12-04T09:54:33.0877151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T09:54:33.0877993Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:54:33.0878852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:54:33.0879670Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.0880471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:54:33.0881228Z return 
compile_fx_forward( 2025-12-04T09:54:33.0881969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:54:33.0882746Z return inner_compile( 2025-12-04T09:54:33.0883231Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:54:33.0883776Z return func(*args, **kwds) 2025-12-04T09:54:33.0884678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:54:33.0885601Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:54:33.0886508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:54:33.0887434Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.0888255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:54:33.0889097Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:54:33.0889942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:54:33.0890746Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:54:33.0891569Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T09:54:33.0892552Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:54:33.0893549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:54:33.0894340Z _check_triton_bf16_support(graph) 2025-12-04T09:54:33.0895148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 
2025-12-04T09:54:33.0895950Z warn_and_skip(node.get_device()) 2025-12-04T09:54:33.0896767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:54:33.0897547Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:54:33.0898059Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:54:33.0898465Z 2025-12-04T09:54:33.0898694Z To execute this test, run the following from the base repo dir: 2025-12-04T09:54:33.0899765Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda 2025-12-04T09:54:33.0900601Z 2025-12-04T09:54:33.0900883Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:54:33.0901554Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.0902038Z unimplemented [] 2025-12-04T09:54:33.0902377Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.0902933Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.0903406Z graph_break [] 2025-12-04T09:54:33.0903789Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.0904986Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:54:33.0906046Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.0907011Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.0907985Z warnings.warn( 2025-12-04T09:54:33.0908382Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.0908846Z unimplemented [] 2025-12-04T09:54:33.0909197Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.0909743Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.0910230Z graph_break [] 2025-12-04T09:54:33.0910595Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.0911901Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:54:33.0912973Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.0913935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.0914894Z warnings.warn( 2025-12-04T09:54:33.0915286Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.0915832Z unimplemented [] 2025-12-04T09:54:33.0916154Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.0916698Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.0917183Z graph_break [] 2025-12-04T09:54:33.0917562Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.0918724Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:54:33.0919790Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.0920745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.0921704Z warnings.warn( 2025-12-04T09:54:33.0922623Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-a793ea186f6e0edb.xml - 2025-12-04T09:54:33.0923686Z =========================== short test summary info ============================ 2025-12-04T09:54:33.0924903Z FAILED [0.5984s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:54:33.0925908Z 2025-12-04T09:54:33.0926141Z To execute this test, run the following from the base repo dir: 2025-12-04T09:54:33.0927189Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda 2025-12-04T09:54:33.0928032Z 2025-12-04T09:54:33.0928302Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:54:33.0928898Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:54:33.0929435Z ================== 1 failed, 153 deselected, 2 rerun in 5.41s ================== 2025-12-04T09:54:33.0929874Z Got exit code 1 2025-12-04T09:54:33.0930147Z Retrying single test... 
2025-12-04T09:54:33.0930778Z W1204 09:52:09.969000 8437 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T09:54:33.0931912Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-fcd1db8f24799401.xml 2025-12-04T09:54:33.0932790Z ============================= test session starts ============================== 2025-12-04T09:54:33.0933458Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:54:33.0934065Z cachedir: .pytest_cache 2025-12-04T09:54:33.0934770Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:54:33.0935562Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:54:33.0935927Z configfile: pytest.ini 2025-12-04T09:54:33.0936739Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:54:33.0937657Z collecting ... collected 934 items / 153 deselected / 781 selected 2025-12-04T09:54:33.0938812Z stepcurrent: skipping 79 already run items. 
Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda 2025-12-04T09:54:33.0939843Z Running 1 items in this shard 2025-12-04T09:54:33.0940056Z 2025-12-04T09:54:33.0940836Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda ('RERUN', {'yellow': True}) [4.0766s] [100%] 2025-12-04T09:54:33.0942313Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda ('RERUN', {'yellow': True}) [0.5983s] [100%] 2025-12-04T09:54:33.0943702Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda FAILED [0.6080s] [100%] 2025-12-04T09:54:33.0944527Z 2025-12-04T09:54:33.0944687Z ==================================== RERUNS ==================================== 2025-12-04T09:54:33.0945312Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _ 2025-12-04T09:54:33.0945931Z Traceback (most recent call last): 2025-12-04T09:54:33.0946725Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr 2025-12-04T09:54:33.0947644Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T09:54:33.0948430Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:54:33.0949130Z actual = AOTIRunnerUtil.run( 2025-12-04T09:54:33.0949752Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:54:33.0950421Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:54:33.0951105Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:54:33.0951860Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:54:33.0952725Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:54:33.0955908Z return aot_inductor_minifier_wrapper( 2025-12-04T09:54:33.0956736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:54:33.0957516Z raise e 2025-12-04T09:54:33.0958189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T09:54:33.0958981Z return func( 2025-12-04T09:54:33.0959693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:54:33.0960622Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:54:33.0961451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:54:33.0962172Z return compile_fx_aot( 2025-12-04T09:54:33.0962873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:54:33.0963634Z compiled_artifacts = compile_fx( 2025-12-04T09:54:33.0964349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T09:54:33.0965073Z return compile_fx( 2025-12-04T09:54:33.0965730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:54:33.0966470Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:54:33.0967313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:54:33.0968149Z return _compile_fx_main( 2025-12-04T09:54:33.0968868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 2788, in _compile_fx_main 2025-12-04T09:54:33.0969703Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:54:33.0970567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:54:33.0971791Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.0972573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:54:33.0973340Z return compile_fx_forward( 2025-12-04T09:54:33.0974084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:54:33.0974955Z return inner_compile( 2025-12-04T09:54:33.0975425Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:54:33.0975967Z return func(*args, **kwds) 2025-12-04T09:54:33.0976774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:54:33.0977685Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:54:33.0978592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:54:33.0979413Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.0980233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:54:33.0981071Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:54:33.0981934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:54:33.0982723Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:54:33.0983547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in 
fx_codegen_and_compile 2025-12-04T09:54:33.0984552Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:54:33.0985548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:54:33.0986333Z _check_triton_bf16_support(graph) 2025-12-04T09:54:33.0987142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T09:54:33.0987964Z warn_and_skip(node.get_device()) 2025-12-04T09:54:33.0988688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:54:33.0989475Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:54:33.0990006Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:54:33.0990400Z 2025-12-04T09:54:33.0990637Z To execute this test, run the following from the base repo dir: 2025-12-04T09:54:33.0991704Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda 2025-12-04T09:54:33.0992566Z 2025-12-04T09:54:33.0992842Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:54:33.0993496Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.0993978Z unimplemented [] 2025-12-04T09:54:33.0994310Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.0994868Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.0995361Z graph_break [] 2025-12-04T09:54:33.0995736Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.0996927Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, 
TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T09:54:33.0998008Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.0998965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.1000032Z warnings.warn( 2025-12-04T09:54:33.1000555Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _ 2025-12-04T09:54:33.1001170Z Traceback (most recent call last): 2025-12-04T09:54:33.1001945Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr 2025-12-04T09:54:33.1002859Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T09:54:33.1003724Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:54:33.1004431Z actual = AOTIRunnerUtil.run( 2025-12-04T09:54:33.1005039Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:54:33.1005715Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:54:33.1006402Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:54:33.1007145Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:54:33.1008016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:54:33.1008817Z return aot_inductor_minifier_wrapper( 2025-12-04T09:54:33.1009628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:54:33.1010401Z raise e 2025-12-04T09:54:33.1011092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T09:54:33.1011880Z return func( 
2025-12-04T09:54:33.1012593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:54:33.1013507Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:54:33.1014349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:54:33.1015062Z return compile_fx_aot( 2025-12-04T09:54:33.1015749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:54:33.1016618Z compiled_artifacts = compile_fx( 2025-12-04T09:54:33.1017344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T09:54:33.1018080Z return compile_fx( 2025-12-04T09:54:33.1018730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:54:33.1019492Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:54:33.1020341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:54:33.1021159Z return _compile_fx_main( 2025-12-04T09:54:33.1021886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T09:54:33.1022738Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:54:33.1023598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:54:33.1024400Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.1025196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:54:33.1025959Z return compile_fx_forward( 2025-12-04T09:54:33.1026699Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:54:33.1027461Z return inner_compile( 2025-12-04T09:54:33.1027943Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:54:33.1028486Z return func(*args, **kwds) 2025-12-04T09:54:33.1029310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:54:33.1030221Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:54:33.1031131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:54:33.1032014Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.1032820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:54:33.1033663Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:54:33.1034498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:54:33.1035295Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:54:33.1036105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T09:54:33.1037102Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:54:33.1038097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:54:33.1038875Z _check_triton_bf16_support(graph) 2025-12-04T09:54:33.1039706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T09:54:33.1040518Z warn_and_skip(node.get_device()) 
2025-12-04T09:54:33.1041251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:54:33.1042009Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:54:33.1042534Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:54:33.1042920Z 2025-12-04T09:54:33.1043154Z To execute this test, run the following from the base repo dir: 2025-12-04T09:54:33.1044214Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda 2025-12-04T09:54:33.1045052Z 2025-12-04T09:54:33.1045325Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:54:33.1045968Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.1046442Z unimplemented [] 2025-12-04T09:54:33.1046764Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.1047308Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.1047795Z graph_break [] 2025-12-04T09:54:33.1048171Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.1049344Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:54:33.1050413Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.1051365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.1052336Z warnings.warn( 2025-12-04T09:54:33.1052716Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.1053184Z unimplemented [] 2025-12-04T09:54:33.1053519Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.1054051Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.1054544Z graph_break [] 2025-12-04T09:54:33.1054922Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.1056206Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:54:33.1057378Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.1058339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.1059313Z warnings.warn( 2025-12-04T09:54:33.1059691Z =================================== FAILURES =================================== 2025-12-04T09:54:33.1060348Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda _ 2025-12-04T09:54:33.1060969Z Traceback (most recent call last): 2025-12-04T09:54:33.1061755Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 2019, in test_size_with_unbacked_add_and_mul_expr 2025-12-04T09:54:33.1062676Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T09:54:33.1063479Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:54:33.1064183Z actual = AOTIRunnerUtil.run( 2025-12-04T09:54:33.1064790Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:54:33.1065472Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:54:33.1066154Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:54:33.1066909Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:54:33.1067769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:54:33.1068573Z return aot_inductor_minifier_wrapper( 2025-12-04T09:54:33.1069387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:54:33.1070159Z raise e 2025-12-04T09:54:33.1070846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 
2025-12-04T09:54:33.1071863Z return func( 2025-12-04T09:54:33.1072578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:54:33.1073492Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:54:33.1074335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:54:33.1075053Z return compile_fx_aot( 2025-12-04T09:54:33.1075747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:54:33.1076508Z compiled_artifacts = compile_fx( 2025-12-04T09:54:33.1077231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T09:54:33.1077946Z return compile_fx( 2025-12-04T09:54:33.1078596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:54:33.1079342Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:54:33.1080187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:54:33.1081011Z return _compile_fx_main( 2025-12-04T09:54:33.1081721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T09:54:33.1082574Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:54:33.1083440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:54:33.1084242Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.1085209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:54:33.1085985Z return 
compile_fx_forward( 2025-12-04T09:54:33.1086730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:54:33.1087488Z return inner_compile( 2025-12-04T09:54:33.1087974Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:54:33.1088614Z return func(*args, **kwds) 2025-12-04T09:54:33.1089329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:54:33.1090252Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:54:33.1091168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:54:33.1091997Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T09:54:33.1092815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:54:33.1093671Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:54:33.1094516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:54:33.1095323Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:54:33.1096138Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T09:54:33.1097320Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:54:33.1098313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:54:33.1099114Z _check_triton_bf16_support(graph) 2025-12-04T09:54:33.1099932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 
2025-12-04T09:54:33.1100753Z warn_and_skip(node.get_device()) 2025-12-04T09:54:33.1101502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:54:33.1102298Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:54:33.1102838Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:54:33.1103243Z 2025-12-04T09:54:33.1103462Z To execute this test, run the following from the base repo dir: 2025-12-04T09:54:33.1104556Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda 2025-12-04T09:54:33.1105438Z 2025-12-04T09:54:33.1105714Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:54:33.1106384Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.1106870Z unimplemented [] 2025-12-04T09:54:33.1107225Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.1107793Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.1108277Z graph_break [] 2025-12-04T09:54:33.1108662Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.1109905Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:54:33.1111003Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.1112013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.1113016Z warnings.warn( 2025-12-04T09:54:33.1113420Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.1113890Z unimplemented [] 2025-12-04T09:54:33.1114377Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.1114947Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.1115429Z graph_break [] 2025-12-04T09:54:33.1115811Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.1117005Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:54:33.1118149Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.1119118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.1120111Z warnings.warn( 2025-12-04T09:54:33.1120503Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:54:33.1120964Z unimplemented [] 2025-12-04T09:54:33.1121314Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T09:54:33.1121865Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2)] 2025-12-04T09:54:33.1122343Z graph_break [] 2025-12-04T09:54:33.1122722Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:54:33.1123913Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:54:33.1124989Z return cls.__new__(cls, *args) 2025-12-04T09:54:33.1125930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:54:33.1126900Z warnings.warn( 2025-12-04T09:54:33.1127830Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-fcd1db8f24799401.xml - 2025-12-04T09:54:33.1128900Z =========================== short test summary info ============================ 2025-12-04T09:54:33.1130105Z FAILED [0.6080s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:54:33.1131130Z 2025-12-04T09:54:33.1131358Z To execute this test, run the following from the base repo dir: 2025-12-04T09:54:33.1132418Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_and_mul_expr_cuda 2025-12-04T09:54:33.1133252Z 2025-12-04T09:54:33.1133535Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:54:33.1134124Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:54:33.1134661Z ================== 1 failed, 153 deselected, 2 rerun in 5.37s ================== 2025-12-04T09:54:33.1135123Z Got exit code 1 2025-12-04T09:54:33.1135905Z FAILED CONSISTENTLY: test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda 2025-12-04T09:54:33.1137168Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:54:33.1138180Z W1204 09:52:32.750000 8665 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T09:54:33.1139341Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bb08f25297cc596b.xml 2025-12-04T09:54:33.1140216Z ============================= test session starts ============================== 2025-12-04T09:54:33.1140877Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:54:33.1141491Z cachedir: .pytest_cache 2025-12-04T09:54:33.1142343Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:54:33.1143123Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:54:33.1143485Z configfile: pytest.ini 2025-12-04T09:54:33.1144215Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:54:33.1145131Z collecting ... collected 934 items / 80 deselected / 854 selected 2025-12-04T09:54:33.1145707Z stepcurrent: skipping 80 already run items. 
2025-12-04T09:54:33.1146103Z Running 74 items in this shard 2025-12-04T09:54:33.1146314Z 2025-12-04T09:54:33.1147000Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_symint_item_cuda <- test/inductor/test_torchinductor.py PASSED [6.5275s] [ 1%] 2025-12-04T09:54:33.1148797Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_extern_kernel_arg_cuda W1204 09:52:44.647000 8665 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T09:54:33.1150076Z PASSED [9.2830s] [ 2%] 2025-12-04T09:54:33.1150978Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_False_cuda PASSED [6.5493s] [ 4%] 2025-12-04T09:54:33.1152532Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_cuda PASSED [6.4522s] [ 5%] 2025-12-04T09:54:33.1154090Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_True_cuda PASSED [9.9408s] [ 6%] 2025-12-04T09:54:33.1155638Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_cuda PASSED [6.4978s] [ 8%] 2025-12-04T09:54:33.1157207Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_cuda PASSED [7.8103s] [ 9%] 2025-12-04T09:54:33.1158757Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_cuda PASSED [8.1735s] [ 10%] 2025-12-04T09:54:33.1160539Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_cuda SKIPPED [0.0033s] (requires triton.tools.tensor_descriptor TMA support) [ 12%] 2025-12-04T09:54:33.1162389Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_reinterpret_view_mem_leak_cuda <- test/inductor/test_torchinductor.py PASSED [6.6981s] [ 13%] 2025-12-04T09:54:33.1164059Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_sympy_expr_arg_cuda <- test/inductor/test_torchinductor.py PASSED [6.4184s] [ 14%] 2025-12-04T09:54:33.1165940Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cuda SKIPPED [0.0034s] (requires triton.tools.experimental_descriptor TMA support) [ 16%] 2025-12-04T09:54:33.1167771Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_cuda PASSED [7.8875s] [ 17%] 2025-12-04T09:54:33.1169304Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_False_cuda PASSED [7.5621s] [ 18%] 2025-12-04T09:54:33.1171692Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_simple_cuda <- test/inductor/test_torchinductor.py W1204 09:54:04.717000 8665 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:54:33.1173826Z W1204 09:54:04.718000 8665 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. 
Please use Dim.STATIC instead 2025-12-04T09:54:33.1174730Z PASSED [6.7229s] [ 20%] 2025-12-04T09:54:33.1175739Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_sym_expr_cond_dynamic_False_cuda PASSED [7.1835s] [ 21%] 2025-12-04T09:54:33.1177323Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_grid_with_backed_symbols_cuda <- test/inductor/test_torchinductor.py PASSED [5.8985s] [ 22%] 2025-12-04T09:54:33.1178875Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_size_weight_cuda <- test/inductor/test_torchinductor.py PASSED [6.0453s] [ 24%] 2025-12-04T09:54:33.1180637Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_mps SKIPPED [0.0004s] (No MPS backend available) [ 25%] 2025-12-04T09:54:33.1182365Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_amp_fallback_random_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (No MPS backend available) [ 27%] 2025-12-04T09:54:33.1184194Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aot_inductor_consts_cpp_build_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 28%] 2025-12-04T09:54:33.1186054Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_codegen_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 29%] 2025-12-04T09:54:33.1187718Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_fp8_dtype_mps SKIPPED [0.0007s] (No MPS backend available) [ 31%] 2025-12-04T09:54:33.1189473Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_user_defined_triton_kernel_profiling_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 32%] 2025-12-04T09:54:33.1191369Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotuning_args_reuse_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 33%] 2025-12-04T09:54:33.1193154Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_4_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 35%] 2025-12-04T09:54:33.1194996Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_codegen_int_array_var_fix_memory_leak_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 36%] 2025-12-04T09:54:33.1196815Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_cpu_predicate_cuda_operands_max_autotune_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 37%] 2025-12-04T09:54:33.1198522Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_nested_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 39%] 2025-12-04T09:54:33.1200305Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_multiple_outputs_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 40%] 2025-12-04T09:54:33.1202146Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_replace_view_ops_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 41%] 2025-12-04T09:54:33.1203711Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_mps SKIPPED [0.0002s] (No MPS backend available) [ 43%] 2025-12-04T09:54:33.1205335Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_original_fqn_and_dtype_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 44%] 2025-12-04T09:54:33.1207204Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_type_propagation_mps <- 
test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 45%] 2025-12-04T09:54:33.1208992Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_smem_above_default_limit_mps SKIPPED [0.0002s] (No MPS backend available) [ 47%] 2025-12-04T09:54:33.1210479Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fallback_mem_leak_fix_mps SKIPPED [0.0002s] (No MPS backend available) [ 48%] 2025-12-04T09:54:33.1211913Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_foreach_multiple_dynamic_mps SKIPPED [0.0002s] (No MPS backend available) [ 50%] 2025-12-04T09:54:33.1213270Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fp8_mps SKIPPED [0.0002s] (No MPS backend available) [ 51%] 2025-12-04T09:54:33.1214777Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fqn_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 52%] 2025-12-04T09:54:33.1216520Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_grid_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 54%] 2025-12-04T09:54:33.1218281Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_masked_select_dynamic_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 55%] 2025-12-04T09:54:33.1220047Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_missing_output_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 56%] 2025-12-04T09:54:33.1221787Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_mixed_device_1_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 58%] 2025-12-04T09:54:33.1223575Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_nested_tensor_from_jagged_mps <- test/inductor/test_torchinductor.py SKIPPED 
[0.0002s] (No MPS backend available) [ 59%] 2025-12-04T09:54:33.1225434Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_non_contiguous_output_alias_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 60%] 2025-12-04T09:54:33.1227266Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_none_args_aot_codegen_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 62%] 2025-12-04T09:54:33.1229033Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_misaligned_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 63%] 2025-12-04T09:54:33.1230828Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_pad_non_zero_memory_leak_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (No MPS backend available) [ 64%] 2025-12-04T09:54:33.1232642Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_proxy_executor_permute_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 66%] 2025-12-04T09:54:33.1234418Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_pytree_inputs_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 67%] 2025-12-04T09:54:33.1236182Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 68%] 2025-12-04T09:54:33.1237880Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_replace_unbacked_symbol_with_backed_expr_mps SKIPPED [0.0002s] (No MPS backend available) [ 70%] 2025-12-04T09:54:33.1239595Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_reuse_kernel_dynamic_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 71%] 2025-12-04T09:54:33.1241193Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_fp8_mps SKIPPED [0.0002s] (No MPS backend available) [ 72%] 2025-12-04T09:54:33.1242748Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_embed_kernel_binary_False_max_autotune_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 74%] 2025-12-04T09:54:33.1244547Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_small_constant_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 75%] 2025-12-04T09:54:33.1246157Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_stride_with_unbacked_expr_mps SKIPPED [0.0002s] (No MPS backend available) [ 77%] 2025-12-04T09:54:33.1247657Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sympy_cpp_printer_min_max_minmax1_mps SKIPPED [0.0003s] (No MPS backend available) [ 78%] 2025-12-04T09:54:33.1249199Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_autotuning_mps SKIPPED [0.0002s] (No MPS backend available) [ 79%] 2025-12-04T09:54:33.1250801Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 81%] 2025-12-04T09:54:33.1252593Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 82%] 2025-12-04T09:54:33.1254438Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 83%] 2025-12-04T09:54:33.1256373Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 85%] 
2025-12-04T09:54:33.1258241Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_with_none_input_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 86%] 2025-12-04T09:54:33.1260194Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 87%] 2025-12-04T09:54:33.1262086Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_next_power_of_2_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 89%] 2025-12-04T09:54:33.1263859Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_2_use_static_size_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 90%] 2025-12-04T09:54:33.1265693Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbounded_expr_substitutions_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 91%] 2025-12-04T09:54:33.1267556Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_using_model_name_for_files_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 93%] 2025-12-04T09:54:33.1269277Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_mixed_device_dynamic_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 94%] 2025-12-04T09:54:33.1270874Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_mixed_device_dynamic_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 95%] 2025-12-04T09:54:33.1272806Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_outer_buffers_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 97%] 2025-12-04T09:54:33.1274497Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_parameters_mps SKIPPED [0.0002s] (No MPS backend available) [ 98%] 2025-12-04T09:54:33.1276150Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_no_triton_profiler_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [100%] 2025-12-04T09:54:33.1277126Z 2025-12-04T09:54:33.1278061Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bb08f25297cc596b.xml - 2025-12-04T09:54:33.1279179Z ========== 16 passed, 58 skipped, 80 deselected in 115.87s (0:01:55) =========== 2025-12-04T09:54:33.1280280Z The following tests failed consistently: ['test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda'] 2025-12-04T09:54:33.1281247Z 2025-12-04T09:54:33.1281796Z FINISHED PRINTING LOG FILE of inductor/test_aot_inductor 1/6 (test/test-reports/inductor.test_aot_inductor_1.6_cf1c969272c5d084_.log) 2025-12-04T09:54:33.1282482Z 2025-12-04T09:54:33.1282855Z Finished inductor/test_aot_inductor 1/6 ... 
[2025-12-04 09:54:33.018077][2501.400968487], took 9.09min 2025-12-04T09:54:33.1284145Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-f2c58a9dfc31919e.xml 2025-12-04T09:54:33.2058559Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-a793ea186f6e0edb.xml 2025-12-04T09:54:33.2385367Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-fcd1db8f24799401.xml 2025-12-04T09:54:33.2727794Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bb08f25297cc596b.xml 2025-12-04T09:54:33.4879247Z Uploading logs for 57119749248 to S3 2025-12-04T09:54:33.5185831Z Uploading artifacts took 0.20 seconds 2025-12-04T09:54:33.5186305Z inductor/test_aot_inductor 1/6 failed! 2025-12-04T09:54:33.5190545Z Running inductor/test_aot_inductor 6/6 ... [2025-12-04 09:54:33.518876][2501.90177057] 2025-12-04T09:54:33.5191136Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:54:33.5195973Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor.py', '--shard-id=6', '--num-shards=6', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:54:33.519363] 2025-12-04T10:08:41.9506470Z 2025-12-04T10:08:41.9507421Z PRINTING LOG FILE of inductor/test_aot_inductor 6/6 (test/test-reports/inductor.test_aot_inductor_6.6_462385258b0b1d27_.log) 2025-12-04T10:08:41.9508985Z W1204 09:54:42.970000 11895 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:08:41.9510481Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bf15e775351f3d84.xml 2025-12-04T10:08:41.9511502Z ============================= test session starts ============================== 2025-12-04T10:08:41.9512478Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:08:41.9513339Z cachedir: .pytest_cache 2025-12-04T10:08:41.9514338Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:08:41.9515295Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:08:41.9515729Z configfile: pytest.ini 2025-12-04T10:08:41.9516620Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:08:41.9517816Z collecting ... 
collected 934 items 2025-12-04T10:08:41.9518403Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T10:08:41.9619919Z Running 158 items in this shard: test/inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_cross_compile_windows_package_format, test/inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_explicit_set, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__int_mm_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_amp_fallback_random_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_constant_tensor_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printing_model_inputs_codegen_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_profiler_enable_kernel_profile_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_profiler_enable_kernel_profile_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_autotune_with_constant_folding_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_autotuning_args_reuse_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_and_force_mmap_weights_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_mismatched_branch_output_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_symint_input_disable_one_pass_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_outer_code_before_after_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_parameters_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_conv_freezing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_copy_non_blocking_is_pinned_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_dup_unbacked_sym_decl_with_refinement_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_empty_cat_dtype_promotion_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fallback_kernel_with_symexpr_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_index_put_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_index_put_with_none_index_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_mmaped_weights_on_disk_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misc_1_max_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_mixed_device_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_tensor_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_on_gpu_device1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pad_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quantized_linear_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_replicate_on_devices_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_run_with_grad_enabled_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_shape_failed_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scaled_grouped_mm_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_seq_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_so_without_weight_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_stride_with_unbacked_expr_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_sym_expr_indexing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_torchvision_transforms_functional_tensor_resize_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_dynamic_launcher_grid_infer_from_tensor_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_dynamic_shape_with_div_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_arg_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_reinterpret_view_mem_leak_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_unbacked_symint_closure_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_offset_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_profiler_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_size_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_addmm_multiple_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_constant_tensor_name_collision_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_cpp_kernel_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_tensor_meta_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotuning_args_reuse_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_and_force_mmap_weights_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_with_parameters_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv3d_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_copy_non_blocking_is_pinned_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_duplicate_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_cat_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_extract_constants_map_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fake_tensor_device_validation_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fp8_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_inf_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_input_codegen_with_sympy_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_masked_select_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_misaligned_input_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_misaligned_input_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_multi_device_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_tensor_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_none_args_aot_codegen_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_normal_functional_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_pad_non_zero_memory_leak_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_squeeze_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeated_calling_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_replace_unbacked_symbol_with_backed_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_reuse_kernel_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_rocm_triton_autotuning_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_run_with_grad_enabled_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_device_type_failed_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scatter_fallback_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_seq_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sym_expr_indexing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_torchvision_transforms_functional_tensor_resize_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_dynamic_shape_with_div_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_reinterpret_view_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_sympy_fn_like_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_upper_bound_i64_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_weight_on_disk_legacy_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_mixed_device_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_parameters_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_sym_expr_cond_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_with_profiler_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_size_buffer_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__int_mm_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_addmm_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_constant_tensor_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printing_model_inputs_codegen_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_runtime_asserts_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_assert_tensor_meta_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_backward_no_op_logging_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_3_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_predicate_on_cpu_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_unbacked_symint_closure_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_use_buffers_from_outer_scope_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_parameters_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_reinterpret_view_inputs_outputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_d2h_copy_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_cat_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fp8_view_of_param_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_index_put_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_index_put_with_none_index_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_mmaped_weights_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_libtorch_free_so_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_misc_1_max_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_poi_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_quanatized_int8_linear_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeat_interleave_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_calling_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_device_type_failed_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sdpa_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_shifted_constraint_ranges_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_from_multi_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_subclasses_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_symbool_item_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_symint_item_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_equal_to_1_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_sympy_expr_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_0_use_static_size_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_1_use_static_size_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_sym_expr_cond_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_unbacked_symint_closure_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_unbacked_symint_closure_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_size_buffer_mps 2025-12-04T10:08:41.9708414Z 2025-12-04T10:08:41.9709063Z inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_cross_compile_windows_package_format PASSED [0.0040s] [ 0%] 2025-12-04T10:08:41.9710302Z 
inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_explicit_set PASSED [0.0037s] [ 1%] 2025-12-04T10:08:41.9711552Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__int_mm_cpu <- test/inductor/test_torchinductor.py PASSED [8.1261s] [ 1%] 2025-12-04T10:08:41.9713021Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cpu SKIPPED [0.0032s] (requires GPU) [ 2%] 2025-12-04T10:08:41.9714697Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_2_cpu SKIPPED [0.0032s] (requires Intel GPU) [ 3%] 2025-12-04T10:08:41.9716354Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_amp_fallback_random_cpu <- test/inductor/test_torchinductor.py PASSED [5.3726s] [ 3%] 2025-12-04T10:08:41.9718667Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_constant_tensor_cpu <- test/inductor/test_torchinductor.py PASSED [5.2548s] [ 4%] 2025-12-04T10:08:41.9720391Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printing_model_inputs_codegen_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (requires CUDA/XPU) [ 5%] 2025-12-04T10:08:41.9722058Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_profiler_enable_kernel_profile_False_cpu PASSED [6.5363s] [ 5%] 2025-12-04T10:08:41.9723423Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_profiler_enable_kernel_profile_True_cpu PASSED [6.5733s] [ 6%] 2025-12-04T10:08:41.9724924Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_autotune_with_constant_folding_cpu <- test/inductor/test_torchinductor.py PASSED [5.5134s] [ 6%] 2025-12-04T10:08:41.9726603Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_autotuning_args_reuse_cpu <- test/inductor/test_torchinductor.py SKIPPED 
[0.0033s] (requires GPU) [ 7%] 2025-12-04T10:08:41.9728208Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_2_cpu <- test/inductor/test_torchinductor.py PASSED [5.3971s] [ 8%] 2025-12-04T10:08:41.9729627Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_and_force_mmap_weights_cpu PASSED [7.0598s] [ 8%] 2025-12-04T10:08:41.9731003Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_mismatched_branch_output_dynamic_False_cpu PASSED [5.9240s] [ 9%] 2025-12-04T10:08:41.9732524Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_symint_input_disable_one_pass_cpu <- test/inductor/test_torchinductor.py PASSED [5.6302s] [ 10%] 2025-12-04T10:08:41.9734835Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_outer_code_before_after_cpu <- test/inductor/test_torchinductor.py W1204 09:55:46.405000 11895 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T10:08:41.9737121Z W1204 09:55:46.406000 11895 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T10:08:41.9756384Z PASSED [5.5997s] [ 10%] 2025-12-04T10:08:41.9758291Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_parameters_cpu <- test/inductor/test_torchinductor.py W1204 09:55:52.059000 11895 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. 
Please use Dim.STATIC instead 2025-12-04T10:08:41.9759920Z PASSED [5.9527s] [ 11%] 2025-12-04T10:08:41.9760558Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_cpu PASSED [5.2606s] [ 12%] 2025-12-04T10:08:41.9761829Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_conv_freezing_cpu <- test/inductor/test_torchinductor.py PASSED [11.1031s] [ 12%] 2025-12-04T10:08:41.9763699Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_copy_non_blocking_is_pinned_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (only matters for device-to-cpu copy) [ 13%] 2025-12-04T10:08:41.9765528Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_dup_unbacked_sym_decl_with_refinement_cpu <- test/inductor/test_torchinductor.py PASSED [5.6299s] [ 13%] 2025-12-04T10:08:41.9767167Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_empty_cat_dtype_promotion_cpu <- test/inductor/test_torchinductor.py PASSED [5.2371s] [ 14%] 2025-12-04T10:08:41.9768788Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fallback_kernel_with_symexpr_output_cpu SKIPPED [0.0003s] (Some archs don't support flash SDPA) [ 15%] 2025-12-04T10:08:41.9770384Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_index_put_fallback_cpu <- test/inductor/test_torchinductor.py PASSED [5.2788s] [ 15%] 2025-12-04T10:08:41.9771987Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_index_put_with_none_index_cpu PASSED [5.5379s] [ 16%] 2025-12-04T10:08:41.9773401Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_mmaped_weights_on_disk_cpu <- test/inductor/test_torchinductor.py PASSED [13.6671s] [ 17%] 2025-12-04T10:08:41.9774802Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misc_1_max_autotune_False_cpu PASSED [5.9795s] [ 17%] 2025-12-04T10:08:41.9776396Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_mixed_device_1_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (Mixed-device test requires GPU) [ 18%] 2025-12-04T10:08:41.9778406Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_tensor_input_cpu <- test/inductor/test_torchinductor.py W1204 09:56:55.603000 11895 site-packages/torch/_export/__init__.py:71] +============================+ 2025-12-04T10:08:41.9779934Z W1204 09:56:55.604000 11895 site-packages/torch/_export/__init__.py:72] | !!! WARNING !!! | 2025-12-04T10:08:41.9780790Z W1204 09:56:55.604000 11895 site-packages/torch/_export/__init__.py:73] +============================+ 2025-12-04T10:08:41.9782502Z W1204 09:56:55.604000 11895 site-packages/torch/_export/__init__.py:74] torch._export.aot_compile()/torch._export.aot_load() is being deprecated, please switch to directly calling torch._inductor.aoti_compile_and_package(torch.export.export())/torch._inductor.aoti_load_package() instead. 2025-12-04T10:08:41.9783981Z PASSED [17.7328s] [ 18%] 2025-12-04T10:08:41.9784826Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_on_gpu_device1_cpu SKIPPED [0.0003s] (requires multiple cuda devices) [ 19%] 2025-12-04T10:08:41.9786271Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pad_fallback_cpu <- test/inductor/test_torchinductor.py PASSED [5.4528s] [ 20%] 2025-12-04T10:08:41.9788230Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quantized_linear_cpu [W1204 09:57:18.298046110 QuantizedLinear.cpp:379] Warning: fbgemm_pack_gemm_matrix_fp16 is deprecated and will be removed in a future PyTorch release. (function operator()) 2025-12-04T10:08:41.9790299Z [W1204 09:57:24.435700574 QuantizedLinear.cpp:415] Warning: fbgemm_linear_fp16_weight_fp32_activation is deprecated and will be removed in a future PyTorch release. 
(function operator()) 2025-12-04T10:08:41.9791312Z PASSED [5.2632s] [ 20%] 2025-12-04T10:08:41.9792409Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_cpu SKIPPED [0.0032s] (requires GPU) [ 21%] 2025-12-04T10:08:41.9794019Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_replicate_on_devices_cpu SKIPPED [0.0003s] (requires multiple cuda devices) [ 22%] 2025-12-04T10:08:41.9795608Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_run_with_grad_enabled_cpu <- test/inductor/test_torchinductor.py PASSED [5.2609s] [ 22%] 2025-12-04T10:08:41.9797226Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_shape_failed_cpu Error: input_handles[0]: unmatched dim value at 1, expected: 4, but got: 8 2025-12-04T10:08:41.9798155Z 2025-12-04T10:08:41.9798429Z Error: input_handles[0]: unmatched stride value at 1, expected: 4, but got: 1 2025-12-04T10:08:41.9798841Z 2025-12-04T10:08:41.9799173Z Error: input_handles[0]: dim value is too large at 0, expected to be <= 1024, but got: 2048 2025-12-04T10:08:41.9799631Z 2025-12-04T10:08:41.9799937Z Error: input_handles[0]: dim value is too large at 0, expected to be <= 1024, but got: 2048 2025-12-04T10:08:41.9800397Z 2025-12-04T10:08:41.9800505Z PASSED [5.2302s] [ 23%] 2025-12-04T10:08:41.9801407Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scaled_grouped_mm_cpu SKIPPED [0.0003s] (scaled_grouped_mm is only supported on SM90) [ 24%] 2025-12-04T10:08:41.9802878Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_seq_cpu <- test/inductor/test_torchinductor.py PASSED [5.3141s] [ 24%] 2025-12-04T10:08:41.9804309Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_so_without_weight_cpu <- test/inductor/test_torchinductor.py PASSED [10.7425s] [ 25%] 2025-12-04T10:08:41.9805675Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_stride_with_unbacked_expr_cpu PASSED [5.2698s] [ 25%] 2025-12-04T10:08:41.9807146Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_sym_expr_indexing_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (requires CUDA/XPU) [ 26%] 2025-12-04T10:08:41.9808930Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_torchvision_transforms_functional_tensor_resize_cpu <- test/inductor/test_torchinductor.py PASSED [6.9432s] [ 27%] 2025-12-04T10:08:41.9810629Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_dynamic_launcher_grid_infer_from_tensor_cpu SKIPPED [0.0040s] (requires GPU) [ 27%] 2025-12-04T10:08:41.9812316Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_dynamic_shape_with_div_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (requires GPU) [ 28%] 2025-12-04T10:08:41.9814116Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_arg_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (requires GPU) [ 29%] 2025-12-04T10:08:41.9815848Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_False_cpu SKIPPED [0.0030s] (requires GPU) [ 29%] 2025-12-04T10:08:41.9817614Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_False_cpu SKIPPED [0.0029s] (requires GPU) [ 30%] 2025-12-04T10:08:41.9819313Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_False_cpu SKIPPED [0.0033s] (requires GPU) [ 31%] 2025-12-04T10:08:41.9821095Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_reinterpret_view_mem_leak_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0028s] (requires GPU) [ 
31%] 2025-12-04T10:08:41.9822866Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cpu SKIPPED [0.0028s] (requires GPU) [ 32%] 2025-12-04T10:08:41.9824800Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0028s] (requires GPU) [ 32%] 2025-12-04T10:08:41.9826735Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_False_cpu SKIPPED [0.0028s] (Need triton for user-defined triton kernel) [ 33%] 2025-12-04T10:08:41.9828711Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_cpu SKIPPED [0.0029s] (Need triton for user-defined triton kernel) [ 34%] 2025-12-04T10:08:41.9830415Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_unbacked_symint_closure_dynamic_False_cpu PASSED [5.8309s] [ 34%] 2025-12-04T10:08:41.9831889Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_offset_cpu <- test/inductor/test_torchinductor.py PASSED [5.2391s] [ 35%] 2025-12-04T10:08:41.9833356Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_profiler_cpu <- test/inductor/test_torchinductor.py PASSED [5.2525s] [ 36%] 2025-12-04T10:08:41.9834832Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_size_buffer_cpu <- test/inductor/test_torchinductor.py PASSED [5.1696s] [ 36%] 2025-12-04T10:08:41.9836355Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda ('RERUN', {'yellow': True}) [0.0362s] [ 37%] 2025-12-04T10:08:41.9837950Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda ('RERUN', {'yellow': 
True}) [0.0061s] [ 37%] 2025-12-04T10:08:41.9839451Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda FAILED [0.0061s] [ 37%] 2025-12-04T10:08:41.9840227Z 2025-12-04T10:08:41.9840394Z ==================================== RERUNS ==================================== 2025-12-04T10:08:41.9841107Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _ 2025-12-04T10:08:41.9841805Z Traceback (most recent call last): 2025-12-04T10:08:41.9842532Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm 2025-12-04T10:08:41.9843282Z self.check_model(model, (a,)) 2025-12-04T10:08:41.9843939Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model 2025-12-04T10:08:41.9844643Z ref_model = copy.deepcopy(model) 2025-12-04T10:08:41.9845172Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy 2025-12-04T10:08:41.9845694Z y = _reconstruct(x, memo, *rv) 2025-12-04T10:08:41.9846219Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct 2025-12-04T10:08:41.9846774Z state = deepcopy(state, memo) 2025-12-04T10:08:41.9847278Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy 2025-12-04T10:08:41.9847782Z y = copier(x, memo) 2025-12-04T10:08:41.9848267Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict 2025-12-04T10:08:41.9848872Z y[deepcopy(key, memo)] = deepcopy(value, memo) 2025-12-04T10:08:41.9849415Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:41.9849931Z y = copier(memo) 2025-12-04T10:08:41.9850523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__ 2025-12-04T10:08:41.9851256Z new_storage = self._typed_storage()._deepcopy(memo) 2025-12-04T10:08:41.9851948Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy 2025-12-04T10:08:41.9852775Z return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo)) 2025-12-04T10:08:41.9853547Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:41.9854053Z y = copier(memo) 2025-12-04T10:08:41.9854643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__ 2025-12-04T10:08:41.9855314Z new_storage = self.clone() 2025-12-04T10:08:41.9855900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone 2025-12-04T10:08:41.9856798Z return type(self)(self.nbytes(), device=self.device).copy_(self) 2025-12-04T10:08:41.9857388Z torch.AcceleratorError: CUDA error: invalid device function 2025-12-04T10:08:41.9858396Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. 2025-12-04T10:08:41.9859640Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. 2025-12-04T10:08:41.9860455Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1 2025-12-04T10:08:41.9861016Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 
2025-12-04T10:08:41.9861383Z 2025-12-04T10:08:41.9861389Z 2025-12-04T10:08:41.9861617Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:41.9862748Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda 2025-12-04T10:08:41.9863687Z 2025-12-04T10:08:41.9863955Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:41.9864791Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _ 2025-12-04T10:08:41.9865489Z Traceback (most recent call last): 2025-12-04T10:08:41.9866192Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm 2025-12-04T10:08:41.9866935Z self.check_model(model, (a,)) 2025-12-04T10:08:41.9867606Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model 2025-12-04T10:08:41.9868311Z ref_model = copy.deepcopy(model) 2025-12-04T10:08:41.9868815Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy 2025-12-04T10:08:41.9869347Z y = _reconstruct(x, memo, *rv) 2025-12-04T10:08:41.9869870Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct 2025-12-04T10:08:41.9870415Z state = deepcopy(state, memo) 2025-12-04T10:08:41.9870913Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy 2025-12-04T10:08:41.9871650Z y = copier(x, memo) 2025-12-04T10:08:41.9872127Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict 2025-12-04T10:08:41.9872735Z y[deepcopy(key, memo)] = deepcopy(value, memo) 2025-12-04T10:08:41.9873295Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:41.9873811Z y = copier(memo) 2025-12-04T10:08:41.9874397Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__ 2025-12-04T10:08:41.9875127Z new_storage = self._typed_storage()._deepcopy(memo) 2025-12-04T10:08:41.9875823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy 2025-12-04T10:08:41.9876632Z return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo)) 2025-12-04T10:08:41.9877330Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:41.9877844Z y = copier(memo) 2025-12-04T10:08:41.9878429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__ 2025-12-04T10:08:41.9879089Z new_storage = self.clone() 2025-12-04T10:08:41.9879673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone 2025-12-04T10:08:41.9880417Z return type(self)(self.nbytes(), device=self.device).copy_(self) 2025-12-04T10:08:41.9881183Z torch.AcceleratorError: CUDA error: invalid device function 2025-12-04T10:08:41.9882189Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. 2025-12-04T10:08:41.9883445Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. 2025-12-04T10:08:41.9884361Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1 2025-12-04T10:08:41.9884906Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 
2025-12-04T10:08:41.9885287Z 2025-12-04T10:08:41.9885292Z 2025-12-04T10:08:41.9885507Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:41.9886653Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda 2025-12-04T10:08:41.9887565Z 2025-12-04T10:08:41.9887852Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:41.9888403Z =================================== FAILURES =================================== 2025-12-04T10:08:41.9889130Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _ 2025-12-04T10:08:41.9889824Z Traceback (most recent call last): 2025-12-04T10:08:41.9890544Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm 2025-12-04T10:08:41.9891276Z self.check_model(model, (a,)) 2025-12-04T10:08:41.9891943Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model 2025-12-04T10:08:41.9892644Z ref_model = copy.deepcopy(model) 2025-12-04T10:08:41.9893152Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy 2025-12-04T10:08:41.9893689Z y = _reconstruct(x, memo, *rv) 2025-12-04T10:08:41.9894215Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct 2025-12-04T10:08:41.9894784Z state = deepcopy(state, memo) 2025-12-04T10:08:41.9895283Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy 2025-12-04T10:08:41.9895789Z y = copier(x, memo) 2025-12-04T10:08:41.9896343Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict 2025-12-04T10:08:41.9896955Z y[deepcopy(key, memo)] = deepcopy(value, memo) 2025-12-04T10:08:41.9897524Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:41.9898032Z y = 
copier(memo) 2025-12-04T10:08:41.9898631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__ 2025-12-04T10:08:41.9899361Z new_storage = self._typed_storage()._deepcopy(memo) 2025-12-04T10:08:41.9900054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy 2025-12-04T10:08:41.9900889Z return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo)) 2025-12-04T10:08:41.9901592Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:41.9902117Z y = copier(memo) 2025-12-04T10:08:41.9902695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__ 2025-12-04T10:08:41.9903368Z new_storage = self.clone() 2025-12-04T10:08:41.9903963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone 2025-12-04T10:08:41.9904693Z return type(self)(self.nbytes(), device=self.device).copy_(self) 2025-12-04T10:08:41.9905286Z torch.AcceleratorError: CUDA error: invalid device function 2025-12-04T10:08:41.9906282Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. 2025-12-04T10:08:41.9907532Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. 2025-12-04T10:08:41.9908418Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1 2025-12-04T10:08:41.9908978Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 
2025-12-04T10:08:41.9909346Z 2025-12-04T10:08:41.9909351Z 2025-12-04T10:08:41.9909581Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:41.9910721Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda 2025-12-04T10:08:41.9911706Z 2025-12-04T10:08:41.9911972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:41.9913135Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bf15e775351f3d84.xml - 2025-12-04T10:08:41.9914197Z =========================== short test summary info ============================ 2025-12-04T10:08:41.9915438Z FAILED [0.0061s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda - torch.AcceleratorError: CUDA error: invalid device function 2025-12-04T10:08:41.9917084Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. 2025-12-04T10:08:41.9918334Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. 2025-12-04T10:08:41.9919145Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1 2025-12-04T10:08:41.9919699Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 
2025-12-04T10:08:41.9920068Z 2025-12-04T10:08:41.9920074Z 2025-12-04T10:08:41.9920288Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:41.9921427Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda 2025-12-04T10:08:41.9922350Z 2025-12-04T10:08:41.9922618Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:41.9923217Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:08:41.9923761Z ======== 1 failed, 35 passed, 23 skipped, 2 rerun in 219.63s (0:03:39) ========= 2025-12-04T10:08:41.9924239Z Got exit code 1 2025-12-04T10:08:41.9924517Z Retrying single test... 2025-12-04T10:08:41.9925136Z W1204 09:58:35.463000 15634 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:08:41.9926291Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-cd1c50b62bb47a1b.xml 2025-12-04T10:08:41.9927168Z ============================= test session starts ============================== 2025-12-04T10:08:41.9927834Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:08:41.9928434Z cachedir: .pytest_cache 2025-12-04T10:08:41.9929152Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:08:41.9929938Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:08:41.9930293Z configfile: pytest.ini 2025-12-04T10:08:41.9931017Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:08:41.9931927Z collecting ... 
collected 934 items / 157 deselected / 777 selected 2025-12-04T10:08:41.9933158Z stepcurrent: skipping 58 already run items. Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda 2025-12-04T10:08:41.9934270Z Running 1 items in this shard 2025-12-04T10:08:41.9934483Z 2025-12-04T10:08:41.9935703Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda [W1204 09:58:37.753737745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:41.9937072Z 2025-12-04T10:08:41.9937213Z ('RERUN', {'yellow': True}) [15.7512s] [100%] 2025-12-04T10:08:41.9938619Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda [W1204 09:58:53.513212722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:41.9939944Z 2025-12-04T10:08:41.9940089Z ('RERUN', {'yellow': True}) [0.0070s] [100%] 2025-12-04T10:08:41.9941457Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda [W1204 09:58:53.520790272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:08:41.9942722Z 2025-12-04T10:08:41.9942830Z FAILED [0.0054s] [100%] 2025-12-04T10:08:41.9943022Z 2025-12-04T10:08:41.9943168Z ==================================== RERUNS ==================================== 2025-12-04T10:08:41.9943888Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _ 2025-12-04T10:08:41.9944582Z Traceback (most recent call last): 2025-12-04T10:08:41.9945294Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm 2025-12-04T10:08:41.9946043Z self.check_model(model, (a,)) 2025-12-04T10:08:41.9946713Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model 2025-12-04T10:08:41.9947491Z ref_model = copy.deepcopy(model) 2025-12-04T10:08:41.9948080Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy 2025-12-04T10:08:41.9948617Z y = _reconstruct(x, memo, *rv) 2025-12-04T10:08:41.9949135Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct 2025-12-04T10:08:41.9949686Z state = deepcopy(state, memo) 2025-12-04T10:08:41.9950188Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy 2025-12-04T10:08:41.9950711Z y = copier(x, memo) 2025-12-04T10:08:41.9951184Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict 2025-12-04T10:08:41.9951790Z y[deepcopy(key, memo)] = deepcopy(value, memo) 2025-12-04T10:08:41.9952349Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:41.9952851Z y = copier(memo) 2025-12-04T10:08:41.9953439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__ 2025-12-04T10:08:41.9954162Z new_storage = self._typed_storage()._deepcopy(memo) 2025-12-04T10:08:41.9954863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy 
2025-12-04T10:08:41.9955675Z return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo)) 2025-12-04T10:08:41.9956363Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:41.9956878Z y = copier(memo) 2025-12-04T10:08:41.9957451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__ 2025-12-04T10:08:41.9958169Z new_storage = self.clone() 2025-12-04T10:08:41.9958942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone 2025-12-04T10:08:41.9959684Z return type(self)(self.nbytes(), device=self.device).copy_(self) 2025-12-04T10:08:41.9960261Z torch.AcceleratorError: CUDA error: invalid device function 2025-12-04T10:08:41.9961259Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. 2025-12-04T10:08:41.9962513Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. 2025-12-04T10:08:41.9963459Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1 2025-12-04T10:08:41.9964003Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 
2025-12-04T10:08:41.9964386Z 2025-12-04T10:08:41.9964998Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first): 2025-12-04T10:08:41.9965860Z C++ CapturedTraceback: 2025-12-04T10:08:41.9967462Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:08:41.9969387Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:08:41.9970671Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string, std::allocator > const&) from :0 2025-12-04T10:08:41.9972049Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0 2025-12-04T10:08:41.9972839Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0 2025-12-04T10:08:41.9973555Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0 2025-12-04T10:08:41.9974237Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0 2025-12-04T10:08:41.9976806Z #11 c10::impl::wrap_kernel_functor_unboxed_, at::Tensor&, c10::guts::typelist::typelist >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:41.9979697Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:41.9980681Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0 2025-12-04T10:08:41.9981286Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0 2025-12-04T10:08:41.9981915Z #15 THPStorage_copy_(_object*, _object*, _object*) from 
StorageMethods.cpp:0 2025-12-04T10:08:41.9982754Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344 2025-12-04T10:08:41.9983737Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:41.9984663Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:41.9985605Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:41.9986541Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:41.9987476Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:41.9988385Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:41.9989324Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:41.9990254Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:41.9991182Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:41.9992094Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:41.9993219Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:41.9994150Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:41.9995076Z #29 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:41.9996082Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:41.9997009Z 
#31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:41.9997938Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:41.9998866Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:41.9999777Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0000571Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0001362Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0002273Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0003201Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0004131Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0005056Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0005855Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0006556Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0007339Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0008156Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0008842Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0009624Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0010441Z #47 PyVectorcall_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0011125Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0011908Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0012838Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0013767Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0014535Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0015316Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0016243Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0017249Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0018172Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0019108Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0020045Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0020983Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0021980Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0022913Z #61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0023730Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0024422Z #63 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0025363Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0026240Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0027051Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0027789Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0028541Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0029401Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0030324Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0031237Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0032170Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0032952Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0033730Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0034642Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0035574Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0036501Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0037411Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0038279Z #79 _PyObject_FastCallDictTstate from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0039085Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0039835Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0040538Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:08:42.0041214Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0041999Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0042933Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0043849Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0044781Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0045702Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0046472Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0047250Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0048175Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0049096Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0050072Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0050997Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0051780Z #95 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0052562Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0053541Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0054469Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0055413Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0056427Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0057318Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0058144Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0058914Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0059666Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0060551Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0061498Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0062298Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0063084Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0064033Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0064988Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0065936Z #111 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0066869Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0067770Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0068594Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0069356Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0070099Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0071205Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0072310Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0073242Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0074185Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0075132Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0076081Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0076863Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0077651Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0078593Z #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0079719Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0080651Z #127 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0081595Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0082480Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0083390Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0084141Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0084906Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0085784Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0086719Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0087663Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0088602Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0089545Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0090481Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0091418Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0092356Z #140 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0093294Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0094222Z #142 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:08:42.0095042Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:08:42.0095789Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:08:42.0096600Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:08:42.0097307Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:08:42.0098103Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:08:42.0098937Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:08:42.0099696Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:08:42.0100422Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:08:42.0101119Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:08:42.0101740Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:08:42.0102172Z #153 _start from ??:0
2025-12-04T10:08:42.0102478Z #154 from ??:0
2025-12-04T10:08:42.0102715Z
2025-12-04T10:08:42.0102720Z
2025-12-04T10:08:42.0102949Z To execute this test, run the following from the base repo dir:
2025-12-04T10:08:42.0104104Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda
2025-12-04T10:08:42.0105017Z
2025-12-04T10:08:42.0105288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:08:42.0105931Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:08:42.0107498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py:257: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:08:42.0109012Z return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:42.0109792Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _
2025-12-04T10:08:42.0110491Z Traceback (most recent call last):
2025-12-04T10:08:42.0111280Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm
2025-12-04T10:08:42.0112006Z self.check_model(model, (a,))
2025-12-04T10:08:42.0112671Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model
2025-12-04T10:08:42.0113369Z ref_model = copy.deepcopy(model)
2025-12-04T10:08:42.0113889Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy
2025-12-04T10:08:42.0114407Z y = _reconstruct(x, memo, *rv)
2025-12-04T10:08:42.0114938Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct
2025-12-04T10:08:42.0115487Z state = deepcopy(state, memo)
2025-12-04T10:08:42.0115975Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy
2025-12-04T10:08:42.0116492Z y = copier(x, memo)
2025-12-04T10:08:42.0116979Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict
2025-12-04T10:08:42.0117584Z y[deepcopy(key, memo)] = deepcopy(value, memo)
2025-12-04T10:08:42.0118129Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:42.0118638Z y = copier(memo)
2025-12-04T10:08:42.0119232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__
2025-12-04T10:08:42.0119945Z new_storage = self._typed_storage()._deepcopy(memo)
2025-12-04T10:08:42.0120650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy
2025-12-04T10:08:42.0121481Z return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo))
2025-12-04T10:08:42.0122175Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy
2025-12-04T10:08:42.0122677Z y = copier(memo)
2025-12-04T10:08:42.0123265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__
2025-12-04T10:08:42.0123944Z new_storage = self.clone()
2025-12-04T10:08:42.0124518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone
2025-12-04T10:08:42.0125260Z return type(self)(self.nbytes(), device=self.device).copy_(self)
2025-12-04T10:08:42.0125845Z torch.AcceleratorError: CUDA error: invalid device function
2025-12-04T10:08:42.0126844Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
2025-12-04T10:08:42.0128092Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-12-04T10:08:42.0128900Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
2025-12-04T10:08:42.0129456Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2025-12-04T10:08:42.0129824Z 2025-12-04T10:08:42.0130433Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first): 2025-12-04T10:08:42.0131300Z C++ CapturedTraceback: 2025-12-04T10:08:42.0132808Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:08:42.0134734Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:08:42.0136094Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string, std::allocator > const&) from :0 2025-12-04T10:08:42.0137332Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0 2025-12-04T10:08:42.0138122Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0 2025-12-04T10:08:42.0138912Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0 2025-12-04T10:08:42.0139585Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0 2025-12-04T10:08:42.0142078Z #11 c10::impl::wrap_kernel_functor_unboxed_, at::Tensor&, c10::guts::typelist::typelist >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:42.0144962Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:42.0145949Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0 2025-12-04T10:08:42.0146556Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0 2025-12-04T10:08:42.0147202Z #15 THPStorage_copy_(_object*, _object*, _object*) from 
StorageMethods.cpp:0 2025-12-04T10:08:42.0148027Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344 2025-12-04T10:08:42.0149009Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0149947Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0150873Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0151804Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0152730Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0153662Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0154579Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0155503Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0156433Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0157363Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0158274Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0159198Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0160121Z #29 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0161049Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0161962Z 
#31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0162891Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0163822Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0164825Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0165601Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0166391Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0167320Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0168333Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0169262Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0170191Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0171190Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0171884Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0172670Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0173485Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0174186Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0174950Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0175771Z #47 PyVectorcall_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0176548Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0177320Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0178257Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0179190Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0179976Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0180745Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0181680Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0182614Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0183540Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0184450Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0185375Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0186306Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0187228Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0188135Z #61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0188948Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0189652Z #63 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0190420Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0191288Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0192097Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0192847Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0193729Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0194591Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0195520Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0196453Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0197455Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0198238Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0199018Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0199929Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0200861Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0201786Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0202708Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0203567Z #79 _PyObject_FastCallDictTstate from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0204380Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0205134Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0205856Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:08:42.0206524Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0207312Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0208251Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0209178Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0210095Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0211024Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0211803Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0212572Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0213496Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0214422Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0215344Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0216319Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0217105Z #95 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0217902Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0218843Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0219760Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0220693Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0221633Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0222616Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0223425Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0224199Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0224966Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0225892Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0226842Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0227636Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0228440Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0229384Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0230331Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0231280Z #111 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0232223Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0233096Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0233912Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0234679Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0235435Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0236296Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0237239Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0238187Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0239110Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0240058Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0240996Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0241785Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0242574Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0243518Z #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0244459Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0245403Z #127 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0246329Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0247216Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0248034Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0248793Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0249540Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0250414Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0251428Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0252358Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0253295Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0254235Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0255239Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0256167Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0257177Z #140 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0258126Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0259079Z #142 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0259892Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T10:08:42.0260652Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T10:08:42.0261384Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T10:08:42.0262094Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T10:08:42.0262870Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T10:08:42.0263704Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T10:08:42.0264478Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T10:08:42.0265190Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T10:08:42.0265877Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T10:08:42.0266494Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T10:08:42.0266937Z #153 _start from ??:0 2025-12-04T10:08:42.0267233Z #154 from ??:0 2025-12-04T10:08:42.0267482Z 2025-12-04T10:08:42.0267487Z 2025-12-04T10:08:42.0267708Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.0268853Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda 2025-12-04T10:08:42.0269767Z 2025-12-04T10:08:42.0270048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.0270673Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.0272329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py:257: UserWarning: Unsupported 
unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.0273820Z return type(self)(self.nbytes(), device=self.device).copy_(self) 2025-12-04T10:08:42.0274338Z =================================== FAILURES =================================== 2025-12-04T10:08:42.0275060Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _ 2025-12-04T10:08:42.0275756Z Traceback (most recent call last): 2025-12-04T10:08:42.0276471Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm 2025-12-04T10:08:42.0277200Z self.check_model(model, (a,)) 2025-12-04T10:08:42.0277866Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model 2025-12-04T10:08:42.0278561Z ref_model = copy.deepcopy(model) 2025-12-04T10:08:42.0279217Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy 2025-12-04T10:08:42.0279737Z y = _reconstruct(x, memo, *rv) 2025-12-04T10:08:42.0280264Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct 2025-12-04T10:08:42.0280818Z state = deepcopy(state, memo) 2025-12-04T10:08:42.0281305Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy 2025-12-04T10:08:42.0281956Z y = copier(x, memo) 2025-12-04T10:08:42.0282447Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict 2025-12-04T10:08:42.0283043Z y[deepcopy(key, memo)] = deepcopy(value, memo) 2025-12-04T10:08:42.0283603Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:42.0284117Z y = copier(memo) 2025-12-04T10:08:42.0284709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__ 2025-12-04T10:08:42.0285432Z new_storage = self._typed_storage()._deepcopy(memo) 2025-12-04T10:08:42.0286135Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy 2025-12-04T10:08:42.0286958Z return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo)) 2025-12-04T10:08:42.0287649Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:42.0288145Z y = copier(memo) 2025-12-04T10:08:42.0288734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__ 2025-12-04T10:08:42.0289409Z new_storage = self.clone() 2025-12-04T10:08:42.0289984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone 2025-12-04T10:08:42.0290725Z return type(self)(self.nbytes(), device=self.device).copy_(self) 2025-12-04T10:08:42.0291310Z torch.AcceleratorError: CUDA error: invalid device function 2025-12-04T10:08:42.0292298Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. 2025-12-04T10:08:42.0293553Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. 2025-12-04T10:08:42.0294358Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1 2025-12-04T10:08:42.0294917Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 
2025-12-04T10:08:42.0295289Z 2025-12-04T10:08:42.0295901Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first): 2025-12-04T10:08:42.0296831Z C++ CapturedTraceback: 2025-12-04T10:08:42.0298354Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:08:42.0300288Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:08:42.0301573Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string, std::allocator > const&) from :0 2025-12-04T10:08:42.0302729Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0 2025-12-04T10:08:42.0303533Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0 2025-12-04T10:08:42.0304258Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0 2025-12-04T10:08:42.0304933Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0 2025-12-04T10:08:42.0307499Z #11 c10::impl::wrap_kernel_functor_unboxed_, at::Tensor&, c10::guts::typelist::typelist >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:42.0310369Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:42.0311412Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0 2025-12-04T10:08:42.0312010Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0 2025-12-04T10:08:42.0312654Z #15 THPStorage_copy_(_object*, _object*, _object*) from 
StorageMethods.cpp:0 2025-12-04T10:08:42.0313474Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344 2025-12-04T10:08:42.0314455Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0315387Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0316310Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0317247Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0318180Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0319106Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0320019Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0320945Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0321874Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0322795Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0323712Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0324645Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0325576Z #29 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0326499Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0327413Z 
#31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0328337Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0329276Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0330190Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0330976Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0331760Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0332694Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0333607Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0334534Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0335461Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0336427Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0337124Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0337904Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0338722Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0339490Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0340259Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0341073Z #47 PyVectorcall_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0341777Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0342543Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0343480Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0344411Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0345190Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0345952Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0346883Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0347809Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0348734Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0349643Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0350575Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0351494Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0352413Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0353320Z #61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0354142Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0354840Z #63 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0355608Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0356481Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0357287Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0358035Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0358763Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0359623Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0360554Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0361478Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0362386Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0363165Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0363941Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0364919Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0365849Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0366777Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0367701Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0368630Z #79 _PyObject_FastCallDictTstate from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0369438Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0370186Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0370902Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:08:42.0371723Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0372503Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0373432Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0374358Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0375276Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0376201Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0377086Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0377854Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0378793Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0379734Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0380666Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0381587Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0382377Z #95 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0383155Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0384085Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0385000Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0385940Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0386881Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0387770Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0388583Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0389357Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0390121Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0390988Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0391938Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0392741Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0393713Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0394651Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0395599Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0396541Z #111 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0397565Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0398437Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0399256Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0400021Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0400771Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0401644Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0402589Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0403531Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0404463Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0405405Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0406343Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0407137Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0407922Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0408868Z #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0409809Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0410751Z #127 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0411687Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0412570Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0413390Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0414149Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0414906Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0415776Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0416786Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0417716Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0418669Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0419616Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0420566Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0421498Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0422514Z #140 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0423461Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0424401Z #142 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0425215Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T10:08:42.0426039Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T10:08:42.0426778Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T10:08:42.0427472Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T10:08:42.0428262Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T10:08:42.0429095Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T10:08:42.0429875Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T10:08:42.0430583Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T10:08:42.0431271Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T10:08:42.0431887Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T10:08:42.0432333Z #153 _start from ??:0 2025-12-04T10:08:42.0432631Z #154 from ??:0 2025-12-04T10:08:42.0432878Z 2025-12-04T10:08:42.0432883Z 2025-12-04T10:08:42.0433101Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.0434259Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda 2025-12-04T10:08:42.0435316Z 2025-12-04T10:08:42.0435597Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.0436226Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.0437709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py:257: UserWarning: Unsupported 
unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.0439195Z return type(self)(self.nbytes(), device=self.device).copy_(self) 2025-12-04T10:08:42.0440319Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-cd1c50b62bb47a1b.xml - 2025-12-04T10:08:42.0441368Z =========================== short test summary info ============================ 2025-12-04T10:08:42.0442605Z FAILED [0.0054s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda - torch.AcceleratorError: CUDA error: invalid device function 2025-12-04T10:08:42.0444272Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. 2025-12-04T10:08:42.0445526Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. 2025-12-04T10:08:42.0446319Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1 2025-12-04T10:08:42.0446872Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 
2025-12-04T10:08:42.0447245Z 2025-12-04T10:08:42.0447871Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first): 2025-12-04T10:08:42.0448734Z C++ CapturedTraceback: 2025-12-04T10:08:42.0450333Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:08:42.0452268Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:08:42.0453556Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string, std::allocator > const&) from :0 2025-12-04T10:08:42.0454796Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0 2025-12-04T10:08:42.0455571Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0 2025-12-04T10:08:42.0456353Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0 2025-12-04T10:08:42.0457033Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0 2025-12-04T10:08:42.0459536Z #11 c10::impl::wrap_kernel_functor_unboxed_, at::Tensor&, c10::guts::typelist::typelist >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:42.0462403Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:42.0463360Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0 2025-12-04T10:08:42.0463960Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0 2025-12-04T10:08:42.0464598Z #15 THPStorage_copy_(_object*, _object*, _object*) from 
StorageMethods.cpp:0 2025-12-04T10:08:42.0465427Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344 2025-12-04T10:08:42.0466396Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0467334Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0468266Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0469204Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0470120Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0471253Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0472191Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0473115Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0474048Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0474978Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0475908Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0476832Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0477771Z #29 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0478700Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0479633Z 
#31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0480785Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0481720Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0482661Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0483453Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0484304Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0485232Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0486165Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0487093Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0488012Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0488827Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0489527Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0490294Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0491108Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0491815Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0492595Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0493397Z #47 PyVectorcall_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0494100Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0494881Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0495821Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0496805Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0497591Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0498386Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0499303Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0500235Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0501158Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0502080Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0502994Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0503920Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0504844Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0505776Z #61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0506576Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0524741Z #63 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0525644Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0526521Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0527553Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0528309Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0529040Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0529900Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0530921Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0531836Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0532748Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0533515Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0534285Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0535195Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0536099Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0537096Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0538011Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0538859Z #79 _PyObject_FastCallDictTstate from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0539643Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0540373Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0541061Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:08:42.0541726Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0542486Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0543393Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0544289Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0545197Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0546101Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0546863Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0547620Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0548531Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0549443Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0550349Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0551247Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0552014Z #95 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0552771Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0553673Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0554581Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0555557Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0556479Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0557337Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0558138Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0558941Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0559679Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0560535Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0561456Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0562233Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0563017Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0563943Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0564882Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0565817Z #111 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0566740Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0567607Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0568424Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0569188Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0569941Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0570813Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0571972Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0572919Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0573853Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0574797Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0575742Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0576618Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0577414Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0578361Z #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0579310Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0580240Z #127 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0581189Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0582073Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0582888Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0583640Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0584552Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0585429Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0586374Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0587302Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0588333Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0589279Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0590225Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0591153Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0592106Z #140 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0593049Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0593979Z #142 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0594804Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T10:08:42.0595563Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T10:08:42.0596304Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T10:08:42.0596997Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T10:08:42.0597792Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T10:08:42.0598627Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T10:08:42.0599398Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T10:08:42.0600106Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T10:08:42.0600800Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T10:08:42.0601416Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T10:08:42.0601855Z #153 _start from ??:0 2025-12-04T10:08:42.0602167Z #154 from ??:0 2025-12-04T10:08:42.0602407Z 2025-12-04T10:08:42.0602426Z 2025-12-04T10:08:42.0602648Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.0603802Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda 2025-12-04T10:08:42.0604723Z 2025-12-04T10:08:42.0605001Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.0605602Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:08:42.0606138Z ================= 1 failed, 157 deselected, 2 rerun in 15.85s ================== 2025-12-04T10:08:42.0606585Z Got exit code 1 2025-12-04T10:08:42.0606874Z Retrying single test... 2025-12-04T10:08:42.0607520Z W1204 09:59:04.172000 15755 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:08:42.0608668Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3e5313e420476f15.xml 2025-12-04T10:08:42.0609546Z ============================= test session starts ============================== 2025-12-04T10:08:42.0610219Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:08:42.0610831Z cachedir: .pytest_cache 2025-12-04T10:08:42.0611607Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:08:42.0612397Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:08:42.0612760Z configfile: pytest.ini 2025-12-04T10:08:42.0613486Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:08:42.0614398Z collecting ... collected 934 items / 157 deselected / 777 selected 2025-12-04T10:08:42.0615732Z stepcurrent: skipping 58 already run items. Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda 2025-12-04T10:08:42.0616924Z Running 1 items in this shard 2025-12-04T10:08:42.0617139Z 2025-12-04T10:08:42.0618284Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda [W1204 09:59:06.480206481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:08:42.0619565Z 2025-12-04T10:08:42.0619705Z ('RERUN', {'yellow': True}) [15.9940s] [100%] 2025-12-04T10:08:42.0621101Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda [W1204 09:59:22.482459322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.0622371Z 2025-12-04T10:08:42.0622506Z ('RERUN', {'yellow': True}) [0.0070s] [100%] 2025-12-04T10:08:42.0623890Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda [W1204 09:59:22.490028905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.0625145Z 2025-12-04T10:08:42.0625250Z FAILED [0.0054s] [100%] 2025-12-04T10:08:42.0625442Z 2025-12-04T10:08:42.0625597Z ==================================== RERUNS ==================================== 2025-12-04T10:08:42.0626317Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _ 2025-12-04T10:08:42.0627008Z Traceback (most recent call last): 2025-12-04T10:08:42.0627713Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm 2025-12-04T10:08:42.0628451Z self.check_model(model, (a,)) 2025-12-04T10:08:42.0629117Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model 2025-12-04T10:08:42.0629802Z ref_model = copy.deepcopy(model) 2025-12-04T10:08:42.0630323Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy 2025-12-04T10:08:42.0630857Z y = _reconstruct(x, memo, *rv) 2025-12-04T10:08:42.0631381Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct 2025-12-04T10:08:42.0631922Z state = deepcopy(state, memo) 2025-12-04T10:08:42.0632427Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy 2025-12-04T10:08:42.0632949Z y = copier(x, memo) 2025-12-04T10:08:42.0633427Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict 2025-12-04T10:08:42.0634036Z y[deepcopy(key, memo)] = deepcopy(value, memo) 2025-12-04T10:08:42.0634599Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:42.0635116Z y = copier(memo) 2025-12-04T10:08:42.0635699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__ 2025-12-04T10:08:42.0636434Z new_storage = self._typed_storage()._deepcopy(memo) 2025-12-04T10:08:42.0637137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy 2025-12-04T10:08:42.0637948Z return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo)) 2025-12-04T10:08:42.0638736Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:42.0639253Z y = copier(memo) 2025-12-04T10:08:42.0639842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__ 2025-12-04T10:08:42.0640502Z new_storage = self.clone() 2025-12-04T10:08:42.0641093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone 2025-12-04T10:08:42.0641920Z return type(self)(self.nbytes(), device=self.device).copy_(self) 2025-12-04T10:08:42.0642499Z torch.AcceleratorError: CUDA error: invalid device function 2025-12-04T10:08:42.0643501Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. 2025-12-04T10:08:42.0644763Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. 
2025-12-04T10:08:42.0645580Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1 2025-12-04T10:08:42.0646134Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 2025-12-04T10:08:42.0646517Z 2025-12-04T10:08:42.0647130Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first): 2025-12-04T10:08:42.0647995Z C++ CapturedTraceback: 2025-12-04T10:08:42.0649506Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:08:42.0651413Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:08:42.0652688Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string, std::allocator > const&) from :0 2025-12-04T10:08:42.0653860Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0 2025-12-04T10:08:42.0654639Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0 2025-12-04T10:08:42.0655342Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0 2025-12-04T10:08:42.0656017Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0 2025-12-04T10:08:42.0658605Z #11 c10::impl::wrap_kernel_functor_unboxed_, at::Tensor&, c10::guts::typelist::typelist >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:42.0661479Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:42.0662453Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0 
2025-12-04T10:08:42.0663035Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0 2025-12-04T10:08:42.0663322Z #15 THPStorage_copy_(_object*, _object*, _object*) from StorageMethods.cpp:0 2025-12-04T10:08:42.0663747Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344 2025-12-04T10:08:42.0664171Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0664545Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0664953Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0665412Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0665818Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0666204Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0666672Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0667045Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0667465Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0667837Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0668252Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0668628Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0669032Z #29 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0669412Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0669825Z #31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0670194Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0670614Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0671181Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0671462Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0671838Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0672243Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0672628Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0673037Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0673420Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0673715Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0673974Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0674360Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0674659Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0674919Z #45 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0675308Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0675599Z #47 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0675873Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0676249Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0676654Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0677037Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0677296Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0677809Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0678217Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0678589Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0679012Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0679463Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0679882Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0680256Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0680661Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0681049Z #61 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0681341Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0681617Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0681986Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0682352Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0682659Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0682955Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0683271Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0683678Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0684066Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0684472Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0684842Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0685124Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0685503Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0685920Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0686289Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0686703Z #77 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0687091Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0687442Z #79 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0687760Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0688065Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0688336Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:08:42.0688611Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0688982Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0689391Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0689846Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0690254Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0690641Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0690905Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0691356Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0691776Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0692146Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0692560Z #93 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0692936Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0693193Z #95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0693578Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0693980Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0694367Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0694772Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0695157Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0695527Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0695842Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0696146Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0696530Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0696947Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0697348Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0697616Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0697994Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0698423Z #109 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0698801Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0699229Z #111 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0699607Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0699961Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0700291Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0700587Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0700893Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0701323Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0701699Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0702201Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0702585Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0702998Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0703388Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0703710Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0704097Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0704508Z #125 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0704883Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0705311Z #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0705689Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0706060Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0706371Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0706674Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0706994Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0707406Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0707781Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0708209Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0708585Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0709011Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0709390Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0709804Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0710196Z #140 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0710608Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0710995Z #142 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0711289Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T10:08:42.0711596Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T10:08:42.0711875Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T10:08:42.0712163Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T10:08:42.0712532Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T10:08:42.0712857Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T10:08:42.0713148Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T10:08:42.0713428Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T10:08:42.0713697Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T10:08:42.0713957Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T10:08:42.0714075Z #153 _start from ??:0 2025-12-04T10:08:42.0714200Z #154 from ??:0 2025-12-04T10:08:42.0714206Z 2025-12-04T10:08:42.0714211Z 2025-12-04T10:08:42.0714443Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.0715237Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda 2025-12-04T10:08:42.0715306Z 2025-12-04T10:08:42.0715577Z This message can be suppressed by setting 
PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.0715817Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.0716936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py:257: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.0717184Z return type(self)(self.nbytes(), device=self.device).copy_(self) 2025-12-04T10:08:42.0717609Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _ 2025-12-04T10:08:42.0717734Z Traceback (most recent call last): 2025-12-04T10:08:42.0718224Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm 2025-12-04T10:08:42.0718349Z self.check_model(model, (a,)) 2025-12-04T10:08:42.0718793Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model 2025-12-04T10:08:42.0718920Z ref_model = copy.deepcopy(model) 2025-12-04T10:08:42.0719191Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy 2025-12-04T10:08:42.0719321Z y = _reconstruct(x, memo, *rv) 2025-12-04T10:08:42.0719610Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct 2025-12-04T10:08:42.0719733Z state = deepcopy(state, memo) 2025-12-04T10:08:42.0720010Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy 2025-12-04T10:08:42.0720116Z y = copier(x, memo) 2025-12-04T10:08:42.0720424Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict 2025-12-04T10:08:42.0720582Z y[deepcopy(key, memo)] = deepcopy(value, memo) 2025-12-04T10:08:42.0720846Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:42.0720964Z y = copier(memo) 2025-12-04T10:08:42.0721370Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__ 2025-12-04T10:08:42.0721540Z new_storage = self._typed_storage()._deepcopy(memo) 2025-12-04T10:08:42.0721944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy 2025-12-04T10:08:42.0722228Z return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo)) 2025-12-04T10:08:42.0722510Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:42.0722615Z y = copier(memo) 2025-12-04T10:08:42.0723024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__ 2025-12-04T10:08:42.0723158Z new_storage = self.clone() 2025-12-04T10:08:42.0723523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone 2025-12-04T10:08:42.0723753Z return type(self)(self.nbytes(), device=self.device).copy_(self) 2025-12-04T10:08:42.0723987Z torch.AcceleratorError: CUDA error: invalid device function 2025-12-04T10:08:42.0724633Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. 2025-12-04T10:08:42.0725122Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. 2025-12-04T10:08:42.0725379Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1 2025-12-04T10:08:42.0725617Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 
2025-12-04T10:08:42.0725623Z 2025-12-04T10:08:42.0726244Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first): 2025-12-04T10:08:42.0726358Z C++ CapturedTraceback: 2025-12-04T10:08:42.0727686Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:08:42.0728237Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:08:42.0728895Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string, std::allocator > const&) from :0 2025-12-04T10:08:42.0729277Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0 2025-12-04T10:08:42.0729543Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0 2025-12-04T10:08:42.0729857Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0 2025-12-04T10:08:42.0730076Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0 2025-12-04T10:08:42.0732208Z #11 c10::impl::wrap_kernel_functor_unboxed_, at::Tensor&, c10::guts::typelist::typelist >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:42.0732827Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:42.0733046Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0 2025-12-04T10:08:42.0733288Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0 2025-12-04T10:08:42.0733554Z #15 THPStorage_copy_(_object*, _object*, _object*) from 
StorageMethods.cpp:0 2025-12-04T10:08:42.0733990Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344 2025-12-04T10:08:42.0734399Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0734775Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0735198Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0735570Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0735974Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0736424Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0736835Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0737221Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0737627Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0737996Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0738507Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0738878Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0739291Z #29 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0739657Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0740122Z 
#31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0740505Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0740909Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0741297Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0741566Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0741937Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0742356Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0742725Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0743145Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0743513Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0743809Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0744085Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0744456Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0744747Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0745019Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0745391Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0745693Z #47 PyVectorcall_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0745950Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0746322Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0746738Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0747107Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0747381Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0747749Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0748154Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0748530Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0748938Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0749306Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0749722Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0750092Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0750571Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0750947Z #61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0751238Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0751515Z #63 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0751945Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0752306Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0752611Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0752907Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0753228Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0753640Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0754024Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0754434Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0754802Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0755079Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0755451Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0755851Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0756237Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0756644Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0757024Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0757377Z #79 _PyObject_FastCallDictTstate from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0757684Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0757998Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0758266Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:08:42.0758541Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0758913Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0759324Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0759707Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0760115Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0760497Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0760761Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0761132Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0761549Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0761920Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0762323Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0762765Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0763025Z #95 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0763404Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0763809Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0764236Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0764650Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0765034Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0765403Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0765718Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0766018Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0766340Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0766752Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0767143Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0767410Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0767789Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0768213Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0768596Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0769009Z #111 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0769401Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0769759Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0770089Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0770389Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0770699Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0771302Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0771684Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0772108Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0772481Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0772892Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0773285Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0773547Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0773923Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0774349Z #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0774867Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0775296Z #127 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0775673Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0776029Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0776508Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0776809Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0777129Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0777545Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0777922Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0778356Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0778734Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0779158Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0779540Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0779946Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0780333Z #140 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0780746Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0781134Z #142 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0781427Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T10:08:42.0781737Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T10:08:42.0782017Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T10:08:42.0782304Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T10:08:42.0782663Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T10:08:42.0783005Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T10:08:42.0783300Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T10:08:42.0783586Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T10:08:42.0783855Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T10:08:42.0784059Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T10:08:42.0784177Z #153 _start from ??:0 2025-12-04T10:08:42.0784302Z #154 from ??:0 2025-12-04T10:08:42.0784308Z 2025-12-04T10:08:42.0784313Z 2025-12-04T10:08:42.0784545Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.0785336Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda 2025-12-04T10:08:42.0785346Z 2025-12-04T10:08:42.0785617Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.0785860Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.0787147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py:257: UserWarning: Unsupported 
unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.0787401Z return type(self)(self.nbytes(), device=self.device).copy_(self) 2025-12-04T10:08:42.0787554Z =================================== FAILURES =================================== 2025-12-04T10:08:42.0787984Z _ AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda _ 2025-12-04T10:08:42.0788186Z Traceback (most recent call last): 2025-12-04T10:08:42.0788667Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 6893, in test__weight_int4pack_mm 2025-12-04T10:08:42.0788790Z self.check_model(model, (a,)) 2025-12-04T10:08:42.0789234Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 247, in check_model 2025-12-04T10:08:42.0789363Z ref_model = copy.deepcopy(model) 2025-12-04T10:08:42.0789653Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 172, in deepcopy 2025-12-04T10:08:42.0789773Z y = _reconstruct(x, memo, *rv) 2025-12-04T10:08:42.0790069Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 271, in _reconstruct 2025-12-04T10:08:42.0790206Z state = deepcopy(state, memo) 2025-12-04T10:08:42.0790473Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 146, in deepcopy 2025-12-04T10:08:42.0790579Z y = copier(x, memo) 2025-12-04T10:08:42.0790890Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict 2025-12-04T10:08:42.0791052Z y[deepcopy(key, memo)] = deepcopy(value, memo) 2025-12-04T10:08:42.0791329Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:42.0791433Z y = copier(memo) 2025-12-04T10:08:42.0791844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 180, in __deepcopy__ 2025-12-04T10:08:42.0792028Z new_storage = self._typed_storage()._deepcopy(memo) 2025-12-04T10:08:42.0792425Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 1139, in _deepcopy 2025-12-04T10:08:42.0792708Z return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo)) 2025-12-04T10:08:42.0792985Z File "/opt/conda/envs/py_3.10/lib/python3.10/copy.py", line 153, in deepcopy 2025-12-04T10:08:42.0793085Z y = copier(memo) 2025-12-04T10:08:42.0793507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 243, in __deepcopy__ 2025-12-04T10:08:42.0793631Z new_storage = self.clone() 2025-12-04T10:08:42.0793998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py", line 257, in clone 2025-12-04T10:08:42.0794240Z return type(self)(self.nbytes(), device=self.device).copy_(self) 2025-12-04T10:08:42.0794464Z torch.AcceleratorError: CUDA error: invalid device function 2025-12-04T10:08:42.0795112Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. 2025-12-04T10:08:42.0795607Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. 2025-12-04T10:08:42.0795793Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1 2025-12-04T10:08:42.0796044Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 
2025-12-04T10:08:42.0796049Z 2025-12-04T10:08:42.0796660Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first): 2025-12-04T10:08:42.0796777Z C++ CapturedTraceback: 2025-12-04T10:08:42.0798110Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:08:42.0798659Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:08:42.0799329Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string, std::allocator > const&) from :0 2025-12-04T10:08:42.0799700Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0 2025-12-04T10:08:42.0799976Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0 2025-12-04T10:08:42.0800344Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0 2025-12-04T10:08:42.0800562Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0 2025-12-04T10:08:42.0802837Z #11 c10::impl::wrap_kernel_functor_unboxed_, at::Tensor&, c10::guts::typelist::typelist >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:42.0803450Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:42.0803686Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0 2025-12-04T10:08:42.0803921Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0 2025-12-04T10:08:42.0804184Z #15 THPStorage_copy_(_object*, _object*, _object*) from 
StorageMethods.cpp:0 2025-12-04T10:08:42.0804620Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344 2025-12-04T10:08:42.0805032Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0805426Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0805839Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0806214Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0806636Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0807014Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0807433Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0807804Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0808211Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0808605Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0809007Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0809392Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0809800Z #29 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0810176Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0810593Z 
#31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0810966Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0811372Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0811834Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0812102Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0812482Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0812891Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0813319Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0813735Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0814105Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0814409Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0814673Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0815042Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0815346Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0815605Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0815982Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0816379Z #47 PyVectorcall_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0816643Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0817037Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0817445Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0817820Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0818095Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0818466Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0818887Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0819262Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0819671Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0820057Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0820464Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0820853Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0821260Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0821633Z #61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0821941Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0822207Z #63 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0822578Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0822948Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0823256Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0823645Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0823955Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0824364Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0824753Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0825240Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0825627Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0825888Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0826260Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0826686Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0827063Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0827484Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0827856Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0828210Z #79 _PyObject_FastCallDictTstate from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0828529Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0828828Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0829100Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:08:42.0829375Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0829755Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0830172Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0830541Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0830949Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0831336Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0831597Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0831978Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0832382Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0832754Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0833170Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0833538Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0833809Z #95 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0834183Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0834585Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0834969Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0835375Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0835832Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0836203Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0836513Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0836827Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0837203Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0837622Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0838021Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0838287Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0838683Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0839103Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0839479Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0840071Z #111 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0840452Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0840826Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0841141Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0841440Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0841763Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0842184Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0842567Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0842997Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0843379Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0843815Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0844195Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0844466Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0844858Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0845277Z #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0845664Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0846078Z #127 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0846454Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0846827Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0847136Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0847446Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0847758Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0848252Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0848647Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0849058Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0849434Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0849930Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0850309Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0850908Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0851290Z #140 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0851711Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0852106Z #142 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0852402Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T10:08:42.0852730Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T10:08:42.0853008Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T10:08:42.0853298Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T10:08:42.0853670Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T10:08:42.0854000Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T10:08:42.0854293Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T10:08:42.0854584Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T10:08:42.0854854Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T10:08:42.0855064Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T10:08:42.0855170Z #153 _start from ??:0 2025-12-04T10:08:42.0855295Z #154 from ??:0 2025-12-04T10:08:42.0855306Z 2025-12-04T10:08:42.0855315Z 2025-12-04T10:08:42.0855552Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.0856418Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda 2025-12-04T10:08:42.0856425Z 2025-12-04T10:08:42.0856711Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.0856942Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.0858066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/storage.py:257: UserWarning: Unsupported 
unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.0858313Z return type(self)(self.nbytes(), device=self.device).copy_(self) 2025-12-04T10:08:42.0859055Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3e5313e420476f15.xml - 2025-12-04T10:08:42.0859253Z =========================== short test summary info ============================ 2025-12-04T10:08:42.0860166Z FAILED [0.0054s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda - torch.AcceleratorError: CUDA error: invalid device function 2025-12-04T10:08:42.0860811Z Search for `cudaErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. 2025-12-04T10:08:42.0861392Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. 2025-12-04T10:08:42.0861583Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1 2025-12-04T10:08:42.0861835Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 
2025-12-04T10:08:42.0861841Z 2025-12-04T10:08:42.0862451Z Exception raised from copy_device_to_device at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Copy.cu:337 (most recent call first): 2025-12-04T10:08:42.0862626Z C++ CapturedTraceback: 2025-12-04T10:08:42.0863958Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:08:42.0864450Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:08:42.0865116Z #6 c10::AcceleratorError::AcceleratorError(c10::SourceLocation, int, std::__cxx11::basic_string, std::allocator > const&) from :0 2025-12-04T10:08:42.0865484Z #7 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) from ??:0 2025-12-04T10:08:42.0865769Z #8 at::native::copy_device_to_device(at::TensorIterator&, bool, bool) from ??:0 2025-12-04T10:08:42.0866071Z #9 at::native::copy_impl(at::Tensor&, at::Tensor const&, bool) [clone .isra.0] from Copy.cpp:0 2025-12-04T10:08:42.0866288Z #10 at::native::copy_(at::Tensor&, at::Tensor const&, bool) from ??:0 2025-12-04T10:08:42.0868433Z #11 c10::impl::wrap_kernel_functor_unboxed_, at::Tensor&, c10::guts::typelist::typelist >, at::Tensor& (c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:42.0869039Z #12 torch::autograd::VariableType::(anonymous namespace)::copy_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, bool) from VariableTypeManual.cpp:0 2025-12-04T10:08:42.0869274Z #13 at::_ops::copy_::call(at::Tensor&, at::Tensor const&, bool) from ??:0 2025-12-04T10:08:42.0869510Z #14 at::storage_copy(c10::Storage&, c10::Storage const&, bool) from ??:0 2025-12-04T10:08:42.0869777Z #15 THPStorage_copy_(_object*, _object*, _object*) from 
StorageMethods.cpp:0 2025-12-04T10:08:42.0870213Z #16 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.14/Objects/descrobject.c:344 2025-12-04T10:08:42.0870630Z #17 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0871189Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0871599Z #19 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0871971Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0872395Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0872765Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0873182Z #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0873551Z #24 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0874089Z #25 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0874478Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0874880Z #27 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0875264Z #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0875786Z #29 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0876154Z #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0876576Z 
#31 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0876945Z #32 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0877368Z #33 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0877740Z #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0878009Z #35 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0878395Z #36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0878805Z #37 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0879172Z #38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0879590Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0879963Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0880273Z #41 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0880541Z #42 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0880916Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0881221Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0881482Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0881870Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0882162Z #47 PyVectorcall_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0882422Z #48 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0882811Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0883224Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0883594Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0883867Z #52 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0884237Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0884660Z #54 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0885030Z #55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0885436Z #56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0885820Z #57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0886291Z #58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0886674Z #59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0887080Z #60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0887451Z #61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0887818Z #62 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:08:42.0888080Z #63 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0888464Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0888815Z #65 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0889124Z #66 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0889437Z #67 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0889741Z #68 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0890153Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0890544Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0890957Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0891340Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0891602Z #73 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0891973Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0892398Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0892771Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0893194Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0893567Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0893923Z #79 _PyObject_FastCallDictTstate from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0894248Z #80 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0894547Z #81 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0894832Z #82 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:08:42.0895093Z #83 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0895470Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0895892Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0896325Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0896737Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0897128Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0897388Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0897774Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0898184Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0898631Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0899052Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0899423Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0899694Z #95 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0900130Z #96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0900536Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0900922Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0901327Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0901729Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0902089Z #101 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0902404Z #102 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0902720Z #103 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0903035Z #104 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0903451Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0903845Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0904111Z #107 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0904510Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0904924Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0905301Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0905728Z #111 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0906110Z #112 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0906479Z #113 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0906792Z #114 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0907096Z #115 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0907420Z #116 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0907835Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0908214Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0908637Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0909019Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0909443Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0909824Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0910091Z #123 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:08:42.0910485Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0910968Z #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0911359Z #126 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0911771Z #127 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0912208Z #128 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0912580Z #129 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:08:42.0912893Z #130 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:08:42.0913210Z #131 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:08:42.0913523Z #132 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:08:42.0913942Z #133 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:08:42.0914339Z #134 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0914754Z #135 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0915131Z #136 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0915566Z #137 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0915944Z #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0916371Z #139 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0916752Z #140 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0917169Z #141 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:08:42.0917563Z #142 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:08:42.0917858Z #143 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T10:08:42.0918179Z #144 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T10:08:42.0918458Z #145 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T10:08:42.0918746Z #146 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T10:08:42.0919119Z #147 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T10:08:42.0919446Z #148 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T10:08:42.0919752Z #149 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T10:08:42.0920029Z #150 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T10:08:42.0920298Z #151 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T10:08:42.0920510Z #152 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T10:08:42.0920614Z #153 _start from ??:0 2025-12-04T10:08:42.0920739Z #154 from ??:0 2025-12-04T10:08:42.0920749Z 2025-12-04T10:08:42.0920754Z 2025-12-04T10:08:42.0920993Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.0921784Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda 2025-12-04T10:08:42.0921790Z 2025-12-04T10:08:42.0922070Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.0922349Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:08:42.0922556Z ================= 1 failed, 157 deselected, 2 rerun in 16.09s ================== 2025-12-04T10:08:42.0922674Z Got exit code 1 2025-12-04T10:08:42.0923382Z FAILED CONSISTENTLY: test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda 2025-12-04T10:08:42.0923807Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:08:42.0924316Z W1204 09:59:32.768000 15876 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:08:42.0924880Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b23b654b51890d24.xml 2025-12-04T10:08:42.0925064Z ============================= test session starts ============================== 2025-12-04T10:08:42.0925422Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:08:42.0925551Z cachedir: .pytest_cache 2025-12-04T10:08:42.0926074Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:08:42.0926201Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:08:42.0926325Z configfile: pytest.ini 2025-12-04T10:08:42.0926870Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:08:42.0927103Z collecting ... collected 934 items / 59 deselected / 875 selected 2025-12-04T10:08:42.0927262Z stepcurrent: skipping 59 already run items. 
2025-12-04T10:08:42.0927382Z Running 99 items in this shard 2025-12-04T10:08:42.0927388Z 2025-12-04T10:08:42.0928246Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_1_cuda SKIPPED [0.0040s] (requires Intel GPU) [ 1%] 2025-12-04T10:08:42.0929079Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_addmm_multiple_dynamic_cuda SKIPPED [0.0003s] (Skipping triton backend only since not big GPU (not enough SM)) [ 2%] 2025-12-04T10:08:42.0929858Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_constant_tensor_name_collision_cuda <- test/inductor/test_torchinductor.py PASSED [9.9357s] [ 3%] 2025-12-04T10:08:42.0930721Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_cpp_kernel_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (cpu test case only) [ 4%] 2025-12-04T10:08:42.0931409Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_tensor_meta_cuda <- test/inductor/test_torchinductor.py PASSED [5.9687s] [ 5%] 2025-12-04T10:08:42.0932125Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotuning_args_reuse_cuda <- test/inductor/test_torchinductor.py PASSED [8.1062s] [ 6%] 2025-12-04T10:08:42.0932811Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_1_cuda <- test/inductor/test_torchinductor.py PASSED [6.1250s] [ 7%] 2025-12-04T10:08:42.0933528Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_and_force_mmap_weights_cuda SKIPPED [0.0031s] (Test for x86 backend) [ 8%] 2025-12-04T10:08:42.0934889Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_with_parameters_cuda <- test/inductor/test_torchinductor.py W1204 10:00:05.791000 15876 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. 
Please use Dim.STATIC instead 2025-12-04T10:08:42.0935354Z W1204 10:00:06.500000 15876 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:08:42.0935475Z PASSED [8.1015s] [ 9%] 2025-12-04T10:08:42.0936122Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv3d_cuda SKIPPED [0.0035s] (requires modern GPU to run max-autotune) [ 10%] 2025-12-04T10:08:42.0938356Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_copy_non_blocking_is_pinned_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0006s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/164858 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 11%] 2025-12-04T10:08:42.0940106Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda ('RERUN', {'yellow': True}) [13.2476s] [ 12%] 2025-12-04T10:08:42.0940701Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda ('RERUN', {'yellow': True}) [11.7770s] [ 12%] 2025-12-04T10:08:42.0941191Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda FAILED [11.8502s] [ 12%] 2025-12-04T10:08:42.0941197Z 2025-12-04T10:08:42.0941348Z ==================================== RERUNS ==================================== 2025-12-04T10:08:42.0941673Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________ 2025-12-04T10:08:42.0941798Z Traceback (most recent call last): 2025-12-04T10:08:42.0942272Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing 2025-12-04T10:08:42.0942462Z self.check_model(Model(self.device), example_inputs) 2025-12-04T10:08:42.0942897Z File 
"/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.0943033Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.0943420Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.0943564Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.0943981Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.0944187Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.0944731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.0944864Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.0945409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.0945521Z raise e 2025-12-04T10:08:42.0946058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.0946170Z return func( 2025-12-04T10:08:42.0946717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.0946948Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.0947423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.0947542Z return compile_fx_aot( 2025-12-04T10:08:42.0948034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.0948174Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.0948644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 
2025-12-04T10:08:42.0948768Z return compile_fx( 2025-12-04T10:08:42.0949235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.0949372Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.0949953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.0950069Z return _compile_fx_main( 2025-12-04T10:08:42.0950643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T10:08:42.0950860Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.0951384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing 2025-12-04T10:08:42.0951585Z optimized_function = inner_compile( 2025-12-04T10:08:42.0951868Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.0951983Z return func(*args, **kwds) 2025-12-04T10:08:42.0952492Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.0952758Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.0953271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.0953448Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.0953949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.0954155Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.0954658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.0954810Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.0955354Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.0955676Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.0956210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.0956341Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.0956892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T10:08:42.0957027Z warn_and_skip(node.get_device()) 2025-12-04T10:08:42.0957510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.0957669Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.0957926Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.0957933Z 2025-12-04T10:08:42.0958150Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.0958758Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda 2025-12-04T10:08:42.0958764Z 2025-12-04T10:08:42.0959033Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.0959272Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.0959381Z unimplemented [] 2025-12-04T10:08:42.0959539Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.0959892Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.0959995Z graph_break [] 2025-12-04T10:08:42.0960220Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T10:08:42.0960969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.0961076Z warnings.warn( 2025-12-04T10:08:42.0961489Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________ 2025-12-04T10:08:42.0961677Z Traceback (most recent call last): 2025-12-04T10:08:42.0962215Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing 2025-12-04T10:08:42.0962415Z self.check_model(Model(self.device), example_inputs) 2025-12-04T10:08:42.0962850Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.0962974Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.0963376Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.0963584Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.0964005Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.0964211Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.0964738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.0964885Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.0965433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.0965532Z raise e 2025-12-04T10:08:42.0966087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.0966188Z return func( 2025-12-04T10:08:42.0966756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, 
in _aoti_compile_and_package_inner 2025-12-04T10:08:42.0966997Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.0967460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.0967592Z return compile_fx_aot( 2025-12-04T10:08:42.0968087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.0968214Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.0968703Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.0968812Z return compile_fx( 2025-12-04T10:08:42.0969296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.0969434Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.0970008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.0970142Z return _compile_fx_main( 2025-12-04T10:08:42.0970645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T10:08:42.0970863Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.0971565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing 2025-12-04T10:08:42.0971696Z optimized_function = inner_compile( 2025-12-04T10:08:42.0971991Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.0972138Z return func(*args, **kwds) 2025-12-04T10:08:42.0972767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.0973056Z return 
wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.0973549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.0973738Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.0974240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.0974430Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.0975129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.0975279Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.0975823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.0976144Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.0976927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.0977071Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.0977617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T10:08:42.0977739Z warn_and_skip(node.get_device()) 2025-12-04T10:08:42.0978243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.0978385Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.0978655Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.0978661Z 2025-12-04T10:08:42.0978878Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.0979472Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python 
test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda 2025-12-04T10:08:42.0979483Z 2025-12-04T10:08:42.0979768Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.0979993Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.0980169Z unimplemented [] 2025-12-04T10:08:42.0980379Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.0980712Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.0980837Z graph_break [] 2025-12-04T10:08:42.0981056Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.0981790Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.0981910Z warnings.warn( 2025-12-04T10:08:42.0982131Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.0982250Z unimplemented [] 2025-12-04T10:08:42.0982410Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.0982742Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.0982856Z graph_break [] 2025-12-04T10:08:42.0983072Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.0983806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.0983922Z warnings.warn( 2025-12-04T10:08:42.0984071Z =================================== FAILURES =================================== 2025-12-04T10:08:42.0984395Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________ 2025-12-04T10:08:42.0984519Z Traceback 
(most recent call last): 2025-12-04T10:08:42.0984981Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing 2025-12-04T10:08:42.0985179Z self.check_model(Model(self.device), example_inputs) 2025-12-04T10:08:42.0985612Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.0985733Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.0986132Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.0986275Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.0986773Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.0986977Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.0987499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.0987705Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.0988248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.0988345Z raise e 2025-12-04T10:08:42.0988899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.0988998Z return func( 2025-12-04T10:08:42.0989560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.0989800Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.0990257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.0990386Z return compile_fx_aot( 2025-12-04T10:08:42.0990880Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.0991024Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.0991494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.0991601Z return compile_fx( 2025-12-04T10:08:42.0992084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.0992219Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.0992794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.0992923Z return _compile_fx_main( 2025-12-04T10:08:42.0993430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T10:08:42.0993644Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.0994171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing 2025-12-04T10:08:42.0994298Z optimized_function = inner_compile( 2025-12-04T10:08:42.0994590Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.0994705Z return func(*args, **kwds) 2025-12-04T10:08:42.0995199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.0995481Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.0995975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.0996161Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.0996663Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.0996859Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.0997374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.0997522Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.0998068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.0998390Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.0998976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.0999118Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.0999664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T10:08:42.0999787Z warn_and_skip(node.get_device()) 2025-12-04T10:08:42.1000344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1000489Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1000759Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1000766Z 2025-12-04T10:08:42.1000985Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1001577Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda 2025-12-04T10:08:42.1001599Z 2025-12-04T10:08:42.1001869Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1002091Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T10:08:42.1002211Z unimplemented [] 2025-12-04T10:08:42.1002368Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.1002702Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.1002817Z graph_break [] 2025-12-04T10:08:42.1003035Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1003764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1003879Z warnings.warn( 2025-12-04T10:08:42.1004095Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1004215Z unimplemented [] 2025-12-04T10:08:42.1004373Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.1004699Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.1004812Z graph_break [] 2025-12-04T10:08:42.1005027Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1005755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1005870Z warnings.warn( 2025-12-04T10:08:42.1006083Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1006201Z unimplemented [] 2025-12-04T10:08:42.1006358Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.1006685Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.1006807Z graph_break [] 2025-12-04T10:08:42.1007023Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1007751Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1007866Z warnings.warn( 2025-12-04T10:08:42.1008608Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b23b654b51890d24.xml - 2025-12-04T10:08:42.1008800Z =========================== short test summary info ============================ 2025-12-04T10:08:42.1009609Z FAILED [11.8502s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1009615Z 2025-12-04T10:08:42.1009836Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1010511Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda 2025-12-04T10:08:42.1010518Z 2025-12-04T10:08:42.1010786Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1010980Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:08:42.1011310Z == 1 failed, 5 passed, 6 skipped, 59 deselected, 2 rerun in 75.23s (0:01:15) === 2025-12-04T10:08:42.1011412Z Got exit code 1 2025-12-04T10:08:42.1011535Z Retrying single test... 
2025-12-04T10:08:42.1011981Z W1204 10:01:02.102000 17907 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:08:42.1012559Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-2e7c8f13f7be0603.xml 2025-12-04T10:08:42.1012727Z ============================= test session starts ============================== 2025-12-04T10:08:42.1013085Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:08:42.1013209Z cachedir: .pytest_cache 2025-12-04T10:08:42.1013734Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:08:42.1013861Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:08:42.1013988Z configfile: pytest.ini 2025-12-04T10:08:42.1014529Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:08:42.1014774Z collecting ... collected 934 items / 157 deselected / 777 selected 2025-12-04T10:08:42.1015446Z stepcurrent: skipping 70 already run items. Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda 2025-12-04T10:08:42.1015562Z Running 1 items in this shard 2025-12-04T10:08:42.1015567Z 2025-12-04T10:08:42.1016667Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda [W1204 10:01:06.686889056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1016675Z 2025-12-04T10:08:42.1017193Z [W1204 10:01:22.458104453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:08:42.1017203Z 2025-12-04T10:08:42.1017730Z [W1204 10:01:22.461371808 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1017735Z 2025-12-04T10:08:42.1018199Z W1204 10:01:22.133000 17907 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:08:42.1018730Z [W1204 10:01:29.177153013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1018735Z 2025-12-04T10:08:42.1019251Z [W1204 10:01:29.177692163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1019256Z 2025-12-04T10:08:42.1019786Z [W1204 10:01:29.177873901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1019792Z 2025-12-04T10:08:42.1020303Z [W1204 10:01:29.182719305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1020313Z 2025-12-04T10:08:42.1020825Z [W1204 10:01:29.186747597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1020829Z 2025-12-04T10:08:42.1021354Z [W1204 10:01:36.205092467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1021358Z 2025-12-04T10:08:42.1021933Z [W1204 10:01:36.207015370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1021938Z 2025-12-04T10:08:42.1022466Z [W1204 10:01:36.209392177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:08:42.1022470Z 2025-12-04T10:08:42.1022608Z ('RERUN', {'yellow': True}) [32.8516s] [100%] 2025-12-04T10:08:42.1023673Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda [W1204 10:01:36.381944046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1023679Z 2025-12-04T10:08:42.1024193Z [W1204 10:01:36.383730334 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1024199Z 2025-12-04T10:08:42.1024729Z [W1204 10:01:36.386050108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1024734Z 2025-12-04T10:08:42.1025243Z [W1204 10:01:42.884198898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1025248Z 2025-12-04T10:08:42.1025758Z [W1204 10:01:42.884665417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1025781Z 2025-12-04T10:08:42.1026293Z [W1204 10:01:42.884838468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1026297Z 2025-12-04T10:08:42.1026807Z [W1204 10:01:42.888045367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1026811Z 2025-12-04T10:08:42.1027341Z [W1204 10:01:42.891762381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1027345Z 2025-12-04T10:08:42.1027859Z [W1204 10:01:48.974715488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:08:42.1027864Z 2025-12-04T10:08:42.1028390Z [W1204 10:01:48.976593061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1028401Z 2025-12-04T10:08:42.1028915Z [W1204 10:01:48.978919971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1028920Z 2025-12-04T10:08:42.1029067Z ('RERUN', {'yellow': True}) [11.7288s] [100%] 2025-12-04T10:08:42.1030068Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda [W1204 10:01:48.109639308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1030074Z 2025-12-04T10:08:42.1030594Z [W1204 10:01:48.111433956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1030613Z 2025-12-04T10:08:42.1031123Z [W1204 10:01:48.113999686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1031128Z 2025-12-04T10:08:42.1031644Z [W1204 10:01:54.509384790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1031649Z 2025-12-04T10:08:42.1032179Z [W1204 10:01:54.509814465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1032185Z 2025-12-04T10:08:42.1032697Z [W1204 10:01:54.509969178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1032702Z 2025-12-04T10:08:42.1033299Z [W1204 10:01:54.513143703 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:08:42.1033305Z 2025-12-04T10:08:42.1033818Z [W1204 10:01:54.516781924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1033823Z 2025-12-04T10:08:42.1034350Z [W1204 10:02:00.564663468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1034456Z 2025-12-04T10:08:42.1034965Z [W1204 10:02:00.566543683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1034970Z 2025-12-04T10:08:42.1035494Z [W1204 10:02:00.568903701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1035499Z 2025-12-04T10:08:42.1035610Z FAILED [11.5884s] [100%] 2025-12-04T10:08:42.1035620Z 2025-12-04T10:08:42.1035766Z ==================================== RERUNS ==================================== 2025-12-04T10:08:42.1036095Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________ 2025-12-04T10:08:42.1036224Z Traceback (most recent call last): 2025-12-04T10:08:42.1036686Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing 2025-12-04T10:08:42.1036894Z self.check_model(Model(self.device), example_inputs) 2025-12-04T10:08:42.1037328Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.1037469Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.1037856Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.1037998Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.1038425Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.1038626Z package_path = torch._inductor.aoti_compile_and_package( 
2025-12-04T10:08:42.1039164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.1039295Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.1039838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1039953Z raise e 2025-12-04T10:08:42.1040490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1040586Z return func( 2025-12-04T10:08:42.1041144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.1041377Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.1041852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.1041967Z return compile_fx_aot( 2025-12-04T10:08:42.1042459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.1042596Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.1043071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.1043175Z return compile_fx( 2025-12-04T10:08:42.1043652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.1043791Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.1044376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.1044554Z return _compile_fx_main( 2025-12-04T10:08:42.1045058Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T10:08:42.1045275Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.1045801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing 2025-12-04T10:08:42.1046005Z optimized_function = inner_compile( 2025-12-04T10:08:42.1046286Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.1046401Z return func(*args, **kwds) 2025-12-04T10:08:42.1046910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.1047177Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.1047673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.1047862Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1048363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.1048572Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.1049073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.1049226Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.1049768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.1050090Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.1050629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.1050757Z 
_check_triton_bf16_support(graph) 2025-12-04T10:08:42.1051306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T10:08:42.1051445Z warn_and_skip(node.get_device()) 2025-12-04T10:08:42.1051926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1052072Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1052337Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1052343Z 2025-12-04T10:08:42.1052560Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1053158Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda 2025-12-04T10:08:42.1053163Z 2025-12-04T10:08:42.1053436Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1053658Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1053777Z unimplemented [] 2025-12-04T10:08:42.1053936Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.1054277Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.1054382Z graph_break [] 2025-12-04T10:08:42.1054600Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1055821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:08:42.1055941Z if out == self.unknown_value: 2025-12-04T10:08:42.1057306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1057430Z if out == self.unknown_value: 2025-12-04T10:08:42.1058624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1058834Z if out == self.unknown_value: 2025-12-04T10:08:42.1059564Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1059695Z warnings.warn( 2025-12-04T10:08:42.1060011Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________ 2025-12-04T10:08:42.1060140Z Traceback (most recent call last): 2025-12-04T10:08:42.1060619Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing 2025-12-04T10:08:42.1060805Z self.check_model(Model(self.device), example_inputs) 2025-12-04T10:08:42.1061235Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.1061375Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.1061761Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.1061920Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.1062322Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.1062523Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.1063060Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.1063192Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.1063735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1063842Z raise e 2025-12-04T10:08:42.1064384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1064494Z return func( 2025-12-04T10:08:42.1065043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.1065275Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.1065743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.1065858Z return compile_fx_aot( 2025-12-04T10:08:42.1066350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.1066492Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.1066959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.1067079Z return compile_fx( 2025-12-04T10:08:42.1067542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.1067682Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.1068267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.1068382Z return _compile_fx_main( 2025-12-04T10:08:42.1068893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 2788, in _compile_fx_main 2025-12-04T10:08:42.1069096Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.1069684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing 2025-12-04T10:08:42.1069825Z optimized_function = inner_compile( 2025-12-04T10:08:42.1070107Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.1070222Z return func(*args, **kwds) 2025-12-04T10:08:42.1070733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.1071248Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.1071754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.1071928Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1072430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.1072645Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.1073145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.1073306Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.1073840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.1074167Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.1074702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.1074829Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.1075377Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T10:08:42.1075511Z warn_and_skip(node.get_device()) 2025-12-04T10:08:42.1075998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1076152Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1076407Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1076413Z 2025-12-04T10:08:42.1076630Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1077237Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda 2025-12-04T10:08:42.1077242Z 2025-12-04T10:08:42.1077511Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1077749Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1077860Z unimplemented [] 2025-12-04T10:08:42.1078018Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.1078368Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.1078469Z graph_break [] 2025-12-04T10:08:42.1078691Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1079904Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:08:42.1080028Z if out == self.unknown_value: 2025-12-04T10:08:42.1081401Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1083237Z if out == self.unknown_value: 2025-12-04T10:08:42.1084805Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1086277Z if out == self.unknown_value: 2025-12-04T10:08:42.1087239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1088304Z warnings.warn( 2025-12-04T10:08:42.1088688Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1089174Z unimplemented [] 2025-12-04T10:08:42.1089511Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.1090124Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.1090740Z graph_break [] 2025-12-04T10:08:42.1091133Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1093254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:08:42.1095203Z if out == self.unknown_value: 2025-12-04T10:08:42.1096720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1098205Z if out == self.unknown_value: 2025-12-04T10:08:42.1099165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1100144Z warnings.warn( 2025-12-04T10:08:42.1100464Z =================================== FAILURES =================================== 2025-12-04T10:08:42.1101079Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________ 2025-12-04T10:08:42.1101992Z Traceback (most recent call last): 2025-12-04T10:08:42.1103011Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing 2025-12-04T10:08:42.1103810Z self.check_model(Model(self.device), example_inputs) 2025-12-04T10:08:42.1104575Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.1105284Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.1105915Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.1106603Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.1107277Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.1108040Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.1108922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.1109728Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.1110527Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1111402Z raise e 2025-12-04T10:08:42.1112088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1112879Z return func( 2025-12-04T10:08:42.1113583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.1114514Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.1115359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.1116191Z return compile_fx_aot( 2025-12-04T10:08:42.1116907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.1117687Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.1118414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.1119213Z return compile_fx( 2025-12-04T10:08:42.1119889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.1120646Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.1121482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.1122319Z return _compile_fx_main( 2025-12-04T10:08:42.1123051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T10:08:42.1123904Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.1124761Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing 2025-12-04T10:08:42.1125565Z optimized_function = inner_compile( 2025-12-04T10:08:42.1126108Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.1126652Z return func(*args, **kwds) 2025-12-04T10:08:42.1127357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.1128266Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.1129170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.1129977Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1130805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.1131652Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.1132494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.1133280Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.1134104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.1135110Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.1136099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.1136951Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.1137759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T10:08:42.1138577Z 
warn_and_skip(node.get_device()) 2025-12-04T10:08:42.1139292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1140067Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1140594Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1140993Z 2025-12-04T10:08:42.1141225Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1142161Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda 2025-12-04T10:08:42.1142895Z 2025-12-04T10:08:42.1143166Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1143812Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1144413Z unimplemented [] 2025-12-04T10:08:42.1144742Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.1145382Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.1145974Z graph_break [] 2025-12-04T10:08:42.1146347Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1147993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1160362Z if out == self.unknown_value: 2025-12-04T10:08:42.1161857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:08:42.1163355Z if out == self.unknown_value: 2025-12-04T10:08:42.1164776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1166221Z if out == self.unknown_value: 2025-12-04T10:08:42.1167186Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1168163Z warnings.warn( 2025-12-04T10:08:42.1168552Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1169034Z unimplemented [] 2025-12-04T10:08:42.1169368Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.1170003Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.1170573Z graph_break [] 2025-12-04T10:08:42.1171178Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1172752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1174222Z if out == self.unknown_value: 2025-12-04T10:08:42.1175621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:08:42.1177137Z if out == self.unknown_value: 2025-12-04T10:08:42.1178084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1179064Z warnings.warn( 2025-12-04T10:08:42.1179443Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1179918Z unimplemented [] 2025-12-04T10:08:42.1180251Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.1180870Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.1181455Z graph_break [] 2025-12-04T10:08:42.1181835Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1183403Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1184840Z if out == self.unknown_value: 2025-12-04T10:08:42.1186477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:08:42.1187933Z if out == self.unknown_value: 2025-12-04T10:08:42.1188892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1189960Z warnings.warn( 2025-12-04T10:08:42.1190881Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-2e7c8f13f7be0603.xml - 2025-12-04T10:08:42.1191946Z =========================== short test summary info ============================ 2025-12-04T10:08:42.1193090Z FAILED [11.5884s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1194019Z 2025-12-04T10:08:42.1194246Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1195199Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda 2025-12-04T10:08:42.1195923Z 2025-12-04T10:08:42.1196205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1196800Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:08:42.1197328Z ================= 1 failed, 157 deselected, 2 rerun in 56.26s ================== 2025-12-04T10:08:42.1197787Z Got exit code 1 2025-12-04T10:08:42.1198065Z Retrying single test... 
2025-12-04T10:08:42.1198692Z W1204 10:02:12.522000 18895 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:08:42.1199848Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-8d6cdce6581fa448.xml 2025-12-04T10:08:42.1200737Z ============================= test session starts ============================== 2025-12-04T10:08:42.1201413Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:08:42.1202005Z cachedir: .pytest_cache 2025-12-04T10:08:42.1202720Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:08:42.1203514Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:08:42.1203859Z configfile: pytest.ini 2025-12-04T10:08:42.1204598Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:08:42.1205516Z collecting ... collected 934 items / 157 deselected / 777 selected 2025-12-04T10:08:42.1206561Z stepcurrent: skipping 70 already run items. Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda 2025-12-04T10:08:42.1207483Z Running 1 items in this shard 2025-12-04T10:08:42.1207712Z 2025-12-04T10:08:42.1208710Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda [W1204 10:02:16.072468211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1209845Z 2025-12-04T10:08:42.1210365Z [W1204 10:02:32.969755695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:08:42.1211020Z 2025-12-04T10:08:42.1211549Z [W1204 10:02:32.973126310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1212203Z 2025-12-04T10:08:42.1212677Z W1204 10:02:32.648000 18895 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:08:42.1213861Z [W1204 10:02:40.775847541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1214531Z 2025-12-04T10:08:42.1215045Z [W1204 10:02:40.776377923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1215709Z 2025-12-04T10:08:42.1216222Z [W1204 10:02:40.776570645 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1217009Z 2025-12-04T10:08:42.1217535Z [W1204 10:02:40.781470701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1218185Z 2025-12-04T10:08:42.1218708Z [W1204 10:02:40.785591636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1219354Z 2025-12-04T10:08:42.1219869Z [W1204 10:02:47.826891735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1220530Z 2025-12-04T10:08:42.1221044Z [W1204 10:02:47.828779803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1221703Z 2025-12-04T10:08:42.1222215Z [W1204 10:02:47.831156108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:08:42.1222872Z 2025-12-04T10:08:42.1223022Z ('RERUN', {'yellow': True}) [33.0632s] [100%] 2025-12-04T10:08:42.1224286Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda [W1204 10:02:47.999722910 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1225409Z 2025-12-04T10:08:42.1225929Z [W1204 10:02:47.001504841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1226594Z 2025-12-04T10:08:42.1227116Z [W1204 10:02:47.003768082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1227777Z 2025-12-04T10:08:42.1228289Z [W1204 10:02:53.422308098 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1228938Z 2025-12-04T10:08:42.1229463Z [W1204 10:02:53.422718137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1230114Z 2025-12-04T10:08:42.1230642Z [W1204 10:02:53.422868533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1231292Z 2025-12-04T10:08:42.1231806Z [W1204 10:02:53.425944101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1232468Z 2025-12-04T10:08:42.1232981Z [W1204 10:02:53.429506266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1233645Z 2025-12-04T10:08:42.1234156Z [W1204 10:02:59.578267718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:08:42.1234802Z 2025-12-04T10:08:42.1235328Z [W1204 10:02:59.580183005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1235980Z 2025-12-04T10:08:42.1236505Z [W1204 10:02:59.582566586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1237156Z 2025-12-04T10:08:42.1237290Z ('RERUN', {'yellow': True}) [11.7124s] [100%] 2025-12-04T10:08:42.1238623Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda [W1204 10:02:59.713768696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1239765Z 2025-12-04T10:08:42.1240281Z [W1204 10:02:59.715501167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1240928Z 2025-12-04T10:08:42.1241453Z [W1204 10:02:59.718003599 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1242166Z 2025-12-04T10:08:42.1242694Z [W1204 10:03:04.129315977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1243348Z 2025-12-04T10:08:42.1243864Z [W1204 10:03:04.129749292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1244529Z 2025-12-04T10:08:42.1245046Z [W1204 10:03:04.129905310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1245707Z 2025-12-04T10:08:42.1246223Z [W1204 10:03:04.133121240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:08:42.1246866Z 2025-12-04T10:08:42.1247393Z [W1204 10:03:04.136769310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1248046Z 2025-12-04T10:08:42.1248567Z [W1204 10:03:10.260192085 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1249214Z 2025-12-04T10:08:42.1249728Z [W1204 10:03:10.262059543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1250392Z 2025-12-04T10:08:42.1250903Z [W1204 10:03:10.264361021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:08:42.1251568Z 2025-12-04T10:08:42.1251684Z FAILED [11.6797s] [100%] 2025-12-04T10:08:42.1251872Z 2025-12-04T10:08:42.1252034Z ==================================== RERUNS ==================================== 2025-12-04T10:08:42.1252647Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________ 2025-12-04T10:08:42.1253216Z Traceback (most recent call last): 2025-12-04T10:08:42.1253924Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing 2025-12-04T10:08:42.1254702Z self.check_model(Model(self.device), example_inputs) 2025-12-04T10:08:42.1255471Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.1256173Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.1256863Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.1257531Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.1258225Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.1258988Z package_path = torch._inductor.aoti_compile_and_package( 
2025-12-04T10:08:42.1259850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.1260662Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.1261479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1262261Z raise e 2025-12-04T10:08:42.1262933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1263708Z return func( 2025-12-04T10:08:42.1264427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.1265434Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.1266281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.1267007Z return compile_fx_aot( 2025-12-04T10:08:42.1267709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.1268551Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.1269274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.1269999Z return compile_fx( 2025-12-04T10:08:42.1270646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.1271622Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.1272471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.1273302Z return _compile_fx_main( 2025-12-04T10:08:42.1274008Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T10:08:42.1274861Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.1275722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing 2025-12-04T10:08:42.1276529Z optimized_function = inner_compile( 2025-12-04T10:08:42.1277056Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.1277596Z return func(*args, **kwds) 2025-12-04T10:08:42.1278316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.1279215Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.1280126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.1280941Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1281756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.1282593Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.1283424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.1284225Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.1285043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.1286030Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.1287027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.1287817Z 
_check_triton_bf16_support(graph) 2025-12-04T10:08:42.1288604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T10:08:42.1289420Z warn_and_skip(node.get_device()) 2025-12-04T10:08:42.1290153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1290915Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1291432Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1291834Z 2025-12-04T10:08:42.1292054Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1292997Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda 2025-12-04T10:08:42.1293923Z 2025-12-04T10:08:42.1294214Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1294846Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1295328Z unimplemented [] 2025-12-04T10:08:42.1295667Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.1296356Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.1297140Z graph_break [] 2025-12-04T10:08:42.1297526Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1299103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:08:42.1300559Z if out == self.unknown_value: 2025-12-04T10:08:42.1301986Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1303431Z if out == self.unknown_value: 2025-12-04T10:08:42.1304857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1306296Z if out == self.unknown_value: 2025-12-04T10:08:42.1307244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1308214Z warnings.warn( 2025-12-04T10:08:42.1308691Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________ 2025-12-04T10:08:42.1309261Z Traceback (most recent call last): 2025-12-04T10:08:42.1309960Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing 2025-12-04T10:08:42.1310741Z self.check_model(Model(self.device), example_inputs) 2025-12-04T10:08:42.1311477Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.1312174Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.1312791Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.1313461Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.1314126Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.1314878Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.1315747Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.1316546Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.1317341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1318121Z raise e 2025-12-04T10:08:42.1318806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1319575Z return func( 2025-12-04T10:08:42.1320290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.1321214Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.1322048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.1322758Z return compile_fx_aot( 2025-12-04T10:08:42.1323534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.1324306Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.1325008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.1325733Z return compile_fx( 2025-12-04T10:08:42.1326398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.1327221Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.1328073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.1328894Z return _compile_fx_main( 2025-12-04T10:08:42.1329611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 2788, in _compile_fx_main 2025-12-04T10:08:42.1330461Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.1331332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing 2025-12-04T10:08:42.1332119Z optimized_function = inner_compile( 2025-12-04T10:08:42.1332653Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.1333196Z return func(*args, **kwds) 2025-12-04T10:08:42.1333907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.1334812Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.1335721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.1336604Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1337411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.1338257Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.1339090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.1339893Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.1340696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.1341698Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.1342694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.1343487Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.1344272Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T10:08:42.1345096Z warn_and_skip(node.get_device()) 2025-12-04T10:08:42.1345823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1346573Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1347096Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1347499Z 2025-12-04T10:08:42.1347718Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1348663Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda 2025-12-04T10:08:42.1349384Z 2025-12-04T10:08:42.1349655Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1350286Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1350760Z unimplemented [] 2025-12-04T10:08:42.1351188Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.1351806Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.1352375Z graph_break [] 2025-12-04T10:08:42.1352754Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1354310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:08:42.1355834Z if out == self.unknown_value: 2025-12-04T10:08:42.1357247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1358693Z if out == self.unknown_value: 2025-12-04T10:08:42.1360092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1361545Z if out == self.unknown_value: 2025-12-04T10:08:42.1362491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1363469Z warnings.warn( 2025-12-04T10:08:42.1363846Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1364314Z unimplemented [] 2025-12-04T10:08:42.1364637Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.1365260Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.1365821Z graph_break [] 2025-12-04T10:08:42.1366201Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1367767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:08:42.1369201Z if out == self.unknown_value: 2025-12-04T10:08:42.1370621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1372295Z if out == self.unknown_value: 2025-12-04T10:08:42.1373247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1374210Z warnings.warn( 2025-12-04T10:08:42.1374537Z =================================== FAILURES =================================== 2025-12-04T10:08:42.1375148Z __________ AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda ___________ 2025-12-04T10:08:42.1375729Z Traceback (most recent call last): 2025-12-04T10:08:42.1376495Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 873, in test_deconv_freezing 2025-12-04T10:08:42.1377285Z self.check_model(Model(self.device), example_inputs) 2025-12-04T10:08:42.1378038Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.1378723Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.1379341Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.1380014Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.1380690Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.1381595Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.1382468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.1383269Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.1384079Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1384933Z raise e 2025-12-04T10:08:42.1385618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1386395Z return func( 2025-12-04T10:08:42.1387093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.1388014Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.1388854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.1389569Z return compile_fx_aot( 2025-12-04T10:08:42.1390259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.1391017Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.1391737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.1392445Z return compile_fx( 2025-12-04T10:08:42.1393098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.1393842Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.1394679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.1395494Z return _compile_fx_main( 2025-12-04T10:08:42.1396215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T10:08:42.1397063Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.1397921Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2098, in fw_compiler_freezing 2025-12-04T10:08:42.1398713Z optimized_function = inner_compile( 2025-12-04T10:08:42.1399256Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.1399791Z return func(*args, **kwds) 2025-12-04T10:08:42.1400496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.1401408Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.1402314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.1403133Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1403936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.1404779Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.1405615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.1406404Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.1407227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.1408230Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.1409221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.1410081Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.1410887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T10:08:42.1411712Z 
warn_and_skip(node.get_device()) 2025-12-04T10:08:42.1412447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1413274Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1413814Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1414199Z 2025-12-04T10:08:42.1414435Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1415369Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda 2025-12-04T10:08:42.1416104Z 2025-12-04T10:08:42.1416449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1417096Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1417575Z unimplemented [] 2025-12-04T10:08:42.1417895Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.1418528Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.1419108Z graph_break [] 2025-12-04T10:08:42.1419483Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1421042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1422551Z if out == self.unknown_value: 2025-12-04T10:08:42.1424063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:08:42.1425516Z if out == self.unknown_value: 2025-12-04T10:08:42.1426920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1428385Z if out == self.unknown_value: 2025-12-04T10:08:42.1429333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1430314Z warnings.warn( 2025-12-04T10:08:42.1430687Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1431166Z unimplemented [] 2025-12-04T10:08:42.1431495Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.1432118Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.1432695Z graph_break [] 2025-12-04T10:08:42.1433076Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1434633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1436086Z if out == self.unknown_value: 2025-12-04T10:08:42.1437505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:08:42.1438957Z if out == self.unknown_value: 2025-12-04T10:08:42.1440017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1440981Z warnings.warn( 2025-12-04T10:08:42.1441381Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1441857Z unimplemented [] 2025-12-04T10:08:42.1442178Z stats [('calls_captured', 3), ('unique_graphs', 3)] 2025-12-04T10:08:42.1442813Z inductor [('async_compile_cache_miss', 4), ('extern_calls', 2), ('async_compile_cache_hit', 2)] 2025-12-04T10:08:42.1443470Z graph_break [] 2025-12-04T10:08:42.1443852Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1445416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:08:42.1446880Z if out == self.unknown_value: 2025-12-04T10:08:42.1448309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:08:42.1449774Z if out == self.unknown_value: 2025-12-04T10:08:42.1450712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1451687Z warnings.warn( 2025-12-04T10:08:42.1452603Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-8d6cdce6581fa448.xml - 2025-12-04T10:08:42.1453662Z =========================== short test summary info ============================ 2025-12-04T10:08:42.1454798Z FAILED [11.6797s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1455737Z 2025-12-04T10:08:42.1455954Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1456971Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_deconv_freezing_cuda 2025-12-04T10:08:42.1457695Z 2025-12-04T10:08:42.1457980Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1458576Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:08:42.1459122Z ================= 1 failed, 157 deselected, 2 rerun in 56.54s ================== 2025-12-04T10:08:42.1459577Z Got exit code 1 2025-12-04T10:08:42.1460240Z FAILED CONSISTENTLY: test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda 2025-12-04T10:08:42.1461302Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:08:42.1462302Z W1204 10:03:22.855000 19883 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:08:42.1463451Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-9dce38c1d023996d.xml 2025-12-04T10:08:42.1464313Z ============================= test session starts ============================== 2025-12-04T10:08:42.1464984Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:08:42.1465588Z cachedir: .pytest_cache 2025-12-04T10:08:42.1466296Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:08:42.1467068Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:08:42.1467430Z configfile: pytest.ini 2025-12-04T10:08:42.1468164Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:08:42.1469168Z collecting ... collected 934 items / 71 deselected / 863 selected 2025-12-04T10:08:42.1469666Z stepcurrent: skipping 71 already run items. 
2025-12-04T10:08:42.1470061Z Running 87 items in this shard 2025-12-04T10:08:42.1470273Z 2025-12-04T10:08:42.1471229Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_duplicate_constant_folding_cuda <- test/inductor/test_torchinductor.py PASSED [9.7597s] [ 1%] 2025-12-04T10:08:42.1472891Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_cat_cuda <- test/inductor/test_torchinductor.py PASSED [6.4184s] [ 2%] 2025-12-04T10:08:42.1474410Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_extract_constants_map_cuda <- test/inductor/test_torchinductor.py PASSED [6.3029s] [ 3%] 2025-12-04T10:08:42.1476009Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fake_tensor_device_validation_cuda <- test/inductor/test_torchinductor.py PASSED [0.0788s] [ 4%] 2025-12-04T10:08:42.1477607Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fp8_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ devices) [ 5%] 2025-12-04T10:08:42.1479131Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_inf_cuda SKIPPED [0.0002s] (Skip this test, only for local test. SIGABRT is produced.) [ 6%] 2025-12-04T10:08:42.1480696Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_input_codegen_with_sympy_expr_cuda <- test/inductor/test_torchinductor.py PASSED [7.0187s] [ 8%] 2025-12-04T10:08:42.1482276Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_masked_select_dynamic_cuda <- test/inductor/test_torchinductor.py PASSED [6.4916s] [ 9%] 2025-12-04T10:08:42.1484160Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_misaligned_input_1_cuda <- test/inductor/test_torchinductor.py W1204 10:04:07.312000 19883 site-packages/torch/_export/__init__.py:71] +============================+ 2025-12-04T10:08:42.1485700Z W1204 10:04:07.312000 19883 site-packages/torch/_export/__init__.py:72] | !!! WARNING !!! 
| 2025-12-04T10:08:42.1486538Z W1204 10:04:07.312000 19883 site-packages/torch/_export/__init__.py:73] +============================+ 2025-12-04T10:08:42.1488268Z W1204 10:04:07.312000 19883 site-packages/torch/_export/__init__.py:74] torch._export.aot_compile()/torch._export.aot_load() is being deprecated, please switch to directly calling torch._inductor.aoti_compile_and_package(torch.export.export())/torch._inductor.aoti_load_package() instead. 2025-12-04T10:08:42.1490924Z [W1204 10:04:12.269716879 cgy3tashbjqpuzkl7jeiimyyzcze2gud63na6myypd3346d645j2.wrapper.cpp:752] Warning: "Input 0 was compiled as 16-bytes aligned, but it is not aligned at run time. Copying to an aligned tensor to guarantee correctness, but expect a performance hit." (function run_impl) 2025-12-04T10:08:42.1492436Z PASSED [12.0124s] [ 10%] 2025-12-04T10:08:42.1493351Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_misaligned_input_2_cuda <- test/inductor/test_torchinductor.py PASSED [11.9552s] [ 11%] 2025-12-04T10:08:42.1495233Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_multi_device_cuda <- test/inductor/test_torchinductor.py W1204 10:04:24.982000 19883 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:08:42.1496887Z W1204 10:04:24.984000 19883 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:08:42.1497541Z PASSED [6.5199s] [ 12%] 2025-12-04T10:08:42.1498423Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_tensor_input_cuda <- test/inductor/test_torchinductor.py PASSED [16.0093s] [ 13%] 2025-12-04T10:08:42.1499939Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_none_args_aot_codegen_cuda <- test/inductor/test_torchinductor.py PASSED [12.8722s] [ 14%] 2025-12-04T10:08:42.1501625Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_normal_functional_cuda <- test/inductor/test_torchinductor.py 
PASSED [5.3145s] [ 16%] 2025-12-04T10:08:42.1503636Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_pad_non_zero_memory_leak_cuda <- test/inductor/test_torchinductor.py W1204 10:05:05.695000 19883 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:08:42.1505101Z PASSED [6.5301s] [ 17%] 2025-12-04T10:08:42.1506005Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_squeeze_cuda <- test/inductor/test_torchinductor.py PASSED [5.3510s] [ 18%] 2025-12-04T10:08:42.1509011Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeated_calling_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0009s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/146185 for platform(s) inductor, linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) 
[ 19%] 2025-12-04T10:08:42.1511873Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_replace_unbacked_symbol_with_backed_expr_cuda PASSED [7.6863s] [ 20%] 2025-12-04T10:08:42.1513301Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_reuse_kernel_cuda <- test/inductor/test_torchinductor.py PASSED [11.6272s] [ 21%] 2025-12-04T10:08:42.1514852Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_rocm_triton_autotuning_cuda SKIPPED [0.0032s] (test currently only works on the ROCm stack) [ 22%] 2025-12-04T10:08:42.1516435Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_run_with_grad_enabled_cuda <- test/inductor/test_torchinductor.py PASSED [5.4674s] [ 24%] 2025-12-04T10:08:42.1518083Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_device_type_failed_cuda Error: input_handles[0]: unmatched device type, expected: 00(cpu), but got: 1 2025-12-04T10:08:42.1519047Z 2025-12-04T10:08:42.1519155Z PASSED [5.7040s] [ 25%] 2025-12-04T10:08:42.1520029Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scatter_fallback_cuda <- test/inductor/test_torchinductor.py PASSED [5.8492s] [ 26%] 2025-12-04T10:08:42.1521475Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_seq_cuda <- test/inductor/test_torchinductor.py PASSED [5.9965s] [ 27%] 2025-12-04T10:08:42.1522928Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda ('RERUN', {'yellow': True}) [1.1269s] [ 28%] 2025-12-04T10:08:42.1524423Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda ('RERUN', {'yellow': True}) [0.6134s] [ 28%] 2025-12-04T10:08:42.1525841Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda FAILED [0.8981s] [ 28%] 2025-12-04T10:08:42.1526575Z 2025-12-04T10:08:42.1526734Z 
==================================== RERUNS ==================================== 2025-12-04T10:08:42.1527390Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _ 2025-12-04T10:08:42.1528006Z Traceback (most recent call last): 2025-12-04T10:08:42.1528818Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive 2025-12-04T10:08:42.1529749Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T10:08:42.1530533Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.1531235Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.1531855Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.1532529Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.1533276Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.1534034Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.1534910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.1535699Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.1536580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1537435Z raise e 2025-12-04T10:08:42.1538123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1538893Z return func( 2025-12-04T10:08:42.1539605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.1540529Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 
2025-12-04T10:08:42.1541365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.1542075Z return compile_fx_aot( 2025-12-04T10:08:42.1542776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.1543546Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.1544252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.1544972Z return compile_fx( 2025-12-04T10:08:42.1545632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.1546384Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.1547222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.1548060Z return _compile_fx_main( 2025-12-04T10:08:42.1548783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T10:08:42.1549621Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.1550155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T10:08:42.1550310Z return self.compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1550826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T10:08:42.1550945Z return compile_fx_forward( 2025-12-04T10:08:42.1551457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T10:08:42.1551583Z return inner_compile( 2025-12-04T10:08:42.1551869Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", 
line 79, in inner 2025-12-04T10:08:42.1551997Z return func(*args, **kwds) 2025-12-04T10:08:42.1552490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.1552755Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.1553260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.1553442Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1553943Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.1554148Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.1554647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.1554892Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.1555423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.1555744Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.1556276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.1556465Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.1557025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T10:08:42.1557148Z warn_and_skip(node.get_device()) 2025-12-04T10:08:42.1557630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1557790Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1558052Z 
torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1558058Z 2025-12-04T10:08:42.1558276Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1559005Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda 2025-12-04T10:08:42.1559016Z 2025-12-04T10:08:42.1559284Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1559524Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1559631Z unimplemented [] 2025-12-04T10:08:42.1559797Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1560055Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1560155Z graph_break [] 2025-12-04T10:08:42.1560387Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1561209Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1561328Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1562069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1562178Z warnings.warn( 2025-12-04T10:08:42.1562539Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _ 2025-12-04T10:08:42.1562675Z Traceback (most recent call last): 2025-12-04T10:08:42.1563238Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive 2025-12-04T10:08:42.1563476Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T10:08:42.1563912Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.1564035Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.1564435Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.1564578Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.1564985Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.1565201Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.1565727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.1565871Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.1566410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1566506Z raise e 2025-12-04T10:08:42.1567131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1567230Z return func( 2025-12-04T10:08:42.1567793Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.1568027Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.1568569Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.1568698Z return compile_fx_aot( 2025-12-04T10:08:42.1569191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.1569316Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.1569801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.1569908Z return compile_fx( 2025-12-04T10:08:42.1570390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.1570527Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.1571276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.1571412Z return _compile_fx_main( 2025-12-04T10:08:42.1571917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T10:08:42.1572118Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.1572651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T10:08:42.1572804Z return self.compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1573321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T10:08:42.1573440Z return compile_fx_forward( 2025-12-04T10:08:42.1573957Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T10:08:42.1574082Z return inner_compile( 2025-12-04T10:08:42.1574364Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.1574498Z return func(*args, **kwds) 2025-12-04T10:08:42.1574994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.1575260Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.1575771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.1575945Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1576551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.1576763Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.1577266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.1577431Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.1577970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.1578294Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.1578833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.1578962Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.1579669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T10:08:42.1579795Z warn_and_skip(node.get_device()) 
2025-12-04T10:08:42.1580280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1580438Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1580699Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1580783Z 2025-12-04T10:08:42.1581003Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1581737Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda 2025-12-04T10:08:42.1581743Z 2025-12-04T10:08:42.1582014Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1582257Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1582370Z unimplemented [] 2025-12-04T10:08:42.1582539Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1582803Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1582906Z graph_break [] 2025-12-04T10:08:42.1583141Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1583964Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1584083Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1584826Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1584933Z warnings.warn( 2025-12-04T10:08:42.1585154Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1585280Z unimplemented [] 2025-12-04T10:08:42.1585452Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1585712Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1585815Z graph_break [] 2025-12-04T10:08:42.1586033Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1586853Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1586974Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1587699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1587816Z warnings.warn( 2025-12-04T10:08:42.1587964Z =================================== FAILURES =================================== 2025-12-04T10:08:42.1588338Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _ 2025-12-04T10:08:42.1588464Z Traceback (most recent call last): 2025-12-04T10:08:42.1589028Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive 2025-12-04T10:08:42.1589266Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T10:08:42.1589705Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.1589842Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.1590226Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.1590371Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.1590783Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.1590985Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.1591578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.1591723Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.1592262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1592433Z raise e 2025-12-04T10:08:42.1592970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in 
aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1593066Z return func( 2025-12-04T10:08:42.1593626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.1593857Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.1594319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.1594444Z return compile_fx_aot( 2025-12-04T10:08:42.1594933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.1595071Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.1595538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.1595652Z return compile_fx( 2025-12-04T10:08:42.1596135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.1596270Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.1596841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.1596971Z return _compile_fx_main( 2025-12-04T10:08:42.1597475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T10:08:42.1597691Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.1598213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T10:08:42.1598363Z return self.compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1598879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 
2025-12-04T10:08:42.1598996Z return compile_fx_forward( 2025-12-04T10:08:42.1599524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T10:08:42.1599635Z return inner_compile( 2025-12-04T10:08:42.1599921Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.1600050Z return func(*args, **kwds) 2025-12-04T10:08:42.1600551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.1600816Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.1601321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.1601497Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1602015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.1602207Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.1602707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.1602864Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.1603463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.1603797Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.1604315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.1604440Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.1605059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in 
_check_triton_bf16_support 2025-12-04T10:08:42.1605182Z warn_and_skip(node.get_device()) 2025-12-04T10:08:42.1605665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1605818Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1606076Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1606083Z 2025-12-04T10:08:42.1606320Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1607036Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda 2025-12-04T10:08:42.1607042Z 2025-12-04T10:08:42.1607314Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1607552Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1607663Z unimplemented [] 2025-12-04T10:08:42.1607841Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1608084Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1608185Z graph_break [] 2025-12-04T10:08:42.1608415Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1609235Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1609352Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1610088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1610193Z warnings.warn( 2025-12-04T10:08:42.1610427Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1610538Z unimplemented [] 2025-12-04T10:08:42.1610704Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1610963Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1611061Z graph_break [] 2025-12-04T10:08:42.1611277Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1612100Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1612220Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1612953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1613059Z warnings.warn( 2025-12-04T10:08:42.1613275Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1613395Z unimplemented [] 2025-12-04T10:08:42.1613561Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1613803Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1613918Z graph_break [] 2025-12-04T10:08:42.1614131Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1615021Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1615137Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1615860Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1615975Z warnings.warn( 2025-12-04T10:08:42.1616899Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-9dce38c1d023996d.xml - 2025-12-04T10:08:42.1617182Z =========================== short test summary info ============================ 2025-12-04T10:08:42.1618090Z FAILED [0.8981s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1618096Z 2025-12-04T10:08:42.1618313Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1619050Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda 2025-12-04T10:08:42.1619056Z 2025-12-04T10:08:42.1619325Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1619521Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:08:42.1619778Z = 1 failed, 20 passed, 4 skipped, 71 deselected, 2 rerun in 157.74s (0:02:37) == 2025-12-04T10:08:42.1619879Z Got exit code 1 2025-12-04T10:08:42.1620005Z Retrying single test... 
2025-12-04T10:08:42.1620450Z W1204 10:06:14.384000 23467 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:08:42.1621029Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b570798f966501a4.xml 2025-12-04T10:08:42.1621195Z ============================= test session starts ============================== 2025-12-04T10:08:42.1621551Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:08:42.1621674Z cachedir: .pytest_cache 2025-12-04T10:08:42.1622194Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:08:42.1622321Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:08:42.1622448Z configfile: pytest.ini 2025-12-04T10:08:42.1622989Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:08:42.1623229Z collecting ... collected 934 items / 157 deselected / 777 selected 2025-12-04T10:08:42.1624027Z stepcurrent: skipping 95 already run items. 
Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda 2025-12-04T10:08:42.1624143Z Running 1 items in this shard 2025-12-04T10:08:42.1624149Z 2025-12-04T10:08:42.1624848Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda ('RERUN', {'yellow': True}) [4.1738s] [100%] 2025-12-04T10:08:42.1625532Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda ('RERUN', {'yellow': True}) [0.6041s] [100%] 2025-12-04T10:08:42.1626139Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda FAILED [0.6119s] [100%] 2025-12-04T10:08:42.1626148Z 2025-12-04T10:08:42.1626295Z ==================================== RERUNS ==================================== 2025-12-04T10:08:42.1626655Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _ 2025-12-04T10:08:42.1626792Z Traceback (most recent call last): 2025-12-04T10:08:42.1627356Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive 2025-12-04T10:08:42.1627660Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T10:08:42.1628096Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.1628218Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.1628621Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.1628823Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.1629228Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.1629444Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.1629973Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.1630121Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.1630669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1630767Z raise e 2025-12-04T10:08:42.1631317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1631417Z return func( 2025-12-04T10:08:42.1631962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.1632213Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.1632671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.1632797Z return compile_fx_aot( 2025-12-04T10:08:42.1633288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.1633413Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.1633903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.1634010Z return compile_fx( 2025-12-04T10:08:42.1634488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.1634623Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.1635199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.1635331Z return _compile_fx_main( 2025-12-04T10:08:42.1635831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 2788, in _compile_fx_main 2025-12-04T10:08:42.1636032Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.1636563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T10:08:42.1636719Z return self.compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1637230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T10:08:42.1637347Z return compile_fx_forward( 2025-12-04T10:08:42.1637873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T10:08:42.1638001Z return inner_compile( 2025-12-04T10:08:42.1638284Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.1638399Z return func(*args, **kwds) 2025-12-04T10:08:42.1638914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.1639183Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.1639842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.1640022Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1640525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.1640734Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.1641237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.1641458Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.1641991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in 
fx_codegen_and_compile 2025-12-04T10:08:42.1642317Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.1642852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.1642986Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.1643551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T10:08:42.1643676Z warn_and_skip(node.get_device()) 2025-12-04T10:08:42.1644162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1644326Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1644585Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1644592Z 2025-12-04T10:08:42.1644811Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1645546Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda 2025-12-04T10:08:42.1645551Z 2025-12-04T10:08:42.1645826Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1646067Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1646177Z unimplemented [] 2025-12-04T10:08:42.1646345Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1646608Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1646717Z graph_break [] 2025-12-04T10:08:42.1646937Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1647772Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, 
TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T10:08:42.1647893Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1648640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1648749Z warnings.warn( 2025-12-04T10:08:42.1649114Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _ 2025-12-04T10:08:42.1649255Z Traceback (most recent call last): 2025-12-04T10:08:42.1649816Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive 2025-12-04T10:08:42.1650056Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T10:08:42.1650492Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.1650615Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.1651016Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.1651158Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.1651563Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.1651847Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.1652376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.1652520Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.1653065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1653216Z raise e 2025-12-04T10:08:42.1653769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1653868Z return func( 
2025-12-04T10:08:42.1654427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.1654658Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.1655118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.1655246Z return compile_fx_aot( 2025-12-04T10:08:42.1655737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.1655861Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.1656424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.1656543Z return compile_fx( 2025-12-04T10:08:42.1657028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.1657164Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.1657738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.1657871Z return _compile_fx_main( 2025-12-04T10:08:42.1658374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T10:08:42.1658578Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.1659114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T10:08:42.1659264Z return self.compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1659780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T10:08:42.1659900Z return compile_fx_forward( 2025-12-04T10:08:42.1660413Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T10:08:42.1660537Z return inner_compile( 2025-12-04T10:08:42.1660818Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.1660950Z return func(*args, **kwds) 2025-12-04T10:08:42.1661444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.1661709Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.1662217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.1662398Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1662900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.1663106Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.1663608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.1663768Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.1664393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.1664718Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.1665247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.1665430Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.1665984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T10:08:42.1666107Z warn_and_skip(node.get_device()) 
2025-12-04T10:08:42.1666590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1666745Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1667004Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1667014Z 2025-12-04T10:08:42.1667232Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1667964Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda 2025-12-04T10:08:42.1667969Z 2025-12-04T10:08:42.1668238Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1668479Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1668585Z unimplemented [] 2025-12-04T10:08:42.1668750Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1669008Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1669110Z graph_break [] 2025-12-04T10:08:42.1669327Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1670162Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1670281Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1671192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1671300Z warnings.warn( 2025-12-04T10:08:42.1671525Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1671647Z unimplemented [] 2025-12-04T10:08:42.1671816Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1672077Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1672179Z graph_break [] 2025-12-04T10:08:42.1672397Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1673226Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1673344Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1674071Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1674186Z warnings.warn( 2025-12-04T10:08:42.1674337Z =================================== FAILURES =================================== 2025-12-04T10:08:42.1674707Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _ 2025-12-04T10:08:42.1674831Z Traceback (most recent call last): 2025-12-04T10:08:42.1675393Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive 2025-12-04T10:08:42.1675629Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T10:08:42.1676190Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.1676318Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.1676717Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.1676863Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.1677284Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.1677568Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.1678094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.1678237Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.1678781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1678889Z raise e 2025-12-04T10:08:42.1679433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in 
aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1679532Z return func( 2025-12-04T10:08:42.1680094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.1680327Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.1680793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.1680918Z return compile_fx_aot( 2025-12-04T10:08:42.1681411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.1681548Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.1682018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.1682127Z return compile_fx( 2025-12-04T10:08:42.1682606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.1682744Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.1683312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.1683442Z return _compile_fx_main( 2025-12-04T10:08:42.1683941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T10:08:42.1684153Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.1684672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T10:08:42.1684821Z return self.compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1685338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 
2025-12-04T10:08:42.1685456Z return compile_fx_forward( 2025-12-04T10:08:42.1685981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T10:08:42.1686091Z return inner_compile( 2025-12-04T10:08:42.1686371Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.1686505Z return func(*args, **kwds) 2025-12-04T10:08:42.1687002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.1687266Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.1687769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.1687943Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1688520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.1688714Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.1689211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.1689375Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.1689969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.1690306Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.1690825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.1690953Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.1691517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in 
_check_triton_bf16_support 2025-12-04T10:08:42.1691642Z warn_and_skip(node.get_device()) 2025-12-04T10:08:42.1692123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1692278Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1692534Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1692544Z 2025-12-04T10:08:42.1692775Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1693493Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda 2025-12-04T10:08:42.1693499Z 2025-12-04T10:08:42.1693766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1693999Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1694108Z unimplemented [] 2025-12-04T10:08:42.1694286Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1694530Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1694631Z graph_break [] 2025-12-04T10:08:42.1694860Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1695681Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1695800Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1696615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1696725Z warnings.warn( 2025-12-04T10:08:42.1696958Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1697069Z unimplemented [] 2025-12-04T10:08:42.1697238Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1697501Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1697602Z graph_break [] 2025-12-04T10:08:42.1697822Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1698647Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1698771Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1699509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1699615Z warnings.warn( 2025-12-04T10:08:42.1699832Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1700026Z unimplemented [] 2025-12-04T10:08:42.1700193Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1700438Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1700553Z graph_break [] 2025-12-04T10:08:42.1700770Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1701592Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1701767Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1702489Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1702612Z warnings.warn( 2025-12-04T10:08:42.1703363Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b570798f966501a4.xml - 2025-12-04T10:08:42.1703554Z =========================== short test summary info ============================ 2025-12-04T10:08:42.1704471Z FAILED [0.6119s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1704483Z 2025-12-04T10:08:42.1704702Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1705434Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda 2025-12-04T10:08:42.1705439Z 2025-12-04T10:08:42.1705710Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1705909Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:08:42.1706117Z ================== 1 failed, 157 deselected, 2 rerun in 5.48s ================== 2025-12-04T10:08:42.1706221Z Got exit code 1 2025-12-04T10:08:42.1706348Z Retrying single test... 
2025-12-04T10:08:42.1706794Z W1204 10:06:37.377000 23695 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:08:42.1707375Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-be9e2a318f1480ff.xml 2025-12-04T10:08:42.1707549Z ============================= test session starts ============================== 2025-12-04T10:08:42.1707901Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:08:42.1708032Z cachedir: .pytest_cache 2025-12-04T10:08:42.1708556Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:08:42.1708684Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:08:42.1708807Z configfile: pytest.ini 2025-12-04T10:08:42.1709352Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:08:42.1709598Z collecting ... collected 934 items / 157 deselected / 777 selected 2025-12-04T10:08:42.1710391Z stepcurrent: skipping 95 already run items. 
Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda 2025-12-04T10:08:42.1710517Z Running 1 items in this shard 2025-12-04T10:08:42.1710522Z 2025-12-04T10:08:42.1711227Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda ('RERUN', {'yellow': True}) [4.1921s] [100%] 2025-12-04T10:08:42.1711911Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda ('RERUN', {'yellow': True}) [0.6323s] [100%] 2025-12-04T10:08:42.1712612Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda FAILED [0.6361s] [100%] 2025-12-04T10:08:42.1712619Z 2025-12-04T10:08:42.1712762Z ==================================== RERUNS ==================================== 2025-12-04T10:08:42.1713123Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _ 2025-12-04T10:08:42.1713263Z Traceback (most recent call last): 2025-12-04T10:08:42.1713827Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive 2025-12-04T10:08:42.1714131Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T10:08:42.1714566Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.1714690Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.1715092Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.1715234Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.1715641Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.1715854Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.1716381Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.1716525Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.1717074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1717170Z raise e 2025-12-04T10:08:42.1717721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1717824Z return func( 2025-12-04T10:08:42.1718368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.1718617Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.1719074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.1719202Z return compile_fx_aot( 2025-12-04T10:08:42.1719694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.1719825Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.1720307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.1720414Z return compile_fx( 2025-12-04T10:08:42.1720895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.1721031Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.1721606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.1721736Z return _compile_fx_main( 2025-12-04T10:08:42.1722238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 2788, in _compile_fx_main 2025-12-04T10:08:42.1722440Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.1722971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T10:08:42.1723126Z return self.compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1723640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T10:08:42.1723758Z return compile_fx_forward( 2025-12-04T10:08:42.1724272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T10:08:42.1724394Z return inner_compile( 2025-12-04T10:08:42.1724737Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.1724853Z return func(*args, **kwds) 2025-12-04T10:08:42.1725362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.1725625Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.1726191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.1726367Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1726870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.1727078Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.1727584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.1727746Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.1728279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in 
fx_codegen_and_compile 2025-12-04T10:08:42.1728598Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.1729133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.1729259Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.1729816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T10:08:42.1729940Z warn_and_skip(node.get_device()) 2025-12-04T10:08:42.1730422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1730583Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1730841Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1730847Z 2025-12-04T10:08:42.1731067Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1731796Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda 2025-12-04T10:08:42.1731807Z 2025-12-04T10:08:42.1732075Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1732310Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1732416Z unimplemented [] 2025-12-04T10:08:42.1732581Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1732842Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1732944Z graph_break [] 2025-12-04T10:08:42.1733166Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1733991Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, 
TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T10:08:42.1734109Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1734848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1734957Z warnings.warn( 2025-12-04T10:08:42.1735315Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _ 2025-12-04T10:08:42.1735454Z Traceback (most recent call last): 2025-12-04T10:08:42.1736014Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive 2025-12-04T10:08:42.1736250Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T10:08:42.1736818Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.1736945Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.1737343Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.1737486Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.1737889Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.1738168Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.1738693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.1738840Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.1739377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1739470Z raise e 2025-12-04T10:08:42.1740023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1740123Z return func( 
2025-12-04T10:08:42.1740669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.1740913Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.1741373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.1741499Z return compile_fx_aot( 2025-12-04T10:08:42.1741990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.1742114Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.1742597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.1742706Z return compile_fx( 2025-12-04T10:08:42.1743183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.1743318Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.1743891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.1744022Z return _compile_fx_main( 2025-12-04T10:08:42.1744522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T10:08:42.1744724Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.1745253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T10:08:42.1745402Z return self.compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1745920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T10:08:42.1746037Z return compile_fx_forward( 2025-12-04T10:08:42.1746551Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T10:08:42.1746673Z return inner_compile( 2025-12-04T10:08:42.1746957Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.1747071Z return func(*args, **kwds) 2025-12-04T10:08:42.1747576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.1747842Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.1748344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.1748580Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1749084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.1749293Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.1749794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.1750030Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.1750564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.1750887Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.1751420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.1751548Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.1752114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T10:08:42.1752239Z warn_and_skip(node.get_device()) 
2025-12-04T10:08:42.1752722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1752876Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1753137Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1753142Z 2025-12-04T10:08:42.1753360Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1754093Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda 2025-12-04T10:08:42.1754099Z 2025-12-04T10:08:42.1754366Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1754605Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1754711Z unimplemented [] 2025-12-04T10:08:42.1754875Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1755129Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1755230Z graph_break [] 2025-12-04T10:08:42.1755449Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1756280Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1756398Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1757136Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1757241Z warnings.warn( 2025-12-04T10:08:42.1757465Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1757583Z unimplemented [] 2025-12-04T10:08:42.1757746Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1758002Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1758102Z graph_break [] 2025-12-04T10:08:42.1758317Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1759154Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1759273Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1759997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1760118Z warnings.warn( 2025-12-04T10:08:42.1760342Z =================================== FAILURES =================================== 2025-12-04T10:08:42.1760718Z _ AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda _ 2025-12-04T10:08:42.1760843Z Traceback (most recent call last): 2025-12-04T10:08:42.1761410Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 1969, in test_size_with_unbacked_add_expr_transitive 2025-12-04T10:08:42.1761652Z self.check_model(Repro(), example_inputs, dynamic_shapes=spec) 2025-12-04T10:08:42.1762147Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T10:08:42.1762271Z actual = AOTIRunnerUtil.run( 2025-12-04T10:08:42.1762672Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T10:08:42.1762818Z package_path = AOTIRunnerUtil.compile( 2025-12-04T10:08:42.1763242Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T10:08:42.1763451Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T10:08:42.1763972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T10:08:42.1764123Z return aot_inductor_minifier_wrapper( 2025-12-04T10:08:42.1764666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1764781Z raise e 2025-12-04T10:08:42.1765319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in 
aot_inductor_minifier_wrapper 2025-12-04T10:08:42.1765419Z return func( 2025-12-04T10:08:42.1765983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T10:08:42.1766219Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T10:08:42.1766680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T10:08:42.1766811Z return compile_fx_aot( 2025-12-04T10:08:42.1767303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T10:08:42.1767444Z compiled_artifacts = compile_fx( 2025-12-04T10:08:42.1768051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T10:08:42.1768168Z return compile_fx( 2025-12-04T10:08:42.1768650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T10:08:42.1768787Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T10:08:42.1769364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T10:08:42.1769491Z return _compile_fx_main( 2025-12-04T10:08:42.1769996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T10:08:42.1770215Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T10:08:42.1770730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T10:08:42.1770882Z return self.compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1771583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 
2025-12-04T10:08:42.1771701Z return compile_fx_forward( 2025-12-04T10:08:42.1772236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T10:08:42.1772349Z return inner_compile( 2025-12-04T10:08:42.1772631Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T10:08:42.1772897Z return func(*args, **kwds) 2025-12-04T10:08:42.1773393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T10:08:42.1773657Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T10:08:42.1774160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T10:08:42.1774429Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T10:08:42.1774943Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T10:08:42.1775137Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:08:42.1775641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T10:08:42.1775799Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:08:42.1776398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:08:42.1776735Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:08:42.1777251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T10:08:42.1777386Z _check_triton_bf16_support(graph) 2025-12-04T10:08:42.1777950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in 
_check_triton_bf16_support 2025-12-04T10:08:42.1778073Z warn_and_skip(node.get_device()) 2025-12-04T10:08:42.1778561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T10:08:42.1778717Z raise SkipFrame("BF16 is not supported") 2025-12-04T10:08:42.1778974Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1778985Z 2025-12-04T10:08:42.1779220Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1779943Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda 2025-12-04T10:08:42.1779949Z 2025-12-04T10:08:42.1780219Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1780463Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1780570Z unimplemented [] 2025-12-04T10:08:42.1780748Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1780992Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1781094Z graph_break [] 2025-12-04T10:08:42.1781327Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1782150Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1782268Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1783009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1783114Z warnings.warn( 2025-12-04T10:08:42.1783351Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1783456Z unimplemented [] 2025-12-04T10:08:42.1783622Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1783878Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1783979Z graph_break [] 2025-12-04T10:08:42.1784192Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1785102Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1785226Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1785962Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1786068Z warnings.warn( 2025-12-04T10:08:42.1786354Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:08:42.1786471Z unimplemented [] 2025-12-04T10:08:42.1786633Z stats [('calls_captured', 22), ('unique_graphs', 1)] 2025-12-04T10:08:42.1786876Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:08:42.1786991Z graph_break [] 2025-12-04T10:08:42.1787207Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:08:42.1788030Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T10:08:42.1788146Z return cls.__new__(cls, *args) 2025-12-04T10:08:42.1788867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:08:42.1788985Z warnings.warn( 2025-12-04T10:08:42.1789746Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-be9e2a318f1480ff.xml - 2025-12-04T10:08:42.1789940Z =========================== short test summary info ============================ 2025-12-04T10:08:42.1790857Z FAILED [0.6361s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T10:08:42.1790864Z 2025-12-04T10:08:42.1791087Z To execute this test, run the following from the base repo dir: 2025-12-04T10:08:42.1791822Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_size_with_unbacked_add_expr_transitive_cuda 2025-12-04T10:08:42.1791829Z 2025-12-04T10:08:42.1792098Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:08:42.1792294Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:08:42.1792503Z ================== 1 failed, 157 deselected, 2 rerun in 5.55s ================== 2025-12-04T10:08:42.1792605Z Got exit code 1 2025-12-04T10:08:42.1793257Z FAILED CONSISTENTLY: test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda 2025-12-04T10:08:42.1793670Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:08:42.1794135Z W1204 10:07:00.169000 23923 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:08:42.1794704Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-e62e290dfdad5699.xml 2025-12-04T10:08:42.1794869Z ============================= test session starts ============================== 2025-12-04T10:08:42.1795236Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:08:42.1795355Z cachedir: .pytest_cache 2025-12-04T10:08:42.1795878Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:08:42.1796016Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:08:42.1796126Z configfile: pytest.ini 2025-12-04T10:08:42.1796677Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:08:42.1796902Z collecting ... collected 934 items / 96 deselected / 838 selected 2025-12-04T10:08:42.1797136Z stepcurrent: skipping 96 already run items. 
2025-12-04T10:08:42.1797267Z Running 62 items in this shard 2025-12-04T10:08:42.1797273Z 2025-12-04T10:08:42.1797966Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sym_expr_indexing_cuda <- test/inductor/test_torchinductor.py PASSED [9.5457s] [ 1%] 2025-12-04T10:08:42.1798828Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_torchvision_transforms_functional_tensor_resize_cuda <- test/inductor/test_torchinductor.py PASSED [8.3397s] [ 3%] 2025-12-04T10:08:42.1799686Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_dynamic_shape_with_div_cuda <- test/inductor/test_torchinductor.py PASSED [5.5832s] [ 4%] 2025-12-04T10:08:42.1800329Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cuda PASSED [6.2479s] [ 6%] 2025-12-04T10:08:42.1801053Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_True_cuda PASSED [7.0983s] [ 8%] 2025-12-04T10:08:42.1801990Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_cuda SKIPPED [0.0032s] (requires triton.tools.tensor_descriptor TMA support) [ 9%] 2025-12-04T10:08:42.1802760Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_reinterpret_view_cuda <- test/inductor/test_torchinductor.py PASSED [6.4212s] [ 11%] 2025-12-04T10:08:42.1803505Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_sympy_fn_like_arg_cuda <- test/inductor/test_torchinductor.py PASSED [6.2953s] [ 12%] 2025-12-04T10:08:42.1804206Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_cuda PASSED [7.8616s] [ 14%] 2025-12-04T10:08:42.1806278Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_upper_bound_i64_cuda <- 
test/inductor/test_torchinductor.py SKIPPED [0.0008s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/159860 for platform(s) inductor, linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 16%] 2025-12-04T10:08:42.1807000Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_weight_on_disk_legacy_cuda <- test/inductor/test_torchinductor.py PASSED [6.4071s] [ 17%] 2025-12-04T10:08:42.1808255Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_mixed_device_dynamic_True_cuda W1204 10:08:06.093000 23923 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T10:08:42.1808922Z W1204 10:08:06.094000 23923 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T10:08:42.1809332Z W1204 10:08:06.553000 23923 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:08:42.1809439Z PASSED [7.0360s] [ 19%] 2025-12-04T10:08:42.1810637Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_parameters_cuda W1204 10:08:13.146000 23923 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T10:08:42.1810749Z PASSED [7.0724s] [ 20%] 2025-12-04T10:08:42.1811996Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_sym_expr_cond_dynamic_True_cuda W1204 10:08:20.203000 23923 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. 
Please use Dim.STATIC instead 2025-12-04T10:08:42.1812736Z W1204 10:08:20.203000 23923 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T10:08:42.1812843Z PASSED [7.2707s] [ 22%] 2025-12-04T10:08:42.1813981Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_with_profiler_cuda <- test/inductor/test_torchinductor.py W1204 10:08:27.544000 23923 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:08:42.1814142Z PASSED [6.1129s] [ 24%] 2025-12-04T10:08:42.1814821Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_size_buffer_cuda <- test/inductor/test_torchinductor.py PASSED [5.7960s] [ 25%] 2025-12-04T10:08:42.1815581Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__int_mm_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (No MPS backend available) [ 27%] 2025-12-04T10:08:42.1816232Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_addmm_multiple_dynamic_mps SKIPPED [0.0002s] (No MPS backend available) [ 29%] 2025-12-04T10:08:42.1817148Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_constant_tensor_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 30%] 2025-12-04T10:08:42.1818075Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printing_model_inputs_codegen_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 32%] 2025-12-04T10:08:42.1818917Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_runtime_asserts_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0005s] (No MPS backend available) [ 33%] 2025-12-04T10:08:42.1819726Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_assert_tensor_meta_mps <- test/inductor/test_torchinductor.py SKIPPED 
[0.0002s] (No MPS backend available) [ 35%] 2025-12-04T10:08:42.1820372Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_backward_no_op_logging_mps SKIPPED [0.0002s] (No MPS backend available) [ 37%] 2025-12-04T10:08:42.1821196Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_1_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 38%] 2025-12-04T10:08:42.1821817Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_3_mps SKIPPED [0.0002s] (No MPS backend available) [ 40%] 2025-12-04T10:08:42.1822657Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_predicate_on_cpu_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 41%] 2025-12-04T10:08:42.1823397Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_unbacked_symint_closure_dynamic_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 43%] 2025-12-04T10:08:42.1824295Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_use_buffers_from_outer_scope_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 45%] 2025-12-04T10:08:42.1825117Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_parameters_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 46%] 2025-12-04T10:08:42.1826044Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_reinterpret_view_inputs_outputs_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (No MPS backend available) [ 48%] 2025-12-04T10:08:42.1826818Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_d2h_copy_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 50%] 2025-12-04T10:08:42.1827604Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_cat_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 51%] 2025-12-04T10:08:42.1828311Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fp8_view_of_param_mps SKIPPED [0.0002s] (No MPS backend available) [ 53%] 2025-12-04T10:08:42.1829128Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_index_put_fallback_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 54%] 2025-12-04T10:08:42.1829791Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_index_put_with_none_index_mps SKIPPED [0.0004s] (No MPS backend available) [ 56%] 2025-12-04T10:08:42.1830674Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_mmaped_weights_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 58%] 2025-12-04T10:08:42.1831473Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_libtorch_free_so_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 59%] 2025-12-04T10:08:42.1832135Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_misc_1_max_autotune_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 61%] 2025-12-04T10:08:42.1832932Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_2_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 62%] 2025-12-04T10:08:42.1833775Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_poi_multiple_dynamic_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (No MPS backend available) [ 64%] 2025-12-04T10:08:42.1834428Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_quanatized_int8_linear_mps SKIPPED [0.0002s] (No MPS backend available) [ 66%] 2025-12-04T10:08:42.1835260Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeat_interleave_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (No MPS backend available) [ 67%] 2025-12-04T10:08:42.1836072Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_calling_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 69%] 2025-12-04T10:08:42.1836771Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_device_type_failed_mps SKIPPED [0.0002s] (No MPS backend available) [ 70%] 2025-12-04T10:08:42.1837349Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sdpa_2_mps SKIPPED [0.0002s] (No MPS backend available) [ 72%] 2025-12-04T10:08:42.1838206Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_shifted_constraint_ranges_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 74%] 2025-12-04T10:08:42.1839049Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_from_multi_output_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 75%] 2025-12-04T10:08:42.1839833Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_subclasses_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 77%] 2025-12-04T10:08:42.1840632Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_symbool_item_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 79%] 2025-12-04T10:08:42.1841411Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_symint_item_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 80%] 2025-12-04T10:08:42.1842259Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_equal_to_1_arg_mps <- test/inductor/test_torchinductor.py SKIPPED [0.2830s] (No MPS backend 
available) [ 82%] 2025-12-04T10:08:42.1843105Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_True_mps SKIPPED [0.0003s] (No MPS backend available) [ 83%] 2025-12-04T10:08:42.1843938Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 85%] 2025-12-04T10:08:42.1853744Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 87%] 2025-12-04T10:08:42.1854737Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_sympy_expr_arg_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 88%] 2025-12-04T10:08:42.1855681Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 90%] 2025-12-04T10:08:42.1856618Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_0_use_static_size_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 91%] 2025-12-04T10:08:42.1857430Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_1_use_static_size_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 93%] 2025-12-04T10:08:42.1858179Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_sym_expr_cond_dynamic_True_mps SKIPPED [0.0004s] (No MPS backend available) [ 95%] 2025-12-04T10:08:42.1858970Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_unbacked_symint_closure_dynamic_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 96%] 2025-12-04T10:08:42.1859769Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_unbacked_symint_closure_dynamic_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 98%] 2025-12-04T10:08:42.1860571Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_size_buffer_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [100%] 2025-12-04T10:08:42.1860578Z 2025-12-04T10:08:42.1861343Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-e62e290dfdad5699.xml - 2025-12-04T10:08:42.1861583Z =========== 14 passed, 48 skipped, 96 deselected in 97.55s (0:01:37) =========== 2025-12-04T10:08:42.1863463Z The following tests failed consistently: ['test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda', 'test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_deconv_freezing_cuda', 'test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_transitive_cuda'] 2025-12-04T10:08:42.1863474Z 2025-12-04T10:08:42.1864040Z FINISHED PRINTING LOG FILE of inductor/test_aot_inductor 6/6 (test/test-reports/inductor.test_aot_inductor_6.6_462385258b0b1d27_.log) 2025-12-04T10:08:42.1864046Z 2025-12-04T10:08:42.1864404Z Finished inductor/test_aot_inductor 6/6 ... 
[2025-12-04 10:08:41.954931][3350.337820076], took 14.14min 2025-12-04T10:08:42.1865224Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bf15e775351f3d84.xml 2025-12-04T10:08:42.1866050Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-cd1c50b62bb47a1b.xml 2025-12-04T10:08:42.1866863Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3e5313e420476f15.xml 2025-12-04T10:08:42.1867663Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b23b654b51890d24.xml 2025-12-04T10:08:42.1868451Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-2e7c8f13f7be0603.xml 2025-12-04T10:08:42.1869347Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-8d6cdce6581fa448.xml 2025-12-04T10:08:42.2068273Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-9dce38c1d023996d.xml 2025-12-04T10:08:42.2562518Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b570798f966501a4.xml 2025-12-04T10:08:42.2878467Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-be9e2a318f1480ff.xml 2025-12-04T10:08:42.3207326Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-e62e290dfdad5699.xml 2025-12-04T10:08:42.6633827Z Uploading logs for 57119749248 to S3 2025-12-04T10:08:42.7007168Z Uploading artifacts took 0.34 seconds 2025-12-04T10:08:42.7007630Z inductor/test_aot_inductor 6/6 failed! 2025-12-04T10:08:42.7012218Z Running inductor/test_torchinductor_codegen_dynamic_shapes 2/4 ... [2025-12-04 10:08:42.701016][3351.083910326] 2025-12-04T10:08:42.7012929Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:08:42.7017306Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_codegen_dynamic_shapes.py', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:08:42.701476] 2025-12-04T10:19:47.8566431Z 2025-12-04T10:19:47.8567675Z PRINTING LOG FILE of inductor/test_torchinductor_codegen_dynamic_shapes 2/4 (test/test-reports/inductor.test_torchinductor_codegen_dynamic_shapes_2.4_37f84ce4dcc870f4_.log) 2025-12-04T10:19:47.8569310Z W1204 10:08:51.991000 26638 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:19:47.8571615Z Test results will be stored in test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-0c75da116b2f10f8.xml 2025-12-04T10:19:47.8573239Z ============================= test session starts ============================== 2025-12-04T10:19:47.8574144Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:19:47.8574990Z cachedir: .pytest_cache 2025-12-04T10:19:47.8576190Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:19:47.8577682Z rootdir: /var/lib/jenkins/workspace 
2025-12-04T10:19:47.8578231Z configfile: pytest.ini 2025-12-04T10:19:47.8579569Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:19:47.8580391Z collecting ... collected 1750 items 2025-12-04T10:19:47.8580851Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T10:19:47.8888534Z Running 441 items in this shard: test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_AllenaiLongformerBase_repro_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__dyn_quant_matmul_4bit_fp32_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__dyn_quant_pack_4bit_weight_bf16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__dyn_quant_pack_4bit_weight_fp32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__unsafe_masked_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__unsafe_masked_index_put_accumulate_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_adaptive_avg_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex10_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_inplace_permuted_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_adding_tensor_offsets_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_alexnet_prefix_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_allow_reuse_disable_if_exceed_peak_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_angle_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_any_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_aoti_eager_cache_hit_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_aoti_eager_with_persistent_cache_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_arange1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_argmax_argmin1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_as_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d_backward4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bitwise2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bitwise3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bmm1_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int32_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int64_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_uint8_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_buffer_batch_norm_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_buffer_use_after_remove_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_builtins_round_float_ndigits_neg_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_extern_kernel_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_unbacked_legacy_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cauchy_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_check_stack_no_cycles_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_clamp_type_promotion_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_complex_from_real_imag_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_concat_add_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_consecutive_split_cumsum_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_constant_pad_1d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_constant_pad_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv1d_depthwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv1d_with_permute_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv3d_channels_last_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_inference_heuristics_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_with_as_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_convolution1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_convolution2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_convolution3_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_copy_non_blocking_is_pinned_use_cat_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cpu_scalar_with_cpu_scalar_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cumprod_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_custom_op_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_custom_scan_op_compiled_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_deterministic_codegen_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_deterministic_codegen_on_graph_break_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div_presicion_accuracy_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dont_constant_fold_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout_trivial_0_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtype_sympy_expr_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float16_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float16_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_fusion_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_uint8_float32_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_uint8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_uint8_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_elu_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_empty2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_erfinv_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_expanded_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fallback_mutable_op_with_return_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fill1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_floordiv_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fmod_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_forced_buffer_realize_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fractional_max_pool2d3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fractional_max_pool2d4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fractional_max_pool2d5_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_generated_code_has_size_stride_assert_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_getitem_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_glu_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_arange1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_arange2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_both_scalars_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_misaligned_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_hardsigmoid_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_hardswish_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_float_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_propagation_floordiv_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_propagation_remainder_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_fallback1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_fallback2_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inductor_triton_bucketize_respects_masking_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_activations_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_add_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_mixed_dtype_ops_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_resize_as_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_where_pointwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_input_mutation1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_int_input_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_issue102546_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_large_broadcast_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_large_grid_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_large_offset_pointwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_layer_norm_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_leaky_relu_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linear2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linear_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_list_clearing_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_mode_fallback_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_mode_not_decompose_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_regional_compile_invoke_subgraph_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_log1p_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_log2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_log_fp64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_logcumsumexp_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mark_dynamic_with_hint_override_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d2_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d6_dilation_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d_with_indices_backward5_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_min_max_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_min_max_reduction_nan_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mixed_mm2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mixed_mm3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mm_mixed_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_multilayer_sum_low_prec_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mutable_custom_op_fixed_layout2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nan_sort_stable_True_descending_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_neg_max_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_new_empty_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nll_loss_forward_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_one_hot_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pad_cast_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pattern_matcher_unbacked_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_bessel_j0_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_erf_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_erfc_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_exp2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_expm1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_gammaincc_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_modified_bessel_i0_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_multigammaln_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_ndtr_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_polygamma_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_scaled_modified_bessel_k1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_shifted_chebyshev_polynomial_v_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_xlogy_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pow3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pow_int_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_prepare_softmax_with_fast_math_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_rand_like_deterministic_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reduction1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reduction2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reduction_config_limit_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remainder_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remove_noop_slice_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remove_noop_view_default_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_as_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_interleave_decomposition_has_clamp_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_resize_as_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_round_correctness_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_rsqrt_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scalar_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scaled_dot_product_efficient_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter2_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter6_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter_bf16_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scheduler_vertical_fusion1_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_setitem_with_int_parameter_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sgn_extremal_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_should_pad_bench_for_bmm_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sigmoid_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sign_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sin_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_softmax_backward_data_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_softmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_softmax_one_kernel_persist_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sort_stable_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_cumsum_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_squeeze_varargs_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum3_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum4_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum_int_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_to_device_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_to_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_torch_device_split_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_triu_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_uint_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbacked_float_item_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbacked_floordiv_simplify_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbacked_floordiv_simplify_errors_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbind_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unspec_inputs_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unspec_inputs_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unsqueeze_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_upsample_nearest2d_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_var_correction_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_var_mean_tile_reduction_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views5_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views7_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_weight_norm_conv2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_where_broadcast_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_xblock_divides_xnumel_dynamic_shapes_cpu, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_AllenaiLongformerBase_repro_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test__dyn_quant_matmul_4bit_fp32_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test__dyn_quant_pack_4bit_weight_fp32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test__unsafe_masked_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool_errors_with_long_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool_with_output_size_0_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_max_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_max_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_pool_errors_with_long_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_complex5_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_complex_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_complex_strided_fallback_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_inplace_permuted_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_alexnet_prefix_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_any_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_arange2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_arange6_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_argmax_argmin3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool3d_backward3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_default_kwargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int32_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int64_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int64_int8_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_nd_tiling_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_buffer_batch_norm_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_builtins_round_float_ndigits_pos_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_builtins_round_float_ndigits_zero_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_negative_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_of_loops_and_extern_kernel_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cauchy_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_chunk_recompiles_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_clamp_type_promotion_non_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_computed_buffer_inlining_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_config_option_dont_assume_alignment_recompiles_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_1d_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_nd_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv_bn_fuse_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv_inference_heuristics_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv_with_as_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_copy_non_blocking_is_pinned_use_cat_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_copy_non_blocking_is_pinned_use_cat_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cpu_scalar_with_cpu_scalar_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cummin_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cumprod_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_op_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_scan_op_compiled_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_scan_op_multi_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_scan_would_split_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_deterministic_codegen_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dist_bf16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtype_mismatch_issue_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_fusion_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int16_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int16_uint8_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_embedding_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_empty2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_empty_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_emulate_precision_triton_fp_fusion_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_exp2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fallback_mutable_op_no_mutated_tensors_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fallback_mutable_op_with_return_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fft_real_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_float_index_expression_type_promotion_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_floordiv_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fuse_large_params_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_generated_code_has_alignment_assert_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_getitem_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_misaligned_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_refcount_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_hardsigmoid_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_horizonal_fusion2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index2_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_device_assert_masked_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_flip_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_floordiv_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put_fallback1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put_fallback2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_select_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_inplace_mixed_dtype_ops_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_int_input_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_invalid_operand_issue1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_isinf2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_issue102546_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_kernel_names_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_block_sizes_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_grid_use_block_ptr_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_pointwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_tensor_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lerp_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_like_rands_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linear1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linear2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linear_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_list_clearing_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_invoke_subgraph_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_log2_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_low_memory_max_pool_dilation_1_dim_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_low_memory_max_pool_dilation_1_dim_3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_low_memory_max_pool_dilation_2_dim_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_min_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d4_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d6_dilation_1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d6_dilation_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d_with_indices_backward6_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mean_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mul_index_expr_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multi_gpu_device_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multilayer_var_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nan_assert_inside_triton_kernel_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nan_sort_stable_False_descending_True_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_new_empty_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nll_loss_forward_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pixel_shuffle_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_bessel_j1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_chebyshev_polynomial_t_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_chebyshev_polynomial_w_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_expit_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_gammaincc_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_i0_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_log1p_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_modified_bessel_k0_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_psi_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_round_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_shifted_chebyshev_polynomial_v_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pow3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pow_int_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pow_symfloat_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randn_generator_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randn_like_empty_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reduction1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reflection_pad2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_relu_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remove_noop_view_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_interleave_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_require_stride_expanded_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_resize_as_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_roll_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_rsqrt_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scalar_output_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter_reduce1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scheduler_vertical_fusion1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sdpa_unaligned_mask_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_searchsorted_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_select_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sin_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_size_asserts_for_multi_output_fallback_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sizehint_issue1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_mutation3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter2_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter_reinplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_softmax_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_softmax_one_kernel_persist_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_cumprod_low_prec_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_cumsum_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_cumsum_low_prec_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sqrt_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_squeeze1_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum5_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tanh_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tmp_not_defined_issue1_use_block_ptr_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_to_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_triton_argmin_argmax_transpose_logical_index_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_triton_kernel_bool_param_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unfold_zero_dimension_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unroll_small_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unsqueeze_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unsqueeze_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_bilinear2d_a_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_vectorized_ops_masked_var_novec_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_view_as_complex_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_view_as_real_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_view_detach_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views2_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views3_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views7_dynamic_shapes_cuda, test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_zeros_dynamic_shapes_cuda 2025-12-04T10:19:47.9191670Z 2025-12-04T10:19:47.9193493Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_AllenaiLongformerBase_repro_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py W1204 10:08:54.874000 26638 site-packages/torch/_dynamo/variables/torch.py:1533] [0/0] Calling on only torch.SymInt arguments is not yet supported. 2025-12-04T10:19:47.9196111Z W1204 10:08:54.874000 26638 site-packages/torch/_dynamo/variables/torch.py:1533] [0/0] To support this behavior, we need to allow const-propping tensors that store symint data. 2025-12-04T10:19:47.9197681Z W1204 10:08:54.874000 26638 site-packages/torch/_dynamo/variables/torch.py:1533] [0/0] For now, dynamo will explicitly graph break when it encounters user code with this behavior. 
2025-12-04T10:19:47.9198836Z W1204 10:08:54.874000 26638 site-packages/torch/_dynamo/variables/torch.py:1533] [0/0] 2025-12-04T10:19:47.9199395Z XFAIL [23.3568s] [ 0%] 2025-12-04T10:19:47.9200529Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__dyn_quant_matmul_4bit_fp32_input_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [4.6635s] [ 0%] 2025-12-04T10:19:47.9202521Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__dyn_quant_pack_4bit_weight_bf16_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [4.3064s] [ 0%] 2025-12-04T10:19:47.9204498Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__dyn_quant_pack_4bit_weight_fp32_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.5684s] [ 0%] 2025-12-04T10:19:47.9206417Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__unsafe_masked_index_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9125s] [ 1%] 2025-12-04T10:19:47.9208394Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test__unsafe_masked_index_put_accumulate_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0043s] [ 1%] 2025-12-04T10:19:47.9210371Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_adaptive_avg_pool2d1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [8.0236s] [ 1%] 2025-12-04T10:19:47.9212235Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex10_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9747s] [ 1%] 2025-12-04T10:19:47.9214318Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex4_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [3.3084s] [ 2%] 
2025-12-04T10:19:47.9216181Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (Skipped!) [ 2%] 2025-12-04T10:19:47.9218122Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_complex_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9021s] [ 2%] 2025-12-04T10:19:47.9220154Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_add_inplace_permuted_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2064s] [ 2%] 2025-12-04T10:19:47.9222049Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_adding_tensor_offsets_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.2876s] [ 2%] 2025-12-04T10:19:47.9223934Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_alexnet_prefix_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [8.2190s] [ 3%] 2025-12-04T10:19:47.9225840Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_allow_reuse_disable_if_exceed_peak_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.5679s] [ 3%] 2025-12-04T10:19:47.9227744Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_angle_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0761s] [ 3%] 2025-12-04T10:19:47.9229558Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_any_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.3983s] [ 3%] 2025-12-04T10:19:47.9231697Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_aoti_eager_cache_hit_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py W1204 10:09:59.203000 26638 
site-packages/torch/_export/__init__.py:71] +============================+ 2025-12-04T10:19:47.9233405Z W1204 10:09:59.204000 26638 site-packages/torch/_export/__init__.py:72] | !!! WARNING !!! | 2025-12-04T10:19:47.9234248Z W1204 10:09:59.204000 26638 site-packages/torch/_export/__init__.py:73] +============================+ 2025-12-04T10:19:47.9235966Z W1204 10:09:59.204000 26638 site-packages/torch/_export/__init__.py:74] torch._export.aot_compile()/torch._export.aot_load() is being deprecated, please switch to directly calling torch._inductor.aoti_compile_and_package(torch.export.export())/torch._inductor.aoti_load_package() instead. 2025-12-04T10:19:47.9237439Z PASSED [5.6791s] [ 4%] 2025-12-04T10:19:47.9238565Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_aoti_eager_with_persistent_cache_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [5.1087s] [ 4%] 2025-12-04T10:19:47.9240446Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_arange1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.8514s] [ 4%] 2025-12-04T10:19:47.9242237Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_argmax_argmin1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.1264s] [ 4%] 2025-12-04T10:19:47.9244052Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_as_strided_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9365s] [ 4%] 2025-12-04T10:19:47.9245893Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_avg_pool2d_backward4_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.3442s] [ 5%] 2025-12-04T10:19:47.9247715Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bitwise2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0401s] [ 5%] 
2025-12-04T10:19:47.9249570Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bitwise3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9872s] [ 5%] 2025-12-04T10:19:47.9251329Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bmm1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.8679s] [ 5%] 2025-12-04T10:19:47.9253151Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int32_int8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2882s] [ 6%] 2025-12-04T10:19:47.9255151Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_int64_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2796s] [ 6%] 2025-12-04T10:19:47.9257149Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_bucketize_int_uint8_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1477s] [ 6%] 2025-12-04T10:19:47.9259032Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_buffer_batch_norm_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.5560s] [ 6%] 2025-12-04T10:19:47.9260907Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_buffer_use_after_remove_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [3.5134s] [ 7%] 2025-12-04T10:19:47.9262863Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_builtins_round_float_ndigits_neg_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8061s] [ 7%] 2025-12-04T10:19:47.9264731Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [3.2712s] [ 7%] 2025-12-04T10:19:47.9266480Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_empty_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.8640s] [ 7%] 2025-12-04T10:19:47.9268277Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_extern_kernel_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0362s] [ 7%] 2025-12-04T10:19:47.9270085Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.7988s] [ 8%] 2025-12-04T10:19:47.9272093Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cat_unbacked_legacy_empty_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0314s] [ 8%] 2025-12-04T10:19:47.9273937Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cauchy_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8695s] [ 8%] 2025-12-04T10:19:47.9275760Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_check_stack_no_cycles_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8353s] [ 8%] 2025-12-04T10:19:47.9277630Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_clamp_type_promotion_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.8605s] [ 9%] 2025-12-04T10:19:47.9279512Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_complex_from_real_imag_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1709s] [ 9%] 2025-12-04T10:19:47.9281381Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_concat_add_inplace_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2000s] [ 9%] 2025-12-04T10:19:47.9283274Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_consecutive_split_cumsum_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.1870s] [ 9%] 2025-12-04T10:19:47.9285292Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_constant_pad_1d_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2387s] [ 9%] 2025-12-04T10:19:47.9287169Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_constant_pad_float64_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8992s] [ 10%] 2025-12-04T10:19:47.9289107Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv1d_depthwise_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.3846s] [ 10%] 2025-12-04T10:19:47.9291043Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv1d_with_permute_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (Skipped!) 
[ 10%] 2025-12-04T10:19:47.9293251Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv3d_channels_last_use_block_ptr_True_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (triton backend is required for cpu) [ 10%] 2025-12-04T10:19:47.9295367Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_backward_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [1.0560s] [ 11%] 2025-12-04T10:19:47.9297420Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_inference_heuristics_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (cuda only test) [ 11%] 2025-12-04T10:19:47.9299412Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_conv_with_as_strided_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [4.7183s] [ 11%] 2025-12-04T10:19:47.9301262Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_convolution1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [4.5091s] [ 11%] 2025-12-04T10:19:47.9303081Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_convolution2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.4406s] [ 12%] 2025-12-04T10:19:47.9304894Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_convolution3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.4916s] [ 12%] 2025-12-04T10:19:47.9307219Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_copy_non_blocking_is_pinned_use_cat_True_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py W1204 10:10:58.614000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9309069Z W1204 10:10:58.616000 26638 
site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9309994Z W1204 10:10:58.617000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9310921Z W1204 10:10:58.618000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9311842Z W1204 10:10:58.618000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9312746Z W1204 10:10:58.619000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9313667Z W1204 10:10:58.620000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9314584Z W1204 10:10:58.621000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9315497Z W1204 10:10:58.622000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9316401Z W1204 10:10:58.622000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9317394Z W1204 10:10:58.623000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9318320Z W1204 10:10:58.624000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9319233Z W1204 10:10:58.625000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9320135Z W1204 10:10:58.625000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9321145Z W1204 10:10:58.626000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9322064Z W1204 10:10:58.627000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 
2025-12-04T10:19:47.9322981Z W1204 10:10:58.628000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9323894Z W1204 10:10:58.628000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9324820Z W1204 10:10:58.629000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9325746Z W1204 10:10:58.630000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9326669Z W1204 10:10:58.631000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9327583Z W1204 10:10:58.631000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9328596Z W1204 10:10:58.632000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9329546Z W1204 10:10:58.633000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9330456Z W1204 10:10:58.634000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9331386Z W1204 10:10:58.635000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9332312Z W1204 10:10:58.635000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9333238Z W1204 10:10:58.636000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9334145Z W1204 10:10:58.637000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9335098Z W1204 10:10:58.638000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9336029Z W1204 10:10:58.638000 26638 
site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9337034Z W1204 10:10:58.639000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9337946Z W1204 10:10:58.640000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9338880Z W1204 10:10:58.641000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9339814Z W1204 10:10:58.641000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9340739Z W1204 10:10:58.642000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9341656Z W1204 10:10:58.643000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9342591Z W1204 10:10:58.644000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9343517Z W1204 10:10:58.645000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9344447Z W1204 10:10:58.646000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9345453Z W1204 10:10:58.646000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9346387Z W1204 10:10:58.647000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9347317Z W1204 10:10:58.648000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9348244Z W1204 10:10:58.649000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9349252Z W1204 10:10:58.650000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 
2025-12-04T10:19:47.9350176Z W1204 10:10:58.650000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9351104Z W1204 10:10:58.651000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9352029Z W1204 10:10:58.652000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9352944Z W1204 10:10:58.653000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9353868Z W1204 10:10:58.653000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9354794Z W1204 10:10:58.654000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9355716Z W1204 10:10:58.655000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9356631Z W1204 10:10:58.656000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9357550Z W1204 10:10:58.656000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9358468Z W1204 10:10:58.657000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9359381Z W1204 10:10:58.658000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9360302Z W1204 10:10:58.659000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9361225Z W1204 10:10:58.660000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9362152Z W1204 10:10:58.660000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9363061Z W1204 10:10:58.661000 26638 
site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9363987Z W1204 10:10:58.662000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9364911Z W1204 10:10:58.663000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9365827Z W1204 10:10:58.663000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9366738Z W1204 10:10:58.664000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9367663Z W1204 10:10:58.665000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9368588Z W1204 10:10:58.666000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9369517Z W1204 10:10:58.666000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9370430Z W1204 10:10:58.667000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9371532Z W1204 10:10:58.668000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9372460Z W1204 10:10:58.669000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9373510Z W1204 10:10:58.670000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9374423Z W1204 10:10:58.670000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9375345Z W1204 10:10:58.671000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9376359Z W1204 10:10:58.672000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 
2025-12-04T10:19:47.9377386Z W1204 10:10:58.673000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9378299Z W1204 10:10:58.673000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9379227Z W1204 10:10:58.674000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9380155Z W1204 10:10:58.675000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9381088Z W1204 10:10:58.676000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9382005Z W1204 10:10:58.676000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9382930Z W1204 10:10:58.677000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9383853Z W1204 10:10:58.678000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9384763Z W1204 10:10:58.679000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9385689Z W1204 10:10:58.679000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9386605Z W1204 10:10:58.680000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9387536Z W1204 10:10:58.681000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9388440Z W1204 10:10:58.682000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9389360Z W1204 10:10:58.683000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9390282Z W1204 10:10:58.683000 26638 
site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9391202Z W1204 10:10:58.684000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9392112Z W1204 10:10:58.685000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9393035Z W1204 10:10:58.686000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9393960Z W1204 10:10:58.686000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9394882Z W1204 10:10:58.687000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9395785Z W1204 10:10:58.688000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9396703Z W1204 10:10:58.689000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9397619Z W1204 10:10:58.689000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9398544Z W1204 10:10:58.690000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9399446Z W1204 10:10:58.691000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9400367Z W1204 10:10:58.692000 26638 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:19:47.9401006Z PASSED [12.0757s] [ 12%] 2025-12-04T10:19:47.9402189Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cpu_scalar_with_cpu_scalar_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7626s] [ 12%] 2025-12-04T10:19:47.9404063Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_cumprod_zero_dim_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.7309s] [ 12%] 2025-12-04T10:19:47.9405878Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_custom_op_1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.1294s] [ 13%] 2025-12-04T10:19:47.9408010Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_custom_scan_op_compiled_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0036s] (associative_scan only supported on GPU) [ 13%] 2025-12-04T10:19:47.9410149Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_deterministic_codegen_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [11.3208s] [ 13%] 2025-12-04T10:19:47.9412139Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_deterministic_codegen_on_graph_break_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.4426s] [ 13%] 2025-12-04T10:19:47.9414025Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9860s] [ 14%] 2025-12-04T10:19:47.9415853Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_div_presicion_accuracy_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8129s] [ 14%] 2025-12-04T10:19:47.9417822Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dont_constant_fold_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.7506s] [ 14%] 2025-12-04T10:19:47.9419717Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) 
[ 14%] 2025-12-04T10:19:47.9421624Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 14%] 2025-12-04T10:19:47.9423479Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dropout_trivial_0_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7720s] [ 15%] 2025-12-04T10:19:47.9425344Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtype_sympy_expr_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.2906s] [ 15%] 2025-12-04T10:19:47.9427231Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float16_int64_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9703s] [ 15%] 2025-12-04T10:19:47.9429162Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float16_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9607s] [ 15%] 2025-12-04T10:19:47.9431070Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_int16_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0044s] [ 16%] 2025-12-04T10:19:47.9432972Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_int32_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9800s] [ 16%] 2025-12-04T10:19:47.9434894Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_int64_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9573s] [ 16%] 2025-12-04T10:19:47.9436917Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_float64_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED 
[0.0044s] [ 16%] 2025-12-04T10:19:47.9438810Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_fusion_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7861s] [ 17%] 2025-12-04T10:19:47.9440688Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int32_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0043s] [ 17%] 2025-12-04T10:19:47.9442640Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_float16_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0040s] [ 17%] 2025-12-04T10:19:47.9444566Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_float32_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.0153s] [ 17%] 2025-12-04T10:19:47.9446484Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int64_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0042s] [ 17%] 2025-12-04T10:19:47.9448376Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_int16_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9378s] [ 18%] 2025-12-04T10:19:47.9450267Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_int8_int32_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9175s] [ 18%] 2025-12-04T10:19:47.9452156Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_uint8_float32_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9450s] [ 18%] 2025-12-04T10:19:47.9454066Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_uint8_int8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py 
PASSED [1.8972s] [ 18%] 2025-12-04T10:19:47.9455962Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_dtypeview_uint8_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.8797s] [ 19%] 2025-12-04T10:19:47.9457847Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_elu_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.5956s] [ 19%] 2025-12-04T10:19:47.9459596Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_empty2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1020s] [ 19%] 2025-12-04T10:19:47.9461337Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_erfinv_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7959s] [ 19%] 2025-12-04T10:19:47.9463174Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_expanded_reduction_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0350s] [ 19%] 2025-12-04T10:19:47.9465117Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fallback_mutable_op_with_return_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0538s] [ 20%] 2025-12-04T10:19:47.9466986Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fill1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7918s] [ 20%] 2025-12-04T10:19:47.9468753Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_floordiv_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.5951s] [ 20%] 2025-12-04T10:19:47.9470537Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fmod_zero_dim_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.0226s] [ 20%] 2025-12-04T10:19:47.9473379Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_forced_buffer_realize_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (Skipped!) [ 21%] 2025-12-04T10:19:47.9475361Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fractional_max_pool2d3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.3441s] [ 21%] 2025-12-04T10:19:47.9477374Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fractional_max_pool2d4_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [4.3824s] [ 21%] 2025-12-04T10:19:47.9479293Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_fractional_max_pool2d5_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.7791s] [ 21%] 2025-12-04T10:19:47.9481459Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_generated_code_has_size_stride_assert_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (triton backend is required for cpu) [ 21%] 2025-12-04T10:19:47.9483651Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_getitem_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0687s] [ 22%] 2025-12-04T10:19:47.9485392Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_glu_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [1.8233s] [ 22%] 2025-12-04T10:19:47.9487213Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_arange1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.6630s] [ 22%] 2025-12-04T10:19:47.9489144Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_arange2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py 
PASSED [1.4891s] [ 22%] 2025-12-04T10:19:47.9491097Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_both_scalars_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8422s] [ 23%] 2025-12-04T10:19:47.9493083Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_graph_partition_misaligned_input_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.0153s] [ 23%] 2025-12-04T10:19:47.9495005Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_hardsigmoid_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8987s] [ 23%] 2025-12-04T10:19:47.9496877Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_hardswish_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9197s] [ 23%] 2025-12-04T10:19:47.9498714Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_float_zero_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.1615s] [ 24%] 2025-12-04T10:19:47.9500616Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_propagation_floordiv_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2158s] [ 24%] 2025-12-04T10:19:47.9502565Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_propagation_remainder_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9749s] [ 24%] 2025-12-04T10:19:47.9504451Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2532s] [ 24%] 2025-12-04T10:19:47.9506295Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_fallback1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py 
PASSED [0.9517s] [ 24%] 2025-12-04T10:19:47.9508260Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_fallback2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0265s] [ 25%] 2025-12-04T10:19:47.9510141Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_index_put_reinplace_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0056s] [ 25%] 2025-12-04T10:19:47.9512124Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inductor_triton_bucketize_respects_masking_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0734s] [ 25%] 2025-12-04T10:19:47.9514205Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_activations_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.8721s] [ 25%] 2025-12-04T10:19:47.9516120Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_add_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 26%] 2025-12-04T10:19:47.9518110Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_mixed_dtype_ops_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (Skipped!) 
[ 26%] 2025-12-04T10:19:47.9520486Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_resize_as_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py E1204 10:12:27.962000 26638 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001 2025-12-04T10:19:47.9522009Z PASSED [0.0804s] [ 26%] 2025-12-04T10:19:47.9523092Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_inplace_where_pointwise_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7812s] [ 26%] 2025-12-04T10:19:47.9525051Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_input_mutation1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 26%] 2025-12-04T10:19:47.9526994Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_int_input_dynamic_shapes_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7701s] [ 27%] 2025-12-04T10:19:47.9528854Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_issue102546_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.7171s] [ 27%] 2025-12-04T10:19:47.9530830Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_large_broadcast_reduction_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (cpu not supported) [ 27%] 2025-12-04T10:19:47.9532891Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_large_grid_use_block_ptr_False_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.4994s] [ 27%] 2025-12-04T10:19:47.9534833Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_large_offset_pointwise_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.5430s] [ 28%] 2025-12-04T10:19:47.9537805Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_layer_norm_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py W1204 10:12:34.648000 26638 site-packages/torch/_inductor/debug.py:518] [0/0_1] model__151_inference_165 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug/run_2025_12_04_10_12_33_608767-pid_26638/torchinductor/model__151_inference_165.0 2025-12-04T10:19:47.9539925Z PASSED [1.4086s] [ 28%] 2025-12-04T10:19:47.9540931Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_leaky_relu_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9105s] [ 28%] 2025-12-04T10:19:47.9542786Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linear2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.1148s] [ 28%] 2025-12-04T10:19:47.9544581Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_linear_float64_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1705s] [ 29%] 2025-12-04T10:19:47.9546470Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_list_clearing_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) 
[ 29%] 2025-12-04T10:19:47.9548440Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_mode_fallback_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.1388s] [ 29%] 2025-12-04T10:19:47.9550386Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_mode_not_decompose_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (requires GPU) [ 29%] 2025-12-04T10:19:47.9552522Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (requires GPU) [ 29%] 2025-12-04T10:19:47.9554644Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_lite_regional_compile_invoke_subgraph_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.0693s] [ 30%] 2025-12-04T10:19:47.9556546Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_log1p_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [5.9812s] [ 30%] 2025-12-04T10:19:47.9558289Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_log2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9948s] [ 30%] 2025-12-04T10:19:47.9560036Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_log_fp64_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0211s] [ 30%] 2025-12-04T10:19:47.9561814Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_logcumsumexp_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1599s] [ 31%] 2025-12-04T10:19:47.9564030Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mark_dynamic_with_hint_override_dynamic_shapes_cpu <- 
test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipping triton backend only since not big GPU (not enough SM)) [ 31%] 2025-12-04T10:19:47.9566238Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.8082s] [ 31%] 2025-12-04T10:19:47.9568033Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.6041s] [ 31%] 2025-12-04T10:19:47.9569890Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d6_dilation_2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [3.1172s] [ 31%] 2025-12-04T10:19:47.9572185Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_max_pool2d_with_indices_backward5_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.3953s] [ 32%] 2025-12-04T10:19:47.9574191Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_min_max_reduction_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) 
[ 32%] 2025-12-04T10:19:47.9576125Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_min_max_reduction_nan_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8041s] [ 32%] 2025-12-04T10:19:47.9578393Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mixed_mm2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9313s] [ 32%] 2025-12-04T10:19:47.9580168Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mixed_mm3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8411s] [ 33%] 2025-12-04T10:19:47.9581959Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mm_mixed_dtype_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.1972s] [ 33%] 2025-12-04T10:19:47.9584024Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_multilayer_sum_low_prec_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (requires cuda) [ 33%] 2025-12-04T10:19:47.9586064Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_mutable_custom_op_fixed_layout2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.3788s] [ 33%] 2025-12-04T10:19:47.9588077Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nan_sort_stable_True_descending_True_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.1266s] [ 34%] 2025-12-04T10:19:47.9589987Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_neg_max_uint8_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8018s] [ 34%] 2025-12-04T10:19:47.9591803Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_new_empty_strided_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL 
[0.1070s] [ 34%] 2025-12-04T10:19:47.9593656Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_nll_loss_forward_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.4678s] [ 34%] 2025-12-04T10:19:47.9595457Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_one_hot_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9614s] [ 34%] 2025-12-04T10:19:47.9597228Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pad_cast_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.1749s] [ 35%] 2025-12-04T10:19:47.9599082Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pattern_matcher_unbacked_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.3124s] [ 35%] 2025-12-04T10:19:47.9600982Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_bessel_j0_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8190s] [ 35%] 2025-12-04T10:19:47.9602836Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_erf_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9767s] [ 35%] 2025-12-04T10:19:47.9604667Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_erfc_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9638s] [ 36%] 2025-12-04T10:19:47.9606496Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_exp2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9504s] [ 36%] 2025-12-04T10:19:47.9608343Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_expm1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9856s] [ 36%] 2025-12-04T10:19:47.9610194Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_gammaincc_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8873s] [ 36%] 2025-12-04T10:19:47.9612135Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_modified_bessel_i0_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7902s] [ 36%] 2025-12-04T10:19:47.9614154Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_multigammaln_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0758s] [ 37%] 2025-12-04T10:19:47.9616072Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_ndtr_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0681s] [ 37%] 2025-12-04T10:19:47.9618014Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_polygamma_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9147s] [ 37%] 2025-12-04T10:19:47.9620033Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_scaled_modified_bessel_k1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8020s] [ 37%] 2025-12-04T10:19:47.9622103Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_shifted_chebyshev_polynomial_v_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8019s] [ 38%] 2025-12-04T10:19:47.9624086Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pointwise_xlogy_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0875s] [ 38%] 2025-12-04T10:19:47.9625952Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pow3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) 
[ 38%] 2025-12-04T10:19:47.9627778Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_pow_int_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [3.9913s] [ 38%] 2025-12-04T10:19:47.9629651Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_prepare_softmax_with_fast_math_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.2456s] [ 39%] 2025-12-04T10:19:47.9631666Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_rand_like_deterministic_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 39%] 2025-12-04T10:19:47.9633597Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reduction1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0404s] [ 39%] 2025-12-04T10:19:47.9635408Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reduction2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9663s] [ 39%] 2025-12-04T10:19:47.9637446Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_reduction_config_limit_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (triton backend is required for cpu) [ 39%] 2025-12-04T10:19:47.9639497Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remainder_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.1096s] [ 40%] 2025-12-04T10:19:47.9641317Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remove_noop_slice_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8775s] [ 40%] 2025-12-04T10:19:47.9643211Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_remove_noop_view_default_dynamic_shapes_cpu <- 
test/inductor/test_torchinductor.py PASSED [0.3706s] [ 40%] 2025-12-04T10:19:47.9645101Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_as_strided_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.7332s] [ 40%] 2025-12-04T10:19:47.9647074Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0373s] [ 41%] 2025-12-04T10:19:47.9649561Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_repeat_interleave_decomposition_has_clamp_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0036s] (repeat_interleave decomp doesn't support dynamic output size) [ 41%] 2025-12-04T10:19:47.9651809Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_resize_as_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [22.6377s] [ 41%] 2025-12-04T10:19:47.9653645Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_round_correctness_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8167s] [ 41%] 2025-12-04T10:19:47.9655526Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_rsqrt_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8030s] [ 41%] 2025-12-04T10:19:47.9657348Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scalar_input_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8782s] [ 42%] 2025-12-04T10:19:47.9659356Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scaled_dot_product_efficient_attention_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) 
[ 42%] 2025-12-04T10:19:47.9661327Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter2_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8047s] [ 42%] 2025-12-04T10:19:47.9663102Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter6_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9560s] [ 42%] 2025-12-04T10:19:47.9664896Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scatter_bf16_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.4898s] [ 43%] 2025-12-04T10:19:47.9666774Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_scheduler_vertical_fusion1_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.1551s] [ 43%] 2025-12-04T10:19:47.9669032Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_False_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Does not support SDPA or pre-SM80 hardware) [ 43%] 2025-12-04T10:19:47.9671940Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_True_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Does not support SDPA or pre-SM80 hardware) [ 43%] 2025-12-04T10:19:47.9674158Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_setitem_with_int_parameter_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.5847s] [ 43%] 2025-12-04T10:19:47.9676049Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sgn_extremal_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7466s] [ 44%] 2025-12-04T10:19:47.9677905Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_should_pad_bench_for_bmm_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0249s] [ 44%] 2025-12-04T10:19:47.9679747Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sigmoid_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0681s] [ 44%] 2025-12-04T10:19:47.9681524Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sign_dtype_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0645s] [ 44%] 2025-12-04T10:19:47.9683270Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sin_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0392s] [ 45%] 2025-12-04T10:19:47.9685193Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_slice_scatter_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.6867s] [ 45%] 2025-12-04T10:19:47.9687060Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_softmax_backward_data_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8922s] [ 45%] 2025-12-04T10:19:47.9688903Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_softmax_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.6137s] [ 45%] 2025-12-04T10:19:47.9690849Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_softmax_one_kernel_persist_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.2356s] [ 46%] 2025-12-04T10:19:47.9692698Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sort_stable_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1440s] [ 46%] 2025-12-04T10:19:47.9694489Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_cumsum_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.2997s] [ 46%] 2025-12-04T10:19:47.9696253Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_split_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.2941s] [ 46%] 2025-12-04T10:19:47.9698111Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_squeeze_varargs_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.7609s] [ 46%] 2025-12-04T10:19:47.9699904Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum3_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8842s] [ 47%] 2025-12-04T10:19:47.9701636Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum4_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0359s] [ 47%] 2025-12-04T10:19:47.9703399Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_sum_int_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [2.5068s] [ 47%] 2025-12-04T10:19:47.9705229Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_to_device_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (Skipped!) 
[ 47%] 2025-12-04T10:19:47.9707061Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_to_dtype_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0199s] [ 48%] 2025-12-04T10:19:47.9708873Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_torch_device_split_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.1093s] [ 48%] 2025-12-04T10:19:47.9710666Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_triu_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.9149s] [ 48%] 2025-12-04T10:19:47.9712392Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_uint_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.7460s] [ 48%] 2025-12-04T10:19:47.9714183Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbacked_float_item_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9381s] [ 48%] 2025-12-04T10:19:47.9716117Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbacked_floordiv_simplify_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.0190s] [ 49%] 2025-12-04T10:19:47.9718099Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbacked_floordiv_simplify_errors_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.0240s] [ 49%] 2025-12-04T10:19:47.9720132Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unbind_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1831s] [ 49%] 2025-12-04T10:19:47.9722098Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unspec_inputs_float32_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (Testing mixed devices) [ 49%] 
2025-12-04T10:19:47.9724245Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unspec_inputs_int64_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (Testing mixed devices) [ 50%] 2025-12-04T10:19:47.9726309Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_unsqueeze_inplace_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9899s] [ 50%] 2025-12-04T10:19:47.9728230Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_upsample_nearest2d_backward_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [4.5830s] [ 50%] 2025-12-04T10:19:47.9730122Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_var_correction_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.6481s] [ 50%] 2025-12-04T10:19:47.9732016Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_var_mean_tile_reduction_True_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.6593s] [ 51%] 2025-12-04T10:19:47.9733856Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views5_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py XFAIL [0.1559s] [ 51%] 2025-12-04T10:19:47.9735605Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_views7_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.9596s] [ 51%] 2025-12-04T10:19:47.9737467Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_weight_norm_conv2d_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [3.7935s] [ 51%] 2025-12-04T10:19:47.9739304Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_where_broadcast_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [1.6955s] [ 51%] 
2025-12-04T10:19:47.9741169Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenCpuTests::test_xblock_divides_xnumel_dynamic_shapes_cpu <- test/inductor/test_torchinductor.py PASSED [0.8973s] [ 52%] 2025-12-04T10:19:47.9743111Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_AllenaiLongformerBase_repro_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.3547s] [ 52%] 2025-12-04T10:19:47.9745340Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test__dyn_quant_matmul_4bit_fp32_input_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (No _dyn_quant_matmul_4bit implementation on CUDA) [ 52%] 2025-12-04T10:19:47.9747811Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test__dyn_quant_pack_4bit_weight_fp32_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (No _dyn_quant_pack_4bit_weight implementation on CUDA) [ 52%] 2025-12-04T10:19:47.9750005Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test__unsafe_masked_index_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5730s] [ 53%] 2025-12-04T10:19:47.9751968Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool_errors_with_long_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.6279s] [ 53%] 2025-12-04T10:19:47.9753974Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_avg_pool_with_output_size_0_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.1465s] [ 53%] 2025-12-04T10:19:47.9756012Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_max_pool2d1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [4.2365s] [ 53%] 2025-12-04T10:19:47.9757904Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_max_pool2d3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [6.2163s] [ 53%] 2025-12-04T10:19:47.9759905Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_adaptive_pool_errors_with_long_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.2459s] [ 54%] 2025-12-04T10:19:47.9761792Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_complex5_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.8504s] [ 54%] 2025-12-04T10:19:47.9763619Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_complex_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.6096s] [ 54%] 2025-12-04T10:19:47.9765517Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_complex_strided_fallback_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1862s] [ 54%] 2025-12-04T10:19:47.9767455Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_add_inplace_permuted_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.0588s] [ 55%] 2025-12-04T10:19:47.9769338Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_alexnet_prefix_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [5.5127s] [ 55%] 2025-12-04T10:19:47.9771440Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_any_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.7087s] [ 55%] 2025-12-04T10:19:47.9773235Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_arange2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2032s] [ 55%] 2025-12-04T10:19:47.9775011Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_arange6_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3202s] [ 56%] 2025-12-04T10:19:47.9776891Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_argmax_argmin3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4432s] [ 56%] 2025-12-04T10:19:47.9778732Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool2d2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.3588s] [ 56%] 2025-12-04T10:19:47.9780579Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_avg_pool3d_backward3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.7478s] [ 56%] 2025-12-04T10:19:47.9782503Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_default_kwargs_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2308s] [ 56%] 2025-12-04T10:19:47.9784442Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int32_int32_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.5961s] [ 57%] 2025-12-04T10:19:47.9786381Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int64_int32_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.5921s] [ 57%] 2025-12-04T10:19:47.9788314Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int64_int8_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.5827s] [ 57%] 2025-12-04T10:19:47.9790380Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_int_int8_uint8_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.5751s] [ 57%] 
2025-12-04T10:19:47.9792328Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_bucketize_nd_tiling_False_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.8609s] [ 58%] 2025-12-04T10:19:47.9794239Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_buffer_batch_norm_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [3.1174s] [ 58%] 2025-12-04T10:19:47.9796281Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_builtins_round_float_ndigits_pos_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2558s] [ 58%] 2025-12-04T10:19:47.9798307Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_builtins_round_float_ndigits_zero_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2344s] [ 58%] 2025-12-04T10:19:47.9800228Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_negative_dim_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.7886s] [ 58%] 2025-12-04T10:19:47.9802208Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_of_loops_and_extern_kernel_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) 
[ 59%] 2025-12-04T10:19:47.9804160Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cat_uint8_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4613s] [ 59%] 2025-12-04T10:19:47.9805937Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cauchy_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.6168s] [ 59%] 2025-12-04T10:19:47.9807757Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_chunk_recompiles_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.9788s] [ 59%] 2025-12-04T10:19:47.9809690Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_clamp_type_promotion_non_tensor_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2444s] [ 60%] 2025-12-04T10:19:47.9811678Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_computed_buffer_inlining_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2186s] [ 60%] 2025-12-04T10:19:47.9813737Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_config_option_dont_assume_alignment_recompiles_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5165s] [ 60%] 2025-12-04T10:19:47.9815740Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_1d_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7983s] [ 60%] 2025-12-04T10:19:47.9817688Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_float64_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4966s] [ 60%] 2025-12-04T10:19:47.9819585Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_constant_pad_nd_inplace_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py 
PASSED [0.1915s] [ 61%] 2025-12-04T10:19:47.9821747Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv_bn_fuse_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 61%] 2025-12-04T10:19:47.9823920Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv_inference_heuristics_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [2.1554s] [ 61%] 2025-12-04T10:19:47.9825958Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_conv_with_as_strided_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.6527s] [ 61%] 2025-12-04T10:19:47.9827941Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_copy_non_blocking_is_pinned_use_cat_False_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.1997s] [ 62%] 2025-12-04T10:19:47.9830009Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_copy_non_blocking_is_pinned_use_cat_True_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [12.5666s] [ 62%] 2025-12-04T10:19:47.9832083Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cpu_scalar_with_cpu_scalar_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7479s] [ 62%] 2025-12-04T10:19:47.9833941Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cummin_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [1.3612s] [ 62%] 2025-12-04T10:19:47.9835749Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_cumprod_zero_dim_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1595s] [ 63%] 2025-12-04T10:19:47.9837582Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_op_2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4625s] [ 63%] 2025-12-04T10:19:47.9839447Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_scan_op_compiled_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.6357s] [ 63%] 2025-12-04T10:19:47.9841364Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_scan_op_multi_input_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1719s] [ 63%] 2025-12-04T10:19:47.9843291Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_custom_scan_would_split_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4763s] [ 63%] 2025-12-04T10:19:47.9845220Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_deterministic_codegen_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [11.5569s] [ 64%] 2025-12-04T10:19:47.9847168Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dist_bf16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (Requires sm80) [ 64%] 2025-12-04T10:19:47.9849024Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5704s] [ 64%] 2025-12-04T10:19:47.9850754Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_div3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4017s] [ 64%] 2025-12-04T10:19:47.9852634Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtype_mismatch_issue_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) 
[ 65%] 2025-12-04T10:19:47.9854823Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_float32_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (uses bfloat16 which requires SM >= 80) [ 65%] 2025-12-04T10:19:47.9857219Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_int16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (uses bfloat16 which requires SM >= 80) [ 65%] 2025-12-04T10:19:47.9859527Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float16_uint8_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (uses bfloat16 which requires SM >= 80) [ 65%] 2025-12-04T10:19:47.9861944Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_int16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 65%] 2025-12-04T10:19:47.9864252Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float32_int64_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (uses bfloat16 which requires SM >= 80) [ 66%] 2025-12-04T10:19:47.9866637Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_float64_float32_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 66%] 2025-12-04T10:19:47.9868752Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_fusion_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2796s] [ 66%] 2025-12-04T10:19:47.9870857Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int16_float64_dynamic_shapes_cuda <- 
test/inductor/test_torchinductor.py SKIPPED [0.0035s] (uses bfloat16 which requires SM >= 80) [ 66%] 2025-12-04T10:19:47.9873500Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int16_uint8_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0036s] (uses bfloat16 which requires SM >= 80) [ 67%] 2025-12-04T10:19:47.9875816Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_float16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (uses bfloat16 which requires SM >= 80) [ 67%] 2025-12-04T10:19:47.9878107Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_float32_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 67%] 2025-12-04T10:19:47.9880425Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int32_float64_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 67%] 2025-12-04T10:19:47.9882730Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_int16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (uses bfloat16 which requires SM >= 80) [ 68%] 2025-12-04T10:19:47.9885010Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int64_int8_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 68%] 2025-12-04T10:19:47.9887294Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_int64_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 68%] 2025-12-04T10:19:47.9889562Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_int8_uint8_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 68%] 2025-12-04T10:19:47.9891848Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_float16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0029s] (uses bfloat16 which requires SM >= 80) [ 68%] 2025-12-04T10:19:47.9894146Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_int16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0029s] (uses bfloat16 which requires SM >= 80) [ 69%] 2025-12-04T10:19:47.9896679Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_dtypeview_uint8_int32_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (uses bfloat16 which requires SM >= 80) [ 69%] 2025-12-04T10:19:47.9898742Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_embedding_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.0159s] [ 69%] 2025-12-04T10:19:47.9900545Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_empty2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.1063s] [ 69%] 2025-12-04T10:19:47.9902420Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_empty_strided_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.1055s] [ 70%] 2025-12-04T10:19:47.9904345Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_emulate_precision_triton_fp_fusion_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2887s] [ 70%] 2025-12-04T10:19:47.9906247Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_exp2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3630s] [ 70%] 2025-12-04T10:19:47.9908154Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fallback_mutable_op_no_mutated_tensors_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.0368s] [ 70%] 2025-12-04T10:19:47.9910191Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fallback_mutable_op_with_return_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.0540s] [ 70%] 2025-12-04T10:19:47.9912169Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fft_real_input_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 71%] 2025-12-04T10:19:47.9914175Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_float_index_expression_type_promotion_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2111s] [ 71%] 2025-12-04T10:19:47.9916104Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_floordiv_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4573s] [ 71%] 2025-12-04T10:19:47.9918210Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_fuse_large_params_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0005s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 71%] 2025-12-04T10:19:47.9920442Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_generated_code_has_alignment_assert_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3170s] [ 72%] 2025-12-04T10:19:47.9922356Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_getitem_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.0343s] [ 72%] 2025-12-04T10:19:47.9924252Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_misaligned_input_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.9638s] [ 72%] 2025-12-04T10:19:47.9926522Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_graph_partition_refcount_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0005s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 72%] 2025-12-04T10:19:47.9928697Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_hardsigmoid_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3179s] [ 73%] 2025-12-04T10:19:47.9930557Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_horizonal_fusion2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4485s] [ 73%] 2025-12-04T10:19:47.9932434Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7328s] [ 73%] 2025-12-04T10:19:47.9934193Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.8863s] [ 73%] 2025-12-04T10:19:47.9936113Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_device_assert_masked_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4714s] [ 73%] 2025-12-04T10:19:47.9938228Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py 
PASSED [0.1975s] [ 74%] 2025-12-04T10:19:47.9940140Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_flip_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2016s] [ 74%] 2025-12-04T10:19:47.9942084Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_propagation_floordiv_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7441s] [ 74%] 2025-12-04T10:19:47.9943955Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [3.0350s] [ 74%] 2025-12-04T10:19:47.9945809Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put_fallback1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3750s] [ 75%] 2025-12-04T10:19:47.9947687Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_put_fallback2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4434s] [ 75%] 2025-12-04T10:19:47.9949543Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_index_select_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.4645s] [ 75%] 2025-12-04T10:19:47.9951472Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_inplace_mixed_dtype_ops_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) 
[ 75%] 2025-12-04T10:19:47.9953454Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_int_input_dynamic_shapes_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1922s] [ 75%] 2025-12-04T10:19:47.9955394Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_invalid_operand_issue1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.2120s] [ 76%] 2025-12-04T10:19:47.9957246Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_isinf2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2126s] [ 76%] 2025-12-04T10:19:47.9959051Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_issue102546_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1550s] [ 76%] 2025-12-04T10:19:47.9960935Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_kernel_names_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) 
[ 76%] 2025-12-04T10:19:47.9962831Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_block_sizes_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [6.8157s] [ 77%] 2025-12-04T10:19:47.9964732Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_grid_use_block_ptr_False_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.5035s] [ 77%] 2025-12-04T10:19:47.9966732Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_pointwise_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.2373s] [ 77%] 2025-12-04T10:19:47.9968632Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_large_tensor_reduction_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.0067s] [ 77%] 2025-12-04T10:19:47.9970524Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lerp_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) 
[ 78%] 2025-12-04T10:19:47.9972804Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_like_rands_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.1981s] [ 78%] 2025-12-04T10:19:47.9974570Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linear1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.1485s] [ 78%] 2025-12-04T10:19:47.9976438Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linear2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.2373s] [ 78%] 2025-12-04T10:19:47.9978430Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_linear_float64_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (cuda failed for float64 linear) [ 78%] 2025-12-04T10:19:47.9980492Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_list_clearing_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) 
[ 79%] 2025-12-04T10:19:47.9982609Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [2.0736s] [ 79%] 2025-12-04T10:19:47.9984861Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [1.8701s] [ 79%] 2025-12-04T10:19:47.9987026Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py FAILED [1.8548s] [ 79%] 2025-12-04T10:19:47.9988142Z 2025-12-04T10:19:47.9988292Z ==================================== RERUNS ==================================== 2025-12-04T10:19:47.9989003Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _ 2025-12-04T10:19:47.9989675Z Traceback (most recent call last): 2025-12-04T10:19:47.9990510Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention 2025-12-04T10:19:47.9991401Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:47.9992231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code 2025-12-04T10:19:47.9993034Z return run_and_get_code(run_with_backward) 2025-12-04T10:19:47.9993796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:19:47.9994539Z result = fn(*args, **kwargs) 2025-12-04T10:19:47.9995245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward 2025-12-04T10:19:47.9995964Z result = fn() 2025-12-04T10:19:47.9996535Z File 
"/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in 2025-12-04T10:19:47.9997243Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:47.9998016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper 2025-12-04T10:19:47.9998888Z raise e.with_traceback(None) from e.__cause__ # User compiler error 2025-12-04T10:19:47.9999474Z torch._dynamo.exc.Unsupported: Attempt to trace generator 2025-12-04T10:19:48.0000246Z Explanation: Generators cannot be compiled directly with `torch.compile`. 2025-12-04T10:19:48.0001063Z Hint: Call a generator from inside of a non-generator Python function and compile that function instead. 2025-12-04T10:19:48.0002165Z Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround. 2025-12-04T10:19:48.0003041Z 2025-12-04T10:19:48.0003177Z Developer debug context: 2025-12-04T10:19:48.0003383Z 2025-12-04T10:19:48.0003936Z For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html 2025-12-04T10:19:48.0004592Z 2025-12-04T10:19:48.0004815Z To execute this test, run the following from the base repo dir: 2025-12-04T10:19:48.0006061Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda 2025-12-04T10:19:48.0007093Z 2025-12-04T10:19:48.0007367Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:19:48.0008003Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0010771Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that 
function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0013472Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0014017Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0014549Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0014900Z graph_break [] 2025-12-04T10:19:48.0015266Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0016430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0017418Z warnings.warn( 2025-12-04T10:19:48.0018313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0019267Z warnings.warn( 2025-12-04T10:19:48.0020702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel. 2025-12-04T10:19:48.0022114Z 2025-12-04T10:19:48.0022278Z SOLUTION: Use torch.compile(flex_attention)(...) 2025-12-04T10:19:48.0022571Z 2025-12-04T10:19:48.0022768Z If you want to debug your score_mod/mask_mod, you can set: 2025-12-04T10:19:48.0023373Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True 2025-12-04T10:19:48.0023823Z 2025-12-04T10:19:48.0024384Z This will allow you to use print statements or breakpoints. 
Note: This doesn't work with the backwards pass and may produce incorrect results. 2025-12-04T10:19:48.0025190Z _warn_once( 2025-12-04T10:19:48.0025754Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _ 2025-12-04T10:19:48.0026417Z Traceback (most recent call last): 2025-12-04T10:19:48.0027236Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention 2025-12-04T10:19:48.0028119Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0029014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code 2025-12-04T10:19:48.0029800Z return run_and_get_code(run_with_backward) 2025-12-04T10:19:48.0030557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:19:48.0031287Z result = fn(*args, **kwargs) 2025-12-04T10:19:48.0032039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward 2025-12-04T10:19:48.0032762Z result = fn() 2025-12-04T10:19:48.0033327Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in 2025-12-04T10:19:48.0034033Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0034801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper 2025-12-04T10:19:48.0035670Z raise e.with_traceback(None) from e.__cause__ # User compiler error 2025-12-04T10:19:48.0036262Z torch._dynamo.exc.Unsupported: Attempt to trace generator 2025-12-04T10:19:48.0036878Z Explanation: Generators cannot be compiled directly with `torch.compile`. 2025-12-04T10:19:48.0037703Z Hint: Call a generator from inside of a non-generator Python function and compile that function instead. 
2025-12-04T10:19:48.0038798Z Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround. 2025-12-04T10:19:48.0039484Z 2025-12-04T10:19:48.0039616Z Developer debug context: 2025-12-04T10:19:48.0039821Z 2025-12-04T10:19:48.0040348Z For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html 2025-12-04T10:19:48.0041017Z 2025-12-04T10:19:48.0041235Z To execute this test, run the following from the base repo dir: 2025-12-04T10:19:48.0042481Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda 2025-12-04T10:19:48.0043498Z 2025-12-04T10:19:48.0043775Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:19:48.0044405Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0047154Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0049850Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0050404Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0050926Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0051266Z graph_break [] 2025-12-04T10:19:48.0051643Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0052745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0053728Z warnings.warn( 2025-12-04T10:19:48.0054610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0055580Z warnings.warn( 2025-12-04T10:19:48.0057168Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel. 2025-12-04T10:19:48.0058561Z 2025-12-04T10:19:48.0058741Z SOLUTION: Use torch.compile(flex_attention)(...) 2025-12-04T10:19:48.0059033Z 2025-12-04T10:19:48.0059216Z If you want to debug your score_mod/mask_mod, you can set: 2025-12-04T10:19:48.0059837Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True 2025-12-04T10:19:48.0060335Z 2025-12-04T10:19:48.0060909Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results. 
2025-12-04T10:19:48.0061702Z _warn_once( 2025-12-04T10:19:48.0062063Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0064822Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0067505Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0082558Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0083288Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0083648Z graph_break [] 2025-12-04T10:19:48.0084050Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0085161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0086142Z warnings.warn( 2025-12-04T10:19:48.0087065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0088037Z warnings.warn( 2025-12-04T10:19:48.0088343Z =================================== FAILURES =================================== 2025-12-04T10:19:48.0089053Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _ 2025-12-04T10:19:48.0089743Z Traceback (most recent call last): 2025-12-04T10:19:48.0090547Z File 
"/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention 2025-12-04T10:19:48.0091428Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0092250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code 2025-12-04T10:19:48.0093050Z return run_and_get_code(run_with_backward) 2025-12-04T10:19:48.0093791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:19:48.0094521Z result = fn(*args, **kwargs) 2025-12-04T10:19:48.0095228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward 2025-12-04T10:19:48.0095938Z result = fn() 2025-12-04T10:19:48.0096597Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in <lambda> 2025-12-04T10:19:48.0097310Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0098102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper 2025-12-04T10:19:48.0098960Z raise e.with_traceback(None) from e.__cause__ # User compiler error 2025-12-04T10:19:48.0099559Z torch._dynamo.exc.Unsupported: Attempt to trace generator 2025-12-04T10:19:48.0100395Z Explanation: Generators cannot be compiled directly with `torch.compile`. 2025-12-04T10:19:48.0101235Z Hint: Call a generator from inside of a non-generator Python function and compile that function instead. 2025-12-04T10:19:48.0102323Z Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround. 
2025-12-04T10:19:48.0103027Z 2025-12-04T10:19:48.0103147Z Developer debug context: 2025-12-04T10:19:48.0103471Z 2025-12-04T10:19:48.0104023Z For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html 2025-12-04T10:19:48.0104684Z 2025-12-04T10:19:48.0104920Z To execute this test, run the following from the base repo dir: 2025-12-04T10:19:48.0106159Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda 2025-12-04T10:19:48.0107191Z 2025-12-04T10:19:48.0107465Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:19:48.0108106Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0110885Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0113593Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0114136Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0114666Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0115022Z graph_break [] 2025-12-04T10:19:48.0115392Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0116505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0117479Z warnings.warn( 2025-12-04T10:19:48.0118375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0119337Z warnings.warn( 2025-12-04T10:19:48.0120774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel. 2025-12-04T10:19:48.0122191Z 2025-12-04T10:19:48.0122358Z SOLUTION: Use torch.compile(flex_attention)(...) 2025-12-04T10:19:48.0122654Z 2025-12-04T10:19:48.0122855Z If you want to debug your score_mod/mask_mod, you can set: 2025-12-04T10:19:48.0123466Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True 2025-12-04T10:19:48.0123918Z 2025-12-04T10:19:48.0124480Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results. 
2025-12-04T10:19:48.0125290Z _warn_once( 2025-12-04T10:19:48.0125673Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0128494Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0131198Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0131760Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0132288Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0132694Z graph_break [] 2025-12-04T10:19:48.0133078Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0134184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0135174Z warnings.warn( 2025-12-04T10:19:48.0136056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0137117Z warnings.warn( 2025-12-04T10:19:48.0137508Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0140271Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph 
break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0142985Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0143528Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0144053Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0144411Z graph_break [] 2025-12-04T10:19:48.0144783Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0145881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0146856Z warnings.warn( 2025-12-04T10:19:48.0147748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0148704Z warnings.warn( 2025-12-04T10:19:48.0149872Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-0c75da116b2f10f8.xml - 2025-12-04T10:19:48.0151188Z =========================== short test summary info ============================ 2025-12-04T10:19:48.0152554Z FAILED [1.8548s] inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda - torch._dynamo.exc.Unsupported: Attempt to trace generator 2025-12-04T10:19:48.0154000Z Explanation: Generators cannot be compiled directly with `torch.compile`. 
2025-12-04T10:19:48.0154843Z Hint: Call a generator from inside of a non-generator Python function and compile that function instead. 2025-12-04T10:19:48.0155941Z Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround. 2025-12-04T10:19:48.0156641Z 2025-12-04T10:19:48.0156760Z Developer debug context: 2025-12-04T10:19:48.0156967Z 2025-12-04T10:19:48.0157507Z For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html 2025-12-04T10:19:48.0158165Z 2025-12-04T10:19:48.0158404Z To execute this test, run the following from the base repo dir: 2025-12-04T10:19:48.0159742Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda 2025-12-04T10:19:48.0160780Z 2025-12-04T10:19:48.0161052Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:19:48.0161650Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:19:48.0162304Z == 1 failed, 256 passed, 61 skipped, 32 xfailed, 2 rerun in 462.93s (0:07:42) == 2025-12-04T10:19:48.0162786Z Got exit code 1 2025-12-04T10:19:48.0163063Z Retrying single test... 
2025-12-04T10:19:48.0163708Z W1204 10:16:50.891000 36017 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:19:48.0165124Z Test results will be stored in test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-fd0863b8a222871a.xml 2025-12-04T10:19:48.0166244Z ============================= test session starts ============================== 2025-12-04T10:19:48.0166913Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:19:48.0167521Z cachedir: .pytest_cache 2025-12-04T10:19:48.0168225Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:19:48.0169021Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:19:48.0169382Z configfile: pytest.ini 2025-12-04T10:19:48.0170110Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:19:48.0171255Z collecting ... collected 1750 items / 440 deselected / 1310 selected 2025-12-04T10:19:48.0172688Z stepcurrent: skipping 349 already run items. 
Running only test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda 2025-12-04T10:19:48.0173925Z Running 1 items in this shard 2025-12-04T10:19:48.0174140Z 2025-12-04T10:19:48.0175208Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [4.7316s] [100%] 2025-12-04T10:19:48.0177518Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [2.0298s] [100%] 2025-12-04T10:19:48.0179675Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py FAILED [1.7950s] [100%] 2025-12-04T10:19:48.0180789Z 2025-12-04T10:19:48.0180933Z ==================================== RERUNS ==================================== 2025-12-04T10:19:48.0181642Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _ 2025-12-04T10:19:48.0182321Z Traceback (most recent call last): 2025-12-04T10:19:48.0183123Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention 2025-12-04T10:19:48.0184000Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0184829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code 2025-12-04T10:19:48.0185620Z return run_and_get_code(run_with_backward) 2025-12-04T10:19:48.0186370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:19:48.0187107Z result = fn(*args, 
**kwargs) 2025-12-04T10:19:48.0187812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward 2025-12-04T10:19:48.0188521Z result = fn() 2025-12-04T10:19:48.0189258Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in <lambda> 2025-12-04T10:19:48.0189981Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0190756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper 2025-12-04T10:19:48.0191632Z raise e.with_traceback(None) from e.__cause__ # User compiler error 2025-12-04T10:19:48.0192343Z torch._dynamo.exc.Unsupported: Attempt to trace generator 2025-12-04T10:19:48.0192972Z Explanation: Generators cannot be compiled directly with `torch.compile`. 2025-12-04T10:19:48.0193787Z Hint: Call a generator from inside of a non-generator Python function and compile that function instead. 2025-12-04T10:19:48.0194884Z Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround. 
2025-12-04T10:19:48.0195587Z 2025-12-04T10:19:48.0195712Z Developer debug context: 2025-12-04T10:19:48.0195918Z 2025-12-04T10:19:48.0196463Z For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html 2025-12-04T10:19:48.0197118Z 2025-12-04T10:19:48.0197339Z To execute this test, run the following from the base repo dir: 2025-12-04T10:19:48.0198586Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda 2025-12-04T10:19:48.0199626Z 2025-12-04T10:19:48.0199895Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:19:48.0200537Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0203310Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0205981Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0206543Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0207069Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0207421Z graph_break [] 2025-12-04T10:19:48.0207790Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0208883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0209857Z warnings.warn( 2025-12-04T10:19:48.0210734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0211698Z warnings.warn( 2025-12-04T10:19:48.0213128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel. 2025-12-04T10:19:48.0214525Z 2025-12-04T10:19:48.0214702Z SOLUTION: Use torch.compile(flex_attention)(...) 2025-12-04T10:19:48.0214996Z 2025-12-04T10:19:48.0215196Z If you want to debug your score_mod/mask_mod, you can set: 2025-12-04T10:19:48.0215798Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True 2025-12-04T10:19:48.0216241Z 2025-12-04T10:19:48.0216966Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results. 
2025-12-04T10:19:48.0217769Z _warn_once( 2025-12-04T10:19:48.0218314Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _ 2025-12-04T10:19:48.0218993Z Traceback (most recent call last): 2025-12-04T10:19:48.0219816Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention 2025-12-04T10:19:48.0220756Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0221560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code 2025-12-04T10:19:48.0222361Z return run_and_get_code(run_with_backward) 2025-12-04T10:19:48.0223112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:19:48.0223836Z result = fn(*args, **kwargs) 2025-12-04T10:19:48.0224537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward 2025-12-04T10:19:48.0225258Z result = fn() 2025-12-04T10:19:48.0225825Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in <lambda> 2025-12-04T10:19:48.0226524Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0227309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper 2025-12-04T10:19:48.0228179Z raise e.with_traceback(None) from e.__cause__ # User compiler error 2025-12-04T10:19:48.0228771Z torch._dynamo.exc.Unsupported: Attempt to trace generator 2025-12-04T10:19:48.0229386Z Explanation: Generators cannot be compiled directly with `torch.compile`. 2025-12-04T10:19:48.0230213Z Hint: Call a generator from inside of a non-generator Python function and compile that function instead. 2025-12-04T10:19:48.0231314Z Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround. 2025-12-04T10:19:48.0232002Z 2025-12-04T10:19:48.0232121Z Developer debug context: 2025-12-04T10:19:48.0232342Z 2025-12-04T10:19:48.0232871Z For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html 2025-12-04T10:19:48.0233544Z 2025-12-04T10:19:48.0233769Z To execute this test, run the following from the base repo dir: 2025-12-04T10:19:48.0235016Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda 2025-12-04T10:19:48.0236060Z 2025-12-04T10:19:48.0236332Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:19:48.0236973Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0239756Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0242466Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0243009Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0243542Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0243898Z graph_break [] 2025-12-04T10:19:48.0244269Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0245449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0246424Z warnings.warn( 2025-12-04T10:19:48.0247318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0248271Z warnings.warn( 2025-12-04T10:19:48.0249719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel. 2025-12-04T10:19:48.0251191Z 2025-12-04T10:19:48.0251354Z SOLUTION: Use torch.compile(flex_attention)(...) 2025-12-04T10:19:48.0251648Z 2025-12-04T10:19:48.0251848Z If you want to debug your score_mod/mask_mod, you can set: 2025-12-04T10:19:48.0252453Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True 2025-12-04T10:19:48.0252914Z 2025-12-04T10:19:48.0253477Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results. 
2025-12-04T10:19:48.0254282Z _warn_once( 2025-12-04T10:19:48.0254655Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0257482Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0260191Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0260749Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0261272Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0261605Z graph_break [] 2025-12-04T10:19:48.0261986Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0263091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0264062Z warnings.warn( 2025-12-04T10:19:48.0264943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0265908Z warnings.warn( 2025-12-04T10:19:48.0266225Z =================================== FAILURES =================================== 2025-12-04T10:19:48.0266912Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _ 2025-12-04T10:19:48.0267590Z Traceback (most recent call last): 2025-12-04T10:19:48.0268411Z File 
"/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention 2025-12-04T10:19:48.0269291Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0270097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code 2025-12-04T10:19:48.0270901Z return run_and_get_code(run_with_backward) 2025-12-04T10:19:48.0271982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:19:48.0272712Z result = fn(*args, **kwargs) 2025-12-04T10:19:48.0273409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward 2025-12-04T10:19:48.0274133Z result = fn() 2025-12-04T10:19:48.0274869Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in <lambda> 2025-12-04T10:19:48.0275569Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0276353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper 2025-12-04T10:19:48.0277221Z raise e.with_traceback(None) from e.__cause__ # User compiler error 2025-12-04T10:19:48.0277813Z torch._dynamo.exc.Unsupported: Attempt to trace generator 2025-12-04T10:19:48.0278517Z Explanation: Generators cannot be compiled directly with `torch.compile`. 2025-12-04T10:19:48.0279343Z Hint: Call a generator from inside of a non-generator Python function and compile that function instead. 2025-12-04T10:19:48.0280442Z Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround. 
2025-12-04T10:19:48.0281128Z 2025-12-04T10:19:48.0281262Z Developer debug context: 2025-12-04T10:19:48.0281468Z 2025-12-04T10:19:48.0282002Z For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html 2025-12-04T10:19:48.0282677Z 2025-12-04T10:19:48.0282897Z To execute this test, run the following from the base repo dir: 2025-12-04T10:19:48.0284145Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda 2025-12-04T10:19:48.0285167Z 2025-12-04T10:19:48.0285449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:19:48.0286066Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0288830Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0291528Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0292082Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0292612Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0292946Z graph_break [] 2025-12-04T10:19:48.0293325Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0294416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0295382Z warnings.warn( 2025-12-04T10:19:48.0296337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0297310Z warnings.warn( 2025-12-04T10:19:48.0298751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel. 2025-12-04T10:19:48.0300151Z 2025-12-04T10:19:48.0300315Z SOLUTION: Use torch.compile(flex_attention)(...) 2025-12-04T10:19:48.0300623Z 2025-12-04T10:19:48.0300805Z If you want to debug your score_mod/mask_mod, you can set: 2025-12-04T10:19:48.0301426Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True 2025-12-04T10:19:48.0301861Z 2025-12-04T10:19:48.0302432Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results. 
2025-12-04T10:19:48.0303215Z _warn_once( 2025-12-04T10:19:48.0303724Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0306501Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0309266Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0309817Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0310327Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0310683Z graph_break [] 2025-12-04T10:19:48.0311068Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0312153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0313127Z warnings.warn( 2025-12-04T10:19:48.0314012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0314979Z warnings.warn( 2025-12-04T10:19:48.0315350Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0318130Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph 
break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0320795Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0321352Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0321868Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0322216Z graph_break [] 2025-12-04T10:19:48.0322588Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0323678Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0324631Z warnings.warn( 2025-12-04T10:19:48.0325525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0326493Z warnings.warn( 2025-12-04T10:19:48.0327654Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-fd0863b8a222871a.xml - 2025-12-04T10:19:48.0328958Z =========================== short test summary info ============================ 2025-12-04T10:19:48.0330329Z FAILED [1.7950s] inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda - torch._dynamo.exc.Unsupported: Attempt to trace generator 2025-12-04T10:19:48.0331772Z Explanation: Generators cannot be compiled directly with `torch.compile`. 
2025-12-04T10:19:48.0332603Z Hint: Call a generator from inside of a non-generator Python function and compile that function instead. 2025-12-04T10:19:48.0333762Z Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround. 2025-12-04T10:19:48.0334464Z 2025-12-04T10:19:48.0334583Z Developer debug context: 2025-12-04T10:19:48.0334791Z 2025-12-04T10:19:48.0335332Z For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html 2025-12-04T10:19:48.0335996Z 2025-12-04T10:19:48.0336363Z To execute this test, run the following from the base repo dir: 2025-12-04T10:19:48.0337598Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda 2025-12-04T10:19:48.0338630Z 2025-12-04T10:19:48.0338900Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:19:48.0339501Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:19:48.0340041Z ================== 1 failed, 440 deselected, 2 rerun in 8.68s ================== 2025-12-04T10:19:48.0340483Z Got exit code 1 2025-12-04T10:19:48.0340758Z Retrying single test... 
2025-12-04T10:19:48.0341400Z W1204 10:17:13.656000 36216 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:19:48.0342805Z Test results will be stored in test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-6fcb35b3fc35a71c.xml 2025-12-04T10:19:48.0343937Z ============================= test session starts ============================== 2025-12-04T10:19:48.0344604Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:19:48.0345213Z cachedir: .pytest_cache 2025-12-04T10:19:48.0345913Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:19:48.0346704Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:19:48.0347068Z configfile: pytest.ini 2025-12-04T10:19:48.0347798Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:19:48.0348694Z collecting ... collected 1750 items / 440 deselected / 1310 selected 2025-12-04T10:19:48.0350031Z stepcurrent: skipping 349 already run items. 
Running only test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda 2025-12-04T10:19:48.0351251Z Running 1 items in this shard 2025-12-04T10:19:48.0351463Z 2025-12-04T10:19:48.0352547Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [4.6647s] [100%] 2025-12-04T10:19:48.0354792Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [2.0683s] [100%] 2025-12-04T10:19:48.0356939Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py FAILED [1.8384s] [100%] 2025-12-04T10:19:48.0358044Z 2025-12-04T10:19:48.0358193Z ==================================== RERUNS ==================================== 2025-12-04T10:19:48.0358896Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _ 2025-12-04T10:19:48.0359567Z Traceback (most recent call last): 2025-12-04T10:19:48.0360372Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention 2025-12-04T10:19:48.0361249Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0362151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code 2025-12-04T10:19:48.0362946Z return run_and_get_code(run_with_backward) 2025-12-04T10:19:48.0363701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:19:48.0364432Z result = fn(*args, 
**kwargs) 2025-12-04T10:19:48.0365136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward 2025-12-04T10:19:48.0365912Z result = fn() 2025-12-04T10:19:48.0366477Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in 2025-12-04T10:19:48.0367182Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0367955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper 2025-12-04T10:19:48.0368822Z raise e.with_traceback(None) from e.__cause__ # User compiler error 2025-12-04T10:19:48.0369418Z torch._dynamo.exc.Unsupported: Attempt to trace generator 2025-12-04T10:19:48.0370047Z Explanation: Generators cannot be compiled directly with `torch.compile`. 2025-12-04T10:19:48.0370858Z Hint: Call a generator from inside of a non-generator Python function and compile that function instead. 2025-12-04T10:19:48.0372309Z Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround. 
2025-12-04T10:19:48.0373021Z 2025-12-04T10:19:48.0373143Z Developer debug context: 2025-12-04T10:19:48.0373351Z 2025-12-04T10:19:48.0373897Z For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html 2025-12-04T10:19:48.0374556Z 2025-12-04T10:19:48.0374777Z To execute this test, run the following from the base repo dir: 2025-12-04T10:19:48.0376031Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda 2025-12-04T10:19:48.0377138Z 2025-12-04T10:19:48.0377408Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:19:48.0378051Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0380801Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0383527Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0384088Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0384610Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0384959Z graph_break [] 2025-12-04T10:19:48.0385327Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0386424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0387406Z warnings.warn( 2025-12-04T10:19:48.0388396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0389367Z warnings.warn( 2025-12-04T10:19:48.0390964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel. 2025-12-04T10:19:48.0392354Z 2025-12-04T10:19:48.0392532Z SOLUTION: Use torch.compile(flex_attention)(...) 2025-12-04T10:19:48.0392824Z 2025-12-04T10:19:48.0393024Z If you want to debug your score_mod/mask_mod, you can set: 2025-12-04T10:19:48.0393630Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True 2025-12-04T10:19:48.0394164Z 2025-12-04T10:19:48.0394722Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results. 
2025-12-04T10:19:48.0395516Z _warn_once( 2025-12-04T10:19:48.0396062Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _ 2025-12-04T10:19:48.0396735Z Traceback (most recent call last): 2025-12-04T10:19:48.0397549Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention 2025-12-04T10:19:48.0398435Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0399249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code 2025-12-04T10:19:48.0400048Z return run_and_get_code(run_with_backward) 2025-12-04T10:19:48.0400802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:19:48.0401534Z result = fn(*args, **kwargs) 2025-12-04T10:19:48.0402230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward 2025-12-04T10:19:48.0402952Z result = fn() 2025-12-04T10:19:48.0403530Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in 2025-12-04T10:19:48.0404225Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0405017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper 2025-12-04T10:19:48.0405887Z raise e.with_traceback(None) from e.__cause__ # User compiler error 2025-12-04T10:19:48.0406479Z torch._dynamo.exc.Unsupported: Attempt to trace generator 2025-12-04T10:19:48.0407094Z Explanation: Generators cannot be compiled directly with `torch.compile`. 2025-12-04T10:19:48.0407919Z Hint: Call a generator from inside of a non-generator Python function and compile that function instead. 2025-12-04T10:19:48.0409016Z Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround. 2025-12-04T10:19:48.0409701Z 2025-12-04T10:19:48.0409832Z Developer debug context: 2025-12-04T10:19:48.0410038Z 2025-12-04T10:19:48.0410560Z For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html 2025-12-04T10:19:48.0411235Z 2025-12-04T10:19:48.0411453Z To execute this test, run the following from the base repo dir: 2025-12-04T10:19:48.0412700Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda 2025-12-04T10:19:48.0413721Z 2025-12-04T10:19:48.0414016Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:19:48.0414643Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0417609Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0420316Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0420876Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0421405Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0421745Z graph_break [] 2025-12-04T10:19:48.0422131Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0423303Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0424262Z warnings.warn( 2025-12-04T10:19:48.0425162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0426134Z warnings.warn( 2025-12-04T10:19:48.0427579Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel. 2025-12-04T10:19:48.0428973Z 2025-12-04T10:19:48.0429138Z SOLUTION: Use torch.compile(flex_attention)(...) 2025-12-04T10:19:48.0429450Z 2025-12-04T10:19:48.0429636Z If you want to debug your score_mod/mask_mod, you can set: 2025-12-04T10:19:48.0430261Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True 2025-12-04T10:19:48.0430697Z 2025-12-04T10:19:48.0431274Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results. 
2025-12-04T10:19:48.0432056Z _warn_once( 2025-12-04T10:19:48.0432433Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0435226Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0437905Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0438456Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0438963Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0439308Z graph_break [] 2025-12-04T10:19:48.0439684Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0440767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0441734Z warnings.warn( 2025-12-04T10:19:48.0442625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0443592Z warnings.warn( 2025-12-04T10:19:48.0443897Z =================================== FAILURES =================================== 2025-12-04T10:19:48.0444611Z _ DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda _ 2025-12-04T10:19:48.0445286Z Traceback (most recent call last): 2025-12-04T10:19:48.0446090Z File 
"/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in test_lite_regional_compile_flex_attention 2025-12-04T10:19:48.0446971Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0447797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2430, in run_fw_bw_and_get_code 2025-12-04T10:19:48.0448769Z return run_and_get_code(run_with_backward) 2025-12-04T10:19:48.0449515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:19:48.0450249Z result = fn(*args, **kwargs) 2025-12-04T10:19:48.0450958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2426, in run_with_backward 2025-12-04T10:19:48.0451765Z result = fn() 2025-12-04T10:19:48.0452315Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13781, in 2025-12-04T10:19:48.0453016Z _, codes = run_fw_bw_and_get_code(lambda: opt_fn(x)) 2025-12-04T10:19:48.0453804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 936, in compile_wrapper 2025-12-04T10:19:48.0454657Z raise e.with_traceback(None) from e.__cause__ # User compiler error 2025-12-04T10:19:48.0455247Z torch._dynamo.exc.Unsupported: Attempt to trace generator 2025-12-04T10:19:48.0455885Z Explanation: Generators cannot be compiled directly with `torch.compile`. 2025-12-04T10:19:48.0456784Z Hint: Call a generator from inside of a non-generator Python function and compile that function instead. 2025-12-04T10:19:48.0457873Z Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround. 
2025-12-04T10:19:48.0458581Z 2025-12-04T10:19:48.0458701Z Developer debug context: 2025-12-04T10:19:48.0458908Z 2025-12-04T10:19:48.0459448Z For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html 2025-12-04T10:19:48.0460110Z 2025-12-04T10:19:48.0460347Z To execute this test, run the following from the base repo dir: 2025-12-04T10:19:48.0461578Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda 2025-12-04T10:19:48.0462615Z 2025-12-04T10:19:48.0462884Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:19:48.0463515Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0466287Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0468991Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0469533Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0470059Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0470411Z graph_break [] 2025-12-04T10:19:48.0470778Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0472224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0473201Z warnings.warn( 2025-12-04T10:19:48.0474094Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0475050Z warnings.warn( 2025-12-04T10:19:48.0476485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py:1624: UserWarning: flex_attention called without torch.compile() - this will use an unfused implementation that materializes the full scores matrix instead of generating a fused kernel. 2025-12-04T10:19:48.0477894Z 2025-12-04T10:19:48.0478230Z SOLUTION: Use torch.compile(flex_attention)(...) 2025-12-04T10:19:48.0478526Z 2025-12-04T10:19:48.0478729Z If you want to debug your score_mod/mask_mod, you can set: 2025-12-04T10:19:48.0479336Z torch.nn.attention.flex_attention._FLEX_ATTENTION_DISABLE_COMPILE_DEBUG = True 2025-12-04T10:19:48.0479790Z 2025-12-04T10:19:48.0480350Z This will allow you to use print statements or breakpoints. Note: This doesn't work with the backwards pass and may produce incorrect results. 
2025-12-04T10:19:48.0481239Z _warn_once( 2025-12-04T10:19:48.0481614Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0484381Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0487061Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0487620Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0488147Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0488497Z graph_break [] 2025-12-04T10:19:48.0488862Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0489957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0490930Z warnings.warn( 2025-12-04T10:19:48.0491810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0492775Z warnings.warn( 2025-12-04T10:19:48.0493157Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:19:48.0495927Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph 
break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 1)] 2025-12-04T10:19:48.0498736Z stats [('calls_captured', 12), ('unique_graphs', 1)] 2025-12-04T10:19:48.0499278Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 2), ('ok', 1)] 2025-12-04T10:19:48.0499806Z inductor [('fxgraph_cache_miss', 2)] 2025-12-04T10:19:48.0500162Z graph_break [] 2025-12-04T10:19:48.0500524Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:19:48.0501622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0502589Z warnings.warn( 2025-12-04T10:19:48.0503481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:19:48.0504429Z warnings.warn( 2025-12-04T10:19:48.0505597Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-6fcb35b3fc35a71c.xml - 2025-12-04T10:19:48.0506902Z =========================== short test summary info ============================ 2025-12-04T10:19:48.0508348Z FAILED [1.8384s] inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda - torch._dynamo.exc.Unsupported: Attempt to trace generator 2025-12-04T10:19:48.0509791Z Explanation: Generators cannot be compiled directly with `torch.compile`. 
2025-12-04T10:19:48.0510624Z Hint: Call a generator from inside of a non-generator Python function and compile that function instead. 2025-12-04T10:19:48.0511807Z Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround. 2025-12-04T10:19:48.0512495Z 2025-12-04T10:19:48.0512631Z Developer debug context: 2025-12-04T10:19:48.0512839Z 2025-12-04T10:19:48.0513364Z For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html 2025-12-04T10:19:48.0514038Z 2025-12-04T10:19:48.0514258Z To execute this test, run the following from the base repo dir: 2025-12-04T10:19:48.0515510Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_torchinductor_codegen_dynamic_shapes.py DynamicShapesCodegenGPUTests.test_lite_regional_compile_flex_attention_dynamic_shapes_cuda 2025-12-04T10:19:48.0516531Z 2025-12-04T10:19:48.0516820Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:19:48.0517416Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:19:48.0517944Z ================== 1 failed, 440 deselected, 2 rerun in 8.70s ================== 2025-12-04T10:19:48.0518398Z Got exit code 1 2025-12-04T10:19:48.0519367Z FAILED CONSISTENTLY: test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda 2025-12-04T10:19:48.0520709Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:19:48.0521718Z W1204 10:17:36.320000 36415 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:19:48.0523125Z Test results will be stored in test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-f8b2416e9d43ac69.xml 2025-12-04T10:19:48.0524256Z ============================= test session starts ============================== 2025-12-04T10:19:48.0524914Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:19:48.0525521Z cachedir: .pytest_cache 2025-12-04T10:19:48.0526233Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:19:48.0527022Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:19:48.0527365Z configfile: pytest.ini 2025-12-04T10:19:48.0528094Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:19:48.0528337Z collecting ... collected 1750 items / 350 deselected / 1400 selected 2025-12-04T10:19:48.0528501Z stepcurrent: skipping 350 already run items. 
2025-12-04T10:19:48.0528619Z Running 91 items in this shard 2025-12-04T10:19:48.0528624Z 2025-12-04T10:19:48.0529611Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_invoke_subgraph_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [4.9039s] [ 1%] 2025-12-04T10:19:48.0530423Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_log2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.8361s] [ 2%] 2025-12-04T10:19:48.0531366Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_low_memory_max_pool_dilation_1_dim_2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.1538s] [ 3%] 2025-12-04T10:19:48.0532385Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_low_memory_max_pool_dilation_1_dim_3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [6.2089s] [ 4%] 2025-12-04T10:19:48.0533321Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_low_memory_max_pool_dilation_2_dim_2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.5604s] [ 5%] 2025-12-04T10:19:48.0534210Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_min_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7639s] [ 6%] 2025-12-04T10:19:48.0535041Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d4_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [3.0247s] [ 7%] 2025-12-04T10:19:48.0535946Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d6_dilation_1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.7706s] [ 8%] 2025-12-04T10:19:48.0536900Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d6_dilation_2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.9282s] [ 9%] 2025-12-04T10:19:48.0537831Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_max_pool2d_with_indices_backward6_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.3683s] [ 10%] 2025-12-04T10:19:48.0538654Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mean_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.5212s] [ 12%] 2025-12-04T10:19:48.0539494Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_mul_index_expr_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5776s] [ 13%] 2025-12-04T10:19:48.0540530Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multi_gpu_device_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (requires multiple cuda devices) [ 14%] 2025-12-04T10:19:48.0541389Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_multilayer_var_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.9535s] [ 15%] 2025-12-04T10:19:48.0542331Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nan_assert_inside_triton_kernel_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4113s] [ 16%] 2025-12-04T10:19:48.0543288Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nan_sort_stable_False_descending_True_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1395s] [ 17%] 2025-12-04T10:19:48.0544125Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_new_empty_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL 
[0.1191s] [ 18%] 2025-12-04T10:19:48.0544978Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_nll_loss_forward_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.8117s] [ 19%] 2025-12-04T10:19:48.0545885Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pixel_shuffle_channels_last_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.3575s] [ 20%] 2025-12-04T10:19:48.0546777Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_bessel_j1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7731s] [ 21%] 2025-12-04T10:19:48.0547716Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_chebyshev_polynomial_t_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.5134s] [ 23%] 2025-12-04T10:19:48.0548730Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_chebyshev_polynomial_w_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [1.0405s] [ 24%] 2025-12-04T10:19:48.0549597Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_expit_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1915s] [ 25%] 2025-12-04T10:19:48.0550483Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_gammaincc_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.1575s] [ 26%] 2025-12-04T10:19:48.0551384Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_i0_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2490s] [ 27%] 2025-12-04T10:19:48.0552242Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_log1p_dynamic_shapes_cuda <- 
test/inductor/test_torchinductor.py PASSED [0.2497s] [ 28%] 2025-12-04T10:19:48.0553170Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_modified_bessel_k0_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.6418s] [ 29%] 2025-12-04T10:19:48.0554008Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_psi_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.5504s] [ 30%] 2025-12-04T10:19:48.0554882Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_round_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2620s] [ 31%] 2025-12-04T10:19:48.0555863Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pointwise_shifted_chebyshev_polynomial_v_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [1.0660s] [ 32%] 2025-12-04T10:19:48.0556741Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pow3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) 
[ 34%] 2025-12-04T10:19:48.0557556Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pow_int_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.4988s] [ 35%] 2025-12-04T10:19:48.0558410Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_pow_symfloat_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4773s] [ 36%] 2025-12-04T10:19:48.0559272Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randn_generator_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5635s] [ 37%] 2025-12-04T10:19:48.0560119Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_randn_like_empty_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.1598s] [ 38%] 2025-12-04T10:19:48.0560974Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reduction1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3739s] [ 39%] 2025-12-04T10:19:48.0562430Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_reflection_pad2d_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py W1204 10:18:23.024000 36415 site-packages/torch/utils/_sympy/interp.py:179] [0/0] failed while executing pow_by_natural([VR[4, int_oo], VR[-1, -1]]) 2025-12-04T10:19:48.0563057Z W1204 10:18:23.560000 36415 site-packages/torch/utils/_sympy/interp.py:179] [0/0] failed while executing pow_by_natural([VR[-int_oo, int_oo], VR[-1, -1]]) 2025-12-04T10:19:48.0563164Z PASSED [2.3943s] [ 40%] 2025-12-04T10:19:48.0563989Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_relu_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3397s] [ 41%] 2025-12-04T10:19:48.0564929Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_remove_noop_view_dtype_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2395s] [ 42%] 2025-12-04T10:19:48.0565746Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.9065s] [ 43%] 2025-12-04T10:19:48.0566641Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_interleave_2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3294s] [ 45%] 2025-12-04T10:19:48.0567685Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5351s] [ 46%] 2025-12-04T10:19:48.0568665Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_require_stride_expanded_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (Skipped!) 
[ 47%] 2025-12-04T10:19:48.0569494Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_resize_as_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [19.8656s] [ 48%] 2025-12-04T10:19:48.0570307Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_roll_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.5791s] [ 49%] 2025-12-04T10:19:48.0571398Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_rsqrt_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2768s] [ 50%] 2025-12-04T10:19:48.0572248Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scalar_output_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.1325s] [ 51%] 2025-12-04T10:19:48.0573090Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter5_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.6721s] [ 52%] 2025-12-04T10:19:48.0573942Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scatter_reduce1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4243s] [ 53%] 2025-12-04T10:19:48.0574870Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_scheduler_vertical_fusion1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7659s] [ 54%] 2025-12-04T10:19:48.0575749Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sdpa_unaligned_mask_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5627s] [ 56%] 2025-12-04T10:19:48.0576683Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_searchsorted_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [13.7973s] [ 57%] 2025-12-04T10:19:48.0577547Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_select_scatter_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5948s] [ 58%] 2025-12-04T10:19:48.0578340Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sin_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5009s] [ 59%] 2025-12-04T10:19:48.0579324Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_size_asserts_for_multi_output_fallback_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1578s] [ 60%] 2025-12-04T10:19:48.0580185Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sizehint_issue1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.8376s] [ 61%] 2025-12-04T10:19:48.0581007Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.6224s] [ 62%] 2025-12-04T10:19:48.0581976Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_mutation3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3567s] [ 63%] 2025-12-04T10:19:48.0582846Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter2_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3713s] [ 64%] 2025-12-04T10:19:48.0583784Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.0612s] [ 65%] 2025-12-04T10:19:48.0584697Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_slice_scatter_reinplace_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [2.3586s] [ 67%] 2025-12-04T10:19:48.0585519Z 
inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_softmax_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.1851s] [ 68%] 2025-12-04T10:19:48.0586427Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_softmax_one_kernel_persist_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5542s] [ 69%] 2025-12-04T10:19:48.0587408Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_cumprod_low_prec_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (Requires sm80) [ 70%] 2025-12-04T10:19:48.0588250Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_cumsum_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [6.6486s] [ 71%] 2025-12-04T10:19:48.0589214Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_split_cumsum_low_prec_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (Requires sm80) [ 72%] 2025-12-04T10:19:48.0590295Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sqrt_dynamic_shapes_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (sqrt dynamic shapes only supports cpu) [ 73%] 2025-12-04T10:19:48.0591138Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_squeeze1_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.6889s] [ 74%] 2025-12-04T10:19:48.0591936Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3984s] [ 75%] 2025-12-04T10:19:48.0592729Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_sum5_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4989s] [ 76%] 
2025-12-04T10:19:48.0593543Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tanh_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4368s] [ 78%] 2025-12-04T10:19:48.0594511Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_tmp_not_defined_issue1_use_block_ptr_True_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.7638s] [ 79%] 2025-12-04T10:19:48.0595346Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_to_dtype_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4644s] [ 80%] 2025-12-04T10:19:48.0596344Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_triton_argmin_argmax_transpose_logical_index_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [3.2268s] [ 81%] 2025-12-04T10:19:48.0597249Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_triton_kernel_bool_param_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4998s] [ 82%] 2025-12-04T10:19:48.0598224Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unfold_zero_dimension_tensor_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.1141s] [ 83%] 2025-12-04T10:19:48.0599128Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unroll_small_reduction_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.1768s] [ 84%] 2025-12-04T10:19:48.0600011Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_float16_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.5993s] [ 85%] 2025-12-04T10:19:48.0600964Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_float32_dynamic_shapes_cuda <- 
test/inductor/test_torchinductor.py PASSED [0.5079s] [ 86%] 2025-12-04T10:19:48.0601865Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unspec_inputs_float64_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.6759s] [ 87%] 2025-12-04T10:19:48.0602696Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unsqueeze_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.0717s] [ 89%] 2025-12-04T10:19:48.0603583Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_unsqueeze_inplace_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4361s] [ 90%] 2025-12-04T10:19:48.0604476Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_upsample_bilinear2d_a_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [3.2228s] [ 91%] 2025-12-04T10:19:48.0605414Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_vectorized_ops_masked_var_novec_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3384s] [ 92%] 2025-12-04T10:19:48.0606265Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_view_as_complex_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [1.2636s] [ 93%] 2025-12-04T10:19:48.0607096Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_view_as_real_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.2346s] [ 94%] 2025-12-04T10:19:48.0607934Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_view_detach_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py XFAIL [0.1546s] [ 95%] 2025-12-04T10:19:48.0608745Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views2_dynamic_shapes_cuda <- 
test/inductor/test_torchinductor.py PASSED [3.9210s] [ 96%] 2025-12-04T10:19:48.0609573Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views3_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4985s] [ 97%] 2025-12-04T10:19:48.0610378Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_views7_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.3701s] [ 98%] 2025-12-04T10:19:48.0611197Z inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_zeros_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.8997s] [100%] 2025-12-04T10:19:48.0611203Z 2025-12-04T10:19:48.0612204Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-f8b2416e9d43ac69.xml - 2025-12-04T10:19:48.0612463Z ==== 74 passed, 6 skipped, 350 deselected, 11 xfailed in 126.01s (0:02:06) ===== 2025-12-04T10:19:48.0613390Z The following tests failed consistently: ['test/inductor/test_torchinductor_codegen_dynamic_shapes.py::DynamicShapesCodegenGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda'] 2025-12-04T10:19:48.0613395Z 2025-12-04T10:19:48.0614257Z FINISHED PRINTING LOG FILE of inductor/test_torchinductor_codegen_dynamic_shapes 2/4 (test/test-reports/inductor.test_torchinductor_codegen_dynamic_shapes_2.4_37f84ce4dcc870f4_.log) 2025-12-04T10:19:48.0614263Z 2025-12-04T10:19:48.0614758Z Finished inductor/test_torchinductor_codegen_dynamic_shapes 2/4 ... 
[2025-12-04 10:19:47.858455][4016.241347114], took 11.09min 2025-12-04T10:19:48.0615806Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-0c75da116b2f10f8.xml 2025-12-04T10:19:48.0616982Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-fd0863b8a222871a.xml 2025-12-04T10:19:48.0618033Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-6fcb35b3fc35a71c.xml 2025-12-04T10:19:48.0773006Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-f8b2416e9d43ac69.xml 2025-12-04T10:19:48.4069903Z Uploading logs for 57119749248 to S3 2025-12-04T10:19:48.4519181Z Uploading artifacts took 0.34 seconds 2025-12-04T10:19:48.4519749Z inductor/test_torchinductor_codegen_dynamic_shapes 2/4 failed! 2025-12-04T10:19:48.4524034Z Running inductor/test_torchinductor_opinfo 2/17 ... [2025-12-04 10:19:48.452221][4016.835117137] 2025-12-04T10:19:48.4524666Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:19:48.4529099Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=2', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:19:48.452667] 2025-12-04T10:30:08.2074160Z 2025-12-04T10:30:08.2075475Z inductor/test_torchinductor_opinfo 2/17 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_2.17_595df7515ef47f8b_.log 2025-12-04T10:30:08.2196235Z Running 196 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___ror___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_arange_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bmm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdist_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_corrcoef_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumprod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagflat_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_floor_rounding_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_strided_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eq_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gather_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gradient_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gradient_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_histc_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hsplit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hsplit_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_inner_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eigvalsh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lstsq_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_power_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_slogdet_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vander_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_normal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_solve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_sum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matmul_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matrix_exp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nextafter_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_celu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_similarity_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_ctc_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_rms_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_qr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_like_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_renorm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_neg_3_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_blackman_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_hamming_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sort_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_unbiased_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_svd_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensordot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_consecutive_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_bool 2025-12-04T10:30:08.2310415Z 2025-12-04T10:30:08.2310837Z Finished inductor/test_torchinductor_opinfo 2/17 ... [2025-12-04 10:30:08.207758][4636.5906515], took 10.33min 2025-12-04T10:30:08.2312305Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-8ad43f769763d7e0.xml 2025-12-04T10:30:08.2895100Z Running inductor/test_torchinductor_opinfo 7/17 ... 
[2025-12-04 10:30:08.289203][4636.672096974] 2025-12-04T10:30:08.2895725Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:30:08.2899405Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=7', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:30:08.289677] 2025-12-04T10:41:46.2950398Z 2025-12-04T10:41:46.2951589Z inductor/test_torchinductor_opinfo 7/17 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_7.17_bf87dc9c512027f2_.log 2025-12-04T10:41:46.3076911Z Running 209 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___radd___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rand___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rxor___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__segment_reduce_lengths_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcmul_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bernoulli_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_and_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_left_shift_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_right_shift_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_xor_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bucketize_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bucketize_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cfloat_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_max_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dot_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_permuted_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_strided_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_strided_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_fill_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kthvalue_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_norm_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_rank_hermitian_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logdet_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_tensor_overload_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_normalize_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_sum_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_var_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_linear_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_grad_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_grad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_one_hot_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pdist_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_prelu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softplus_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_roll_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_add_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_blackman_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_hamming_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_hann_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sort_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_sparse_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapz_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_consecutive_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_uint16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_mean_unbiased_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_float32 2025-12-04T10:41:46.3201051Z 2025-12-04T10:41:46.3201489Z Finished inductor/test_torchinductor_opinfo 7/17 ... [2025-12-04 10:41:46.294728][5334.677622919], took 11.63min 2025-12-04T10:41:46.3203009Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-6495f5d67df68869.xml 2025-12-04T10:41:46.6663552Z Uploading artifacts took 0.28 seconds 2025-12-04T10:41:46.6668555Z Running inductor/test_torchinductor_opinfo 12/17 ... 
[2025-12-04 10:41:46.666615][5335.049510926] 2025-12-04T10:41:46.6669215Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:41:46.6673471Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=12', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:41:46.667054] 2025-12-04T10:49:42.5548764Z 2025-12-04T10:49:42.5552041Z inductor/test_torchinductor_opinfo 12/17 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_12.17_a032934f54d29036_.log 2025-12-04T10:49:42.5666643Z Running 195 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___ror___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__native_batch_norm_legit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcdiv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_scatter_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_baddbmm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_or_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_or_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_constant_pad_nd_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumsum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_no_rounding_mode_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_power_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_half_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_heaviside_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_mean_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kron_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lcm_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_inv_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_multi_dot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_singular_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vander_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logit_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_fill_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logsumexp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_median_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_binary_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mul_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_batch_norm_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_batch_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_layer_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gelu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_group_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardswish_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardtanh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hinge_embedding_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_instance_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_linear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mish_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multi_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_silu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_in_place_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_in_place_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pca_lowrank_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pow_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pow_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_sampled_addmm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_svd_lowrank_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_svd_lowrank_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensordot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__efficient_attention_forward_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_indices_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triu_indices_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_consecutive_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unravel_index_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_complex_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_float32 2025-12-04T10:49:42.5779492Z 2025-12-04T10:49:42.5779918Z Finished inductor/test_torchinductor_opinfo 12/17 ... [2025-12-04 10:49:42.555027][5810.937919587], took 7.93min 2025-12-04T10:49:42.5781399Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-f9f6352517dfd8be.xml 2025-12-04T10:49:42.6605562Z Running inductor/test_torchinductor_opinfo 17/17 ... 
[2025-12-04 10:49:42.660147][5811.043041087] 2025-12-04T10:49:42.6606615Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:49:42.6610231Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=17', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:49:42.660648] 2025-12-04T10:59:27.7335252Z 2025-12-04T10:59:27.7336662Z inductor/test_torchinductor_opinfo 17/17 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_17.17_0b4f962be1a8215a_.log 2025-12-04T10:59:27.7460591Z Running 206 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmatmul___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_alias_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bernoulli_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bincount_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_or_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_or_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdist_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cfloat_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cholesky_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_constant_pad_nd_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_permuted_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_uint32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gather_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geometric_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eigvalsh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_inv_ex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_power_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logaddexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logsumexp_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_mean_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_median_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_no_dim_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_with_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gelu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_group_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mish_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multilabel_margin_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_rrelu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_nuttall_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_with_dtype_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1e_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_mean_unbiased_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unravel_index_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_unbiased_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vsplit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vsplit_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_uint8 2025-12-04T10:59:27.7581114Z 2025-12-04T10:59:27.7581549Z Finished inductor/test_torchinductor_opinfo 17/17 ... [2025-12-04 10:59:27.734029][6396.116922776], took 9.75min 2025-12-04T10:59:27.7583151Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-25ab0fa1230b07b5.xml 2025-12-04T10:59:27.8264746Z Running inductor/test_cuda_select_algorithm 3/5 ... [2025-12-04 10:59:27.826112][6396.209005792] 2025-12-04T10:59:27.8265391Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:59:27.8268625Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cuda_select_algorithm.py', '--shard-id=3', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:59:27.826567] 2025-12-04T11:20:45.1673555Z 2025-12-04T11:20:45.1674843Z PRINTING LOG FILE of inductor/test_cuda_select_algorithm 3/5 (test/test-reports/inductor.test_cuda_select_algorithm_3.5_e3565bc7025c1889_.log) 2025-12-04T11:20:45.1676095Z W1204 10:59:37.235000 86349 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.1678089Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-74cab4bdcde89184.xml 2025-12-04T11:20:45.1679711Z ============================= test session starts ============================== 2025-12-04T11:20:45.1680906Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.1681698Z cachedir: .pytest_cache 2025-12-04T11:20:45.1682912Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.1684090Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.1684473Z configfile: pytest.ini 2025-12-04T11:20:45.1685755Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.1686769Z collecting ... 
collected 58 items 2025-12-04T11:20:45.1687377Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T11:20:45.1705380Z Running 14 items in this shard: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16, 
test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.1724015Z 2025-12-04T11:20:45.1725525Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.2856s] [ 7%] 2025-12-04T11:20:45.1728915Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.8332s] [ 7%] 2025-12-04T11:20:45.1731781Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.8264s] [ 7%] 2025-12-04T11:20:45.1738167Z 2025-12-04T11:20:45.1738400Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.1739677Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.1740922Z Traceback (most recent call last): 2025-12-04T11:20:45.1742113Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda 2025-12-04T11:20:45.1743477Z 
self.assertEqual(counters["inductor"]["woq_matcher_count"], 3) 2025-12-04T11:20:45.1744671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.1745804Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.1747054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.1748270Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.1749048Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.1749479Z 2025-12-04T11:20:45.1749662Z Expected 3 but got 6. 2025-12-04T11:20:45.1750050Z Absolute difference: 3 2025-12-04T11:20:45.1750552Z Relative difference: 1.0 2025-12-04T11:20:45.1750776Z 2025-12-04T11:20:45.1751143Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.1753110Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.1754613Z 2025-12-04T11:20:45.1754982Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.1756035Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.1756753Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.1758175Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.1759821Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.1760680Z graph_break [] 2025-12-04T11:20:45.1761356Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.1763463Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.1765182Z warnings.warn( 2025-12-04T11:20:45.1766725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.1768502Z warnings.warn( 2025-12-04T11:20:45.1769993Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.1771575Z Traceback (most recent call last): 2025-12-04T11:20:45.1772889Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda 2025-12-04T11:20:45.1774300Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 3) 2025-12-04T11:20:45.1775785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.1777247Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.1778711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.1780270Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.1781120Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.1781598Z 2025-12-04T11:20:45.1781769Z Expected 3 but got 6. 
2025-12-04T11:20:45.1782318Z Absolute difference: 3 2025-12-04T11:20:45.1782826Z Relative difference: 1.0 2025-12-04T11:20:45.1783373Z 2025-12-04T11:20:45.1783811Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.1785458Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.1786543Z 2025-12-04T11:20:45.1786814Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.1787489Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.1787990Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.1788734Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.1789649Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.1790127Z graph_break [] 2025-12-04T11:20:45.1790506Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.1791608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.1792591Z warnings.warn( 2025-12-04T11:20:45.1793505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.1794470Z warnings.warn( 2025-12-04T11:20:45.1796949Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.1797437Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.1797883Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.1798804Z 
inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.1799592Z graph_break [] 2025-12-04T11:20:45.1799977Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.1801078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.1802063Z warnings.warn( 2025-12-04T11:20:45.1802952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.1803930Z warnings.warn( 2025-12-04T11:20:45.1804236Z =================================== FAILURES =================================== 2025-12-04T11:20:45.1805086Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.1806072Z Traceback (most recent call last): 2025-12-04T11:20:45.1806859Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda 2025-12-04T11:20:45.1807781Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 3) 2025-12-04T11:20:45.1808629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.1809442Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.1810267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.1811163Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.1811650Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.1811901Z 2025-12-04T11:20:45.1812024Z Expected 3 but got 6. 
2025-12-04T11:20:45.1812307Z Absolute difference: 3 2025-12-04T11:20:45.1812609Z Relative difference: 1.0 2025-12-04T11:20:45.1812806Z 2025-12-04T11:20:45.1813035Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.1814380Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.1815462Z 2025-12-04T11:20:45.1815734Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.1816500Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.1816994Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.1817754Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.1818642Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.1819124Z graph_break [] 2025-12-04T11:20:45.1819512Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.1820623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.1821592Z warnings.warn( 2025-12-04T11:20:45.1822496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.1823472Z warnings.warn( 2025-12-04T11:20:45.1823851Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.1824346Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.1824800Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.1825705Z 
inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.1826479Z graph_break [] 2025-12-04T11:20:45.1826865Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.1827966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.1828937Z warnings.warn( 2025-12-04T11:20:45.1829814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.1830794Z warnings.warn( 2025-12-04T11:20:45.1831188Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.1831656Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.1832114Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.1833136Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.1833919Z graph_break [] 2025-12-04T11:20:45.1834290Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.1835385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.1836386Z warnings.warn( 2025-12-04T11:20:45.1837265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.1838240Z warnings.warn( 2025-12-04T11:20:45.1839248Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-74cab4bdcde89184.xml - 2025-12-04T11:20:45.1840401Z =========================== short test summary info ============================ 2025-12-04T11:20:45.1841694Z FAILED [0.8264s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.1842834Z 2025-12-04T11:20:45.1842943Z Expected 3 but got 6. 2025-12-04T11:20:45.1843242Z Absolute difference: 3 2025-12-04T11:20:45.1843542Z Relative difference: 1.0 2025-12-04T11:20:45.1843740Z 2025-12-04T11:20:45.1843965Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.1845262Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.1846340Z 2025-12-04T11:20:45.1846611Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.1847213Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.1847695Z ========================== 1 failed, 2 rerun in 5.98s ========================== 2025-12-04T11:20:45.1848111Z Got exit code 1 2025-12-04T11:20:45.1848386Z Retrying single test... 
2025-12-04T11:20:45.1849009Z W1204 10:59:57.781000 86519 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.1850253Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-77e37a2f8b75b3d9.xml 2025-12-04T11:20:45.1851219Z ============================= test session starts ============================== 2025-12-04T11:20:45.1851886Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.1852477Z cachedir: .pytest_cache 2025-12-04T11:20:45.1853188Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.1853981Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.1854345Z configfile: pytest.ini 2025-12-04T11:20:45.1855066Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.1855967Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.1857431Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.1858711Z Running 1 items in this shard 2025-12-04T11:20:45.1858927Z 2025-12-04T11:20:45.1860313Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:00:01.840475111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.1861760Z 2025-12-04T11:20:45.1862286Z [W1204 11:00:17.953892386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1862953Z 2025-12-04T11:20:45.1863469Z [W1204 11:00:17.954164824 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1864177Z 2025-12-04T11:20:45.1864688Z [W1204 11:00:17.954810229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1865333Z 2025-12-04T11:20:45.1865855Z [W1204 11:00:17.955015966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1866503Z 2025-12-04T11:20:45.1867026Z [W1204 11:00:17.956811964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1867679Z 2025-12-04T11:20:45.1868189Z [W1204 11:00:17.956991029 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1868886Z 2025-12-04T11:20:45.1869400Z [W1204 11:00:17.957308954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1870063Z 2025-12-04T11:20:45.1870575Z [W1204 11:00:17.957480472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1871442Z 2025-12-04T11:20:45.1871970Z [W1204 11:00:17.968243169 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1872619Z 2025-12-04T11:20:45.1873148Z [W1204 11:00:17.968480778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.1873798Z 2025-12-04T11:20:45.1874318Z [W1204 11:00:17.968686775 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1874983Z 2025-12-04T11:20:45.1875497Z [W1204 11:00:17.968979381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1876155Z 2025-12-04T11:20:45.1876671Z [W1204 11:00:17.969152214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1877321Z 2025-12-04T11:20:45.1877844Z [W1204 11:00:17.969445114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1878489Z 2025-12-04T11:20:45.1879015Z [W1204 11:00:17.969619167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1879663Z 2025-12-04T11:20:45.1880177Z [W1204 11:00:17.969903509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1880842Z 2025-12-04T11:20:45.1881352Z [W1204 11:00:17.970104807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1882011Z 2025-12-04T11:20:45.1882520Z [W1204 11:00:17.095645911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1883173Z 2025-12-04T11:20:45.1883700Z [W1204 11:00:17.095970543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1884346Z 2025-12-04T11:20:45.1884864Z [W1204 11:00:17.096161586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.1885511Z 2025-12-04T11:20:45.1886148Z [W1204 11:00:17.096460349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1886812Z 2025-12-04T11:20:45.1887324Z [W1204 11:00:17.096648820 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1887986Z 2025-12-04T11:20:45.1888495Z [W1204 11:00:17.096954946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1889196Z 2025-12-04T11:20:45.1889724Z [W1204 11:00:17.097124097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1890372Z 2025-12-04T11:20:45.1890901Z [W1204 11:00:17.097403096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1891551Z 2025-12-04T11:20:45.1892068Z [W1204 11:00:17.097570407 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1892786Z 2025-12-04T11:20:45.1893298Z [W1204 11:00:19.182966350 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1893962Z 2025-12-04T11:20:45.1894473Z [W1204 11:00:19.184220379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1895141Z 2025-12-04T11:20:45.1895660Z [W1204 11:00:19.184419949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1896389Z 2025-12-04T11:20:45.1896914Z [W1204 11:00:19.184733274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.1897563Z 2025-12-04T11:20:45.1898091Z [W1204 11:00:19.184929591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1898744Z 2025-12-04T11:20:45.1899258Z [W1204 11:00:19.185236684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1899924Z 2025-12-04T11:20:45.1900433Z [W1204 11:00:19.185421198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1901097Z 2025-12-04T11:20:45.1901611Z [W1204 11:00:19.185705553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1902262Z 2025-12-04T11:20:45.1902786Z [W1204 11:00:19.185879999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1903428Z 2025-12-04T11:20:45.1903958Z [W1204 11:00:19.194064884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1904608Z 2025-12-04T11:20:45.1905121Z [W1204 11:00:19.194311849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1905784Z 2025-12-04T11:20:45.1906296Z [W1204 11:00:19.194504964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1906960Z 2025-12-04T11:20:45.1907476Z [W1204 11:00:19.194781605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1908125Z 2025-12-04T11:20:45.1908653Z [W1204 11:00:19.194960093 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.1909300Z 2025-12-04T11:20:45.1909830Z [W1204 11:00:19.195261953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1910544Z 2025-12-04T11:20:45.1911057Z [W1204 11:00:19.195438828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1911725Z 2025-12-04T11:20:45.1912237Z [W1204 11:00:19.195722906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1912934Z 2025-12-04T11:20:45.1913447Z [W1204 11:00:19.195897647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1914095Z 2025-12-04T11:20:45.1914623Z [W1204 11:00:19.314678505 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1915270Z 2025-12-04T11:20:45.1915797Z [W1204 11:00:19.314970990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1916444Z 2025-12-04T11:20:45.1916961Z [W1204 11:00:19.315163864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1917661Z 2025-12-04T11:20:45.1918173Z [W1204 11:00:19.315469472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1918832Z 2025-12-04T11:20:45.1919349Z [W1204 11:00:19.315646782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1919998Z 2025-12-04T11:20:45.1920526Z [W1204 11:00:19.315952755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.1921174Z 2025-12-04T11:20:45.1921702Z [W1204 11:00:19.316129178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1922351Z 2025-12-04T11:20:45.1922869Z [W1204 11:00:19.316420043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1923537Z 2025-12-04T11:20:45.1924050Z [W1204 11:00:19.316611607 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1924714Z 2025-12-04T11:20:45.1924850Z ('RERUN', {'yellow': True}) [20.3918s] [100%] 2025-12-04T11:20:45.1926427Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:00:20.770626612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1927867Z 2025-12-04T11:20:45.1928397Z [W1204 11:00:20.770918494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1929051Z 2025-12-04T11:20:45.1929580Z [W1204 11:00:20.771108700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1930228Z 2025-12-04T11:20:45.1930738Z [W1204 11:00:20.771396373 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1931400Z 2025-12-04T11:20:45.1931913Z [W1204 11:00:20.771577779 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.1932575Z 2025-12-04T11:20:45.1933091Z [W1204 11:00:20.771882639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1933736Z 2025-12-04T11:20:45.1934260Z [W1204 11:00:20.772057543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1934909Z 2025-12-04T11:20:45.1935520Z [W1204 11:00:20.772342053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1936173Z 2025-12-04T11:20:45.1936781Z [W1204 11:00:20.772515249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1937447Z 2025-12-04T11:20:45.1937959Z [W1204 11:00:20.781012889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1938655Z 2025-12-04T11:20:45.1939167Z [W1204 11:00:20.781269766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1939817Z 2025-12-04T11:20:45.1940341Z [W1204 11:00:20.781459534 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1940991Z 2025-12-04T11:20:45.1941523Z [W1204 11:00:20.781744592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1942205Z 2025-12-04T11:20:45.1942714Z [W1204 11:00:20.781921396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1943369Z 2025-12-04T11:20:45.1943883Z [W1204 11:00:20.782226225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.1944550Z 2025-12-04T11:20:45.1945061Z [W1204 11:00:20.782401036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1945702Z 2025-12-04T11:20:45.1946226Z [W1204 11:00:20.782680355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1946874Z 2025-12-04T11:20:45.1947405Z [W1204 11:00:20.782851371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1948057Z 2025-12-04T11:20:45.1948571Z [W1204 11:00:20.901517992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1949236Z 2025-12-04T11:20:45.1949745Z [W1204 11:00:20.901811414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1950402Z 2025-12-04T11:20:45.1950912Z [W1204 11:00:20.902002693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1951563Z 2025-12-04T11:20:45.1952088Z [W1204 11:00:20.902301778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1952734Z 2025-12-04T11:20:45.1953262Z [W1204 11:00:20.902481222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1953909Z 2025-12-04T11:20:45.1954426Z [W1204 11:00:20.902785157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1955088Z 2025-12-04T11:20:45.1955605Z [W1204 11:00:20.902959692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.1956270Z 2025-12-04T11:20:45.1956781Z [W1204 11:00:20.903241195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1957426Z 2025-12-04T11:20:45.1957950Z [W1204 11:00:20.903413893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1958596Z 2025-12-04T11:20:45.1959119Z [W1204 11:00:20.070163567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1959857Z 2025-12-04T11:20:45.1960373Z [W1204 11:00:20.070466279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1961043Z 2025-12-04T11:20:45.1961555Z [W1204 11:00:20.070663357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1962255Z 2025-12-04T11:20:45.1962768Z [W1204 11:00:20.070958604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1963436Z 2025-12-04T11:20:45.1963949Z [W1204 11:00:20.071145989 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1964598Z 2025-12-04T11:20:45.1965121Z [W1204 11:00:20.071448261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1965776Z 2025-12-04T11:20:45.1966304Z [W1204 11:00:20.071628375 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1966987Z 2025-12-04T11:20:45.1967497Z [W1204 11:00:20.071914899 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.1968157Z 2025-12-04T11:20:45.1968670Z [W1204 11:00:20.072089877 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1969327Z 2025-12-04T11:20:45.1969839Z [W1204 11:00:20.080524134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1970483Z 2025-12-04T11:20:45.1971213Z [W1204 11:00:20.080832059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1971869Z 2025-12-04T11:20:45.1972401Z [W1204 11:00:20.081030061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1973054Z 2025-12-04T11:20:45.1973565Z [W1204 11:00:20.081314117 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1974229Z 2025-12-04T11:20:45.1974743Z [W1204 11:00:20.081492458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1975405Z 2025-12-04T11:20:45.1975921Z [W1204 11:00:20.081788949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1976636Z 2025-12-04T11:20:45.1977160Z [W1204 11:00:20.081964710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1977806Z 2025-12-04T11:20:45.1978338Z [W1204 11:00:20.082262923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1978989Z 2025-12-04T11:20:45.1979502Z [W1204 11:00:20.082439281 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.1980161Z 2025-12-04T11:20:45.1980675Z [W1204 11:00:20.202496916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1981338Z 2025-12-04T11:20:45.1981855Z [W1204 11:00:20.202789151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1982508Z 2025-12-04T11:20:45.1983039Z [W1204 11:00:20.202982229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1983686Z 2025-12-04T11:20:45.1984327Z [W1204 11:00:20.203287309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1984985Z 2025-12-04T11:20:45.1985498Z [W1204 11:00:20.203471164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1986162Z 2025-12-04T11:20:45.1986676Z [W1204 11:00:20.203778557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1987382Z 2025-12-04T11:20:45.1987898Z [W1204 11:00:20.203956237 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1988545Z 2025-12-04T11:20:45.1989069Z [W1204 11:00:20.204243554 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1989716Z 2025-12-04T11:20:45.1990243Z [W1204 11:00:20.204415572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.1990946Z 2025-12-04T11:20:45.1991080Z ('RERUN', {'yellow': True}) [0.8457s] [100%] 2025-12-04T11:20:45.1992650Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:00:21.591484339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1994104Z 2025-12-04T11:20:45.1994618Z [W1204 11:00:21.591787280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1995284Z 2025-12-04T11:20:45.1995799Z [W1204 11:00:21.591979661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1996446Z 2025-12-04T11:20:45.1996976Z [W1204 11:00:21.592269567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1997632Z 2025-12-04T11:20:45.1998159Z [W1204 11:00:21.592447616 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1998809Z 2025-12-04T11:20:45.1999323Z [W1204 11:00:21.592767296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.1999989Z 2025-12-04T11:20:45.2000508Z [W1204 11:00:21.592943942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2001169Z 2025-12-04T11:20:45.2001686Z [W1204 11:00:21.593229118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2002339Z 2025-12-04T11:20:45.2002872Z [W1204 11:00:21.593400011 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2003523Z 2025-12-04T11:20:45.2004056Z [W1204 11:00:21.601801537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2004709Z 2025-12-04T11:20:45.2005221Z [W1204 11:00:21.602057175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2005883Z 2025-12-04T11:20:45.2006395Z [W1204 11:00:21.602245364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2007054Z 2025-12-04T11:20:45.2007566Z [W1204 11:00:21.602523826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2008218Z 2025-12-04T11:20:45.2008738Z [W1204 11:00:21.602696622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2009451Z 2025-12-04T11:20:45.2009977Z [W1204 11:00:21.602989909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2010630Z 2025-12-04T11:20:45.2011139Z [W1204 11:00:21.603163757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2011830Z 2025-12-04T11:20:45.2012343Z [W1204 11:00:21.603443185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2013004Z 2025-12-04T11:20:45.2013518Z [W1204 11:00:21.603624371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2014167Z 2025-12-04T11:20:45.2014695Z [W1204 11:00:21.721047897 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2015345Z 2025-12-04T11:20:45.2015876Z [W1204 11:00:21.721335371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2016660Z 2025-12-04T11:20:45.2017182Z [W1204 11:00:21.721525245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2017847Z 2025-12-04T11:20:45.2018368Z [W1204 11:00:21.721820804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2019031Z 2025-12-04T11:20:45.2019543Z [W1204 11:00:21.721999497 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2020196Z 2025-12-04T11:20:45.2020725Z [W1204 11:00:21.722295721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2021380Z 2025-12-04T11:20:45.2021911Z [W1204 11:00:21.722472531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2022567Z 2025-12-04T11:20:45.2023079Z [W1204 11:00:21.722757128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2023748Z 2025-12-04T11:20:45.2024260Z [W1204 11:00:21.722929754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2024925Z 2025-12-04T11:20:45.2025440Z [W1204 11:00:21.889770494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2026090Z 2025-12-04T11:20:45.2026616Z [W1204 11:00:21.890093563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2027264Z 2025-12-04T11:20:45.2027795Z [W1204 11:00:21.890300142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2028443Z 2025-12-04T11:20:45.2028952Z [W1204 11:00:21.890587401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2029617Z 2025-12-04T11:20:45.2030128Z [W1204 11:00:21.890769509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2030797Z 2025-12-04T11:20:45.2031313Z [W1204 11:00:21.891068843 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2031977Z 2025-12-04T11:20:45.2032492Z [W1204 11:00:21.891249625 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2033141Z 2025-12-04T11:20:45.2033745Z [W1204 11:00:21.891534095 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2034401Z 2025-12-04T11:20:45.2034929Z [W1204 11:00:21.891709417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2035581Z 2025-12-04T11:20:45.2036092Z [W1204 11:00:21.899735938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2036788Z 2025-12-04T11:20:45.2037302Z [W1204 11:00:21.899972111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2037966Z 2025-12-04T11:20:45.2038474Z [W1204 11:00:21.900196171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2039125Z 2025-12-04T11:20:45.2039653Z [W1204 11:00:21.900483343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2040338Z 2025-12-04T11:20:45.2040862Z [W1204 11:00:21.900673369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2041509Z 2025-12-04T11:20:45.2042020Z [W1204 11:00:21.900974792 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2042689Z 2025-12-04T11:20:45.2043200Z [W1204 11:00:21.901149948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2043865Z 2025-12-04T11:20:45.2044376Z [W1204 11:00:21.901432333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2045030Z 2025-12-04T11:20:45.2045561Z [W1204 11:00:21.901605830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2046210Z 2025-12-04T11:20:45.2046741Z [W1204 11:00:21.019100577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2047389Z 2025-12-04T11:20:45.2047903Z [W1204 11:00:21.019410159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2048563Z 2025-12-04T11:20:45.2049078Z [W1204 11:00:21.019603022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2049740Z 2025-12-04T11:20:45.2050251Z [W1204 11:00:21.019902421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2050906Z 2025-12-04T11:20:45.2051434Z [W1204 11:00:21.020108298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2052090Z 2025-12-04T11:20:45.2052616Z [W1204 11:00:21.020432567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2053264Z 2025-12-04T11:20:45.2053777Z [W1204 11:00:21.020649246 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2054437Z 2025-12-04T11:20:45.2054950Z [W1204 11:00:21.020961166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2055610Z 2025-12-04T11:20:45.2056124Z [W1204 11:00:21.021139146 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2056855Z 2025-12-04T11:20:45.2056974Z FAILED [0.8145s] [100%] 2025-12-04T11:20:45.2057155Z 2025-12-04T11:20:45.2057316Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.2058216Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.2059023Z Traceback (most recent call last): 2025-12-04T11:20:45.2059822Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda 2025-12-04T11:20:45.2060723Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 3) 2025-12-04T11:20:45.2061603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2062375Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2063219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2064092Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2064574Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2064831Z 2025-12-04T11:20:45.2064954Z Expected 3 but got 6. 
2025-12-04T11:20:45.2065272Z Absolute difference: 3 2025-12-04T11:20:45.2065576Z Relative difference: 1.0 2025-12-04T11:20:45.2065783Z 2025-12-04T11:20:45.2065998Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2067293Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2068362Z 2025-12-04T11:20:45.2068633Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2069271Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2069759Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.2070514Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.2071592Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2072077Z graph_break [] 2025-12-04T11:20:45.2072467Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2074046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.2075503Z if out == self.unknown_value: 2025-12-04T11:20:45.2076454Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2077422Z warnings.warn( 2025-12-04T11:20:45.2078317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2079270Z warnings.warn( 2025-12-04T11:20:45.2079979Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.2080777Z Traceback (most recent call last): 2025-12-04T11:20:45.2081557Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda 2025-12-04T11:20:45.2082479Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 3) 2025-12-04T11:20:45.2083313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2084076Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2084896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2085779Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2086390Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2086649Z 2025-12-04T11:20:45.2086757Z Expected 3 but got 6. 
2025-12-04T11:20:45.2087053Z Absolute difference: 3 2025-12-04T11:20:45.2087354Z Relative difference: 1.0 2025-12-04T11:20:45.2087549Z 2025-12-04T11:20:45.2087781Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2089061Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2090183Z 2025-12-04T11:20:45.2090455Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2091090Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2091581Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.2092333Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.2093284Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2093758Z graph_break [] 2025-12-04T11:20:45.2094126Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2095699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.2097257Z if out == self.unknown_value: 2025-12-04T11:20:45.2098206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2099170Z warnings.warn( 2025-12-04T11:20:45.2100064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2101042Z warnings.warn( 2025-12-04T11:20:45.2101428Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2101902Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.2102353Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2103261Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.2104033Z graph_break [] 2025-12-04T11:20:45.2104404Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2105502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2106469Z warnings.warn( 2025-12-04T11:20:45.2107350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2108323Z warnings.warn( 2025-12-04T11:20:45.2108644Z =================================== FAILURES =================================== 2025-12-04T11:20:45.2109483Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.2110279Z Traceback (most recent call last): 2025-12-04T11:20:45.2111073Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda 2025-12-04T11:20:45.2111992Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 3) 2025-12-04T11:20:45.2112816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2113589Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2114509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2115402Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2115868Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2116135Z 2025-12-04T11:20:45.2116245Z Expected 3 but got 6. 2025-12-04T11:20:45.2116604Z Absolute difference: 3 2025-12-04T11:20:45.2116892Z Relative difference: 1.0 2025-12-04T11:20:45.2117097Z 2025-12-04T11:20:45.2117311Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2118603Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2119672Z 2025-12-04T11:20:45.2119952Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2120586Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2121112Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.2121857Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.2122758Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2123223Z graph_break [] 2025-12-04T11:20:45.2123600Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:20:45.2125174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.2126635Z if out == self.unknown_value: 2025-12-04T11:20:45.2127571Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2128549Z warnings.warn( 2025-12-04T11:20:45.2129442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2130406Z warnings.warn( 2025-12-04T11:20:45.2130785Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2131270Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.2131718Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2132607Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.2133375Z graph_break [] 2025-12-04T11:20:45.2133753Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2134844Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2135801Z warnings.warn( 2025-12-04T11:20:45.2136777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2137746Z warnings.warn( 2025-12-04T11:20:45.2138121Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2138611Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.2139065Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2139963Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.2140723Z graph_break [] 2025-12-04T11:20:45.2141187Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2142280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2143250Z warnings.warn( 2025-12-04T11:20:45.2144128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2145133Z warnings.warn( 2025-12-04T11:20:45.2146144Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-77e37a2f8b75b3d9.xml - 2025-12-04T11:20:45.2147283Z =========================== short test summary info ============================ 2025-12-04T11:20:45.2148576Z FAILED [0.8145s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2149712Z 2025-12-04T11:20:45.2149822Z Expected 3 but got 6. 
2025-12-04T11:20:45.2150120Z Absolute difference: 3 2025-12-04T11:20:45.2150413Z Relative difference: 1.0 2025-12-04T11:20:45.2150621Z 2025-12-04T11:20:45.2150838Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2152142Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2153206Z 2025-12-04T11:20:45.2153491Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2154074Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.2154605Z ================== 1 failed, 13 deselected, 2 rerun in 22.09s ================== 2025-12-04T11:20:45.2155049Z Got exit code 1 2025-12-04T11:20:45.2155330Z Retrying single test... 2025-12-04T11:20:45.2155958Z W1204 11:00:33.561000 86694 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.2157201Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3ba19b390afd5854.xml 2025-12-04T11:20:45.2158170Z ============================= test session starts ============================== 2025-12-04T11:20:45.2158823Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.2159427Z cachedir: .pytest_cache 2025-12-04T11:20:45.2160142Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.2160930Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.2161274Z configfile: pytest.ini 2025-12-04T11:20:45.2162010Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T11:20:45.2162911Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.2164267Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2165524Z Running 1 items in this shard 2025-12-04T11:20:45.2165754Z 2025-12-04T11:20:45.2167051Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:00:37.646486466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2168486Z 2025-12-04T11:20:45.2169075Z [W1204 11:00:53.486280731 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2169738Z 2025-12-04T11:20:45.2170264Z [W1204 11:00:53.486541899 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2170915Z 2025-12-04T11:20:45.2171675Z [W1204 11:00:53.487159035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2172395Z 2025-12-04T11:20:45.2172911Z [W1204 11:00:53.487368813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2173575Z 2025-12-04T11:20:45.2174090Z [W1204 11:00:53.489227826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2174751Z 2025-12-04T11:20:45.2175273Z [W1204 11:00:53.489404642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2175922Z 2025-12-04T11:20:45.2176594Z [W1204 11:00:53.489726629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2177244Z 2025-12-04T11:20:45.2177764Z [W1204 11:00:53.489895277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2178413Z 2025-12-04T11:20:45.2178920Z [W1204 11:00:53.500684314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2179581Z 2025-12-04T11:20:45.2180093Z [W1204 11:00:53.500930971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2180749Z 2025-12-04T11:20:45.2181266Z [W1204 11:00:53.501120512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2181916Z 2025-12-04T11:20:45.2182449Z [W1204 11:00:53.501400885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2183113Z 2025-12-04T11:20:45.2183643Z [W1204 11:00:53.501572741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2184293Z 2025-12-04T11:20:45.2184804Z [W1204 11:00:53.501865671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2185466Z 2025-12-04T11:20:45.2185981Z [W1204 11:00:53.502035585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2186646Z 2025-12-04T11:20:45.2187162Z [W1204 11:00:53.502317094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2187814Z 2025-12-04T11:20:45.2188349Z [W1204 11:00:53.502485434 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2189003Z 2025-12-04T11:20:45.2189532Z [W1204 11:00:53.624582823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2190182Z 2025-12-04T11:20:45.2190702Z [W1204 11:00:53.624905435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2191370Z 2025-12-04T11:20:45.2191881Z [W1204 11:00:53.625092982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2192545Z 2025-12-04T11:20:45.2193061Z [W1204 11:00:53.625387441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2193728Z 2025-12-04T11:20:45.2194351Z [W1204 11:00:53.625560299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2195004Z 2025-12-04T11:20:45.2195533Z [W1204 11:00:53.625849571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2196179Z 2025-12-04T11:20:45.2196706Z [W1204 11:00:53.626025173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2197393Z 2025-12-04T11:20:45.2197905Z [W1204 11:00:53.626302703 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2198570Z 2025-12-04T11:20:45.2199088Z [W1204 11:00:53.626469155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2199748Z 2025-12-04T11:20:45.2200263Z [W1204 11:00:55.710732486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2200975Z 2025-12-04T11:20:45.2201499Z [W1204 11:00:55.711955233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2202150Z 2025-12-04T11:20:45.2202672Z [W1204 11:00:55.712150608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2203323Z 2025-12-04T11:20:45.2203835Z [W1204 11:00:55.712437201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2204495Z 2025-12-04T11:20:45.2205008Z [W1204 11:00:55.712630266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2205669Z 2025-12-04T11:20:45.2206184Z [W1204 11:00:55.712930295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2206833Z 2025-12-04T11:20:45.2207361Z [W1204 11:00:55.713106713 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2208009Z 2025-12-04T11:20:45.2208528Z [W1204 11:00:55.713385614 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2209177Z 2025-12-04T11:20:45.2209692Z [W1204 11:00:55.713560130 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2210349Z 2025-12-04T11:20:45.2210860Z [W1204 11:00:55.721874542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2211518Z 2025-12-04T11:20:45.2212035Z [W1204 11:00:55.722126391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2212682Z 2025-12-04T11:20:45.2213209Z [W1204 11:00:55.722316080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2213853Z 2025-12-04T11:20:45.2214382Z [W1204 11:00:55.722591300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2215030Z 2025-12-04T11:20:45.2215540Z [W1204 11:00:55.722767172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2216202Z 2025-12-04T11:20:45.2216806Z [W1204 11:00:55.723057582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2217469Z 2025-12-04T11:20:45.2217985Z [W1204 11:00:55.723232472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2218735Z 2025-12-04T11:20:45.2219269Z [W1204 11:00:55.723512102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2219919Z 2025-12-04T11:20:45.2220446Z [W1204 11:00:55.723682292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2221129Z 2025-12-04T11:20:45.2221647Z [W1204 11:00:55.841636637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2222309Z 2025-12-04T11:20:45.2222818Z [W1204 11:00:55.841925179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2223479Z 2025-12-04T11:20:45.2223991Z [W1204 11:00:55.842113664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2224656Z 2025-12-04T11:20:45.2225166Z [W1204 11:00:55.842411593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2225843Z 2025-12-04T11:20:45.2226369Z [W1204 11:00:55.842592035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2227019Z 2025-12-04T11:20:45.2227531Z [W1204 11:00:55.842891664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2228195Z 2025-12-04T11:20:45.2228703Z [W1204 11:00:55.843062327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2229363Z 2025-12-04T11:20:45.2229871Z [W1204 11:00:55.843340021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2230536Z 2025-12-04T11:20:45.2231050Z [W1204 11:00:55.843519682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2231704Z 2025-12-04T11:20:45.2231851Z ('RERUN', {'yellow': True}) [20.1353s] [100%] 2025-12-04T11:20:45.2233409Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:00:55.294580613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2234851Z 2025-12-04T11:20:45.2235366Z [W1204 11:00:55.294860416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2236033Z 2025-12-04T11:20:45.2236544Z [W1204 11:00:55.295045956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2237202Z 2025-12-04T11:20:45.2237719Z [W1204 11:00:55.295321906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2238365Z 2025-12-04T11:20:45.2238894Z [W1204 11:00:55.295498608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2239540Z 2025-12-04T11:20:45.2240072Z [W1204 11:00:55.295792143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2240716Z 2025-12-04T11:20:45.2241226Z [W1204 11:00:55.295965329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2241886Z 2025-12-04T11:20:45.2242398Z [W1204 11:00:55.296242076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2243059Z 2025-12-04T11:20:45.2243641Z [W1204 11:00:55.296409163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2244292Z 2025-12-04T11:20:45.2244816Z [W1204 11:00:55.304752908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2245463Z 2025-12-04T11:20:45.2245983Z [W1204 11:00:55.304992280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2246661Z 2025-12-04T11:20:45.2247172Z [W1204 11:00:55.305178289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2247829Z 2025-12-04T11:20:45.2248340Z [W1204 11:00:55.305447963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2249003Z 2025-12-04T11:20:45.2249515Z [W1204 11:00:55.305619612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2250197Z 2025-12-04T11:20:45.2250721Z [W1204 11:00:55.305904088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2251367Z 2025-12-04T11:20:45.2251886Z [W1204 11:00:55.306075103 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2252533Z 2025-12-04T11:20:45.2253046Z [W1204 11:00:55.306357035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2253708Z 2025-12-04T11:20:45.2254219Z [W1204 11:00:55.306526971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2254877Z 2025-12-04T11:20:45.2255392Z [W1204 11:00:56.425691044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2256041Z 2025-12-04T11:20:45.2256647Z [W1204 11:00:56.425970995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2257293Z 2025-12-04T11:20:45.2257821Z [W1204 11:00:56.426155476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2258473Z 2025-12-04T11:20:45.2258984Z [W1204 11:00:56.426445978 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2259649Z 2025-12-04T11:20:45.2260160Z [W1204 11:00:56.426620050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2260819Z 2025-12-04T11:20:45.2261339Z [W1204 11:00:56.426911983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2262000Z 2025-12-04T11:20:45.2262513Z [W1204 11:00:56.427081021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2263160Z 2025-12-04T11:20:45.2263682Z [W1204 11:00:56.427355773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2264331Z 2025-12-04T11:20:45.2264853Z [W1204 11:00:56.427520914 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2265501Z 2025-12-04T11:20:45.2266010Z [W1204 11:00:56.592502746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2266666Z 2025-12-04T11:20:45.2267258Z [W1204 11:00:56.592809066 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2267922Z 2025-12-04T11:20:45.2268436Z [W1204 11:00:56.593003371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2269079Z 2025-12-04T11:20:45.2269600Z [W1204 11:00:56.593293002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2270287Z 2025-12-04T11:20:45.2270808Z [W1204 11:00:56.593473274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2271652Z 2025-12-04T11:20:45.2272162Z [W1204 11:00:56.593768048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2272823Z 2025-12-04T11:20:45.2273332Z [W1204 11:00:56.593943222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2273996Z 2025-12-04T11:20:45.2274509Z [W1204 11:00:56.594223642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2275233Z 2025-12-04T11:20:45.2275759Z [W1204 11:00:56.594396788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2276409Z 2025-12-04T11:20:45.2276934Z [W1204 11:00:56.602623396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2277579Z 2025-12-04T11:20:45.2278091Z [W1204 11:00:56.602876355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2278758Z 2025-12-04T11:20:45.2279273Z [W1204 11:00:56.603066602 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2279940Z 2025-12-04T11:20:45.2280455Z [W1204 11:00:56.603344763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2281106Z 2025-12-04T11:20:45.2281629Z [W1204 11:00:56.603521164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2282275Z 2025-12-04T11:20:45.2282800Z [W1204 11:00:56.603812686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2283445Z 2025-12-04T11:20:45.2283956Z [W1204 11:00:56.603986235 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2284614Z 2025-12-04T11:20:45.2285124Z [W1204 11:00:56.604267437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2285783Z 2025-12-04T11:20:45.2286299Z [W1204 11:00:56.604441727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2286949Z 2025-12-04T11:20:45.2287470Z [W1204 11:00:56.722288395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2288117Z 2025-12-04T11:20:45.2288639Z [W1204 11:00:56.722578767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2289291Z 2025-12-04T11:20:45.2289802Z [W1204 11:00:56.722769312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2290461Z 2025-12-04T11:20:45.2290971Z [W1204 11:00:56.723066328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2328642Z 2025-12-04T11:20:45.2329511Z [W1204 11:00:56.723244892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2330210Z 2025-12-04T11:20:45.2330724Z [W1204 11:00:56.723545329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2331389Z 2025-12-04T11:20:45.2331896Z [W1204 11:00:56.723721865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2332600Z 2025-12-04T11:20:45.2333127Z [W1204 11:00:56.724001763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2333775Z 2025-12-04T11:20:45.2334290Z [W1204 11:00:56.724171507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2334941Z 2025-12-04T11:20:45.2335079Z ('RERUN', {'yellow': True}) [0.8417s] [100%] 2025-12-04T11:20:45.2336769Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:00:56.117593935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2338279Z 2025-12-04T11:20:45.2338795Z [W1204 11:00:56.117887041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2339455Z 2025-12-04T11:20:45.2339985Z [W1204 11:00:56.118077247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2340632Z 2025-12-04T11:20:45.2341156Z [W1204 11:00:56.118361239 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2341801Z 2025-12-04T11:20:45.2342316Z [W1204 11:00:56.118543227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2342984Z 2025-12-04T11:20:45.2343501Z [W1204 11:00:56.118838898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2344165Z 2025-12-04T11:20:45.2344680Z [W1204 11:00:56.119017502 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2345335Z 2025-12-04T11:20:45.2345863Z [W1204 11:00:56.119299200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2346511Z 2025-12-04T11:20:45.2347038Z [W1204 11:00:56.119467582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2347688Z 2025-12-04T11:20:45.2348205Z [W1204 11:00:56.127893405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2348873Z 2025-12-04T11:20:45.2349384Z [W1204 11:00:56.128140874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2350049Z 2025-12-04T11:20:45.2350561Z [W1204 11:00:56.128327445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2351225Z 2025-12-04T11:20:45.2351736Z [W1204 11:00:56.128610853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2352386Z 2025-12-04T11:20:45.2352910Z [W1204 11:00:56.128782445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2353555Z 2025-12-04T11:20:45.2354131Z [W1204 11:00:56.129070081 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2354797Z 2025-12-04T11:20:45.2355311Z [W1204 11:00:56.129239352 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2355974Z 2025-12-04T11:20:45.2356483Z [W1204 11:00:56.129517002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2357180Z 2025-12-04T11:20:45.2357692Z [W1204 11:00:56.129684787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2358341Z 2025-12-04T11:20:45.2358864Z [W1204 11:00:56.248126998 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2359516Z 2025-12-04T11:20:45.2360041Z [W1204 11:00:56.248413205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2360694Z 2025-12-04T11:20:45.2361209Z [W1204 11:00:56.248616552 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2361904Z 2025-12-04T11:20:45.2362418Z [W1204 11:00:56.248913040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2363081Z 2025-12-04T11:20:45.2363595Z [W1204 11:00:56.249087532 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2364243Z 2025-12-04T11:20:45.2364770Z [W1204 11:00:56.249382004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2365417Z 2025-12-04T11:20:45.2365944Z [W1204 11:00:56.249554124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2366592Z 2025-12-04T11:20:45.2367106Z [W1204 11:00:56.249833089 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2367776Z 2025-12-04T11:20:45.2368289Z [W1204 11:00:56.250025099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2368951Z 2025-12-04T11:20:45.2369466Z [W1204 11:00:57.418156039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2370117Z 2025-12-04T11:20:45.2370641Z [W1204 11:00:57.418454328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2371555Z 2025-12-04T11:20:45.2372083Z [W1204 11:00:57.418654489 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2372731Z 2025-12-04T11:20:45.2373250Z [W1204 11:00:57.418943951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2373917Z 2025-12-04T11:20:45.2374428Z [W1204 11:00:57.419127951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2375088Z 2025-12-04T11:20:45.2375599Z [W1204 11:00:57.419420935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2376252Z 2025-12-04T11:20:45.2376850Z [W1204 11:00:57.419595387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2377501Z 2025-12-04T11:20:45.2378026Z [W1204 11:00:57.419875396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2378673Z 2025-12-04T11:20:45.2379306Z [W1204 11:00:57.420072956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2379976Z 2025-12-04T11:20:45.2380491Z [W1204 11:00:57.428306360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2381149Z 2025-12-04T11:20:45.2381660Z [W1204 11:00:57.428578954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2382356Z 2025-12-04T11:20:45.2382884Z [W1204 11:00:57.428774462 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2383540Z 2025-12-04T11:20:45.2384062Z [W1204 11:00:57.429055240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2384708Z 2025-12-04T11:20:45.2385226Z [W1204 11:00:57.429231933 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2385937Z 2025-12-04T11:20:45.2386451Z [W1204 11:00:57.429523725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2387112Z 2025-12-04T11:20:45.2387621Z [W1204 11:00:57.429699797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2388290Z 2025-12-04T11:20:45.2388801Z [W1204 11:00:57.429981137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2389450Z 2025-12-04T11:20:45.2389977Z [W1204 11:00:57.430188557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2390625Z 2025-12-04T11:20:45.2391158Z [W1204 11:00:57.549908308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2391813Z 2025-12-04T11:20:45.2392328Z [W1204 11:00:57.550221782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2392988Z 2025-12-04T11:20:45.2393498Z [W1204 11:00:57.550418745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2394162Z 2025-12-04T11:20:45.2394674Z [W1204 11:00:57.550715201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2395321Z 2025-12-04T11:20:45.2395847Z [W1204 11:00:57.550891038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2396496Z 2025-12-04T11:20:45.2397025Z [W1204 11:00:57.551187353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2397675Z 2025-12-04T11:20:45.2398187Z [W1204 11:00:57.551362311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2398849Z 2025-12-04T11:20:45.2399359Z [W1204 11:00:57.551642194 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2400024Z 2025-12-04T11:20:45.2400535Z [W1204 11:00:57.551812909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2401188Z 2025-12-04T11:20:45.2401311Z FAILED [0.8255s] [100%] 2025-12-04T11:20:45.2401492Z 2025-12-04T11:20:45.2401641Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.2402474Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.2403347Z Traceback (most recent call last): 2025-12-04T11:20:45.2404143Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda 2025-12-04T11:20:45.2405044Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 3) 2025-12-04T11:20:45.2405892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2406721Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2407561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2408441Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2408924Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2409176Z 2025-12-04T11:20:45.2409301Z Expected 3 but got 6. 
2025-12-04T11:20:45.2409582Z Absolute difference: 3 2025-12-04T11:20:45.2409889Z Relative difference: 1.0 2025-12-04T11:20:45.2410086Z 2025-12-04T11:20:45.2410316Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2411661Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2412732Z 2025-12-04T11:20:45.2413006Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2413646Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2414135Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.2414892Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.2415781Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2416261Z graph_break [] 2025-12-04T11:20:45.2416744Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2418309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.2419776Z if out == self.unknown_value: 2025-12-04T11:20:45.2420733Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2421713Z warnings.warn( 2025-12-04T11:20:45.2422592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2423565Z warnings.warn( 2025-12-04T11:20:45.2424274Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.2425079Z Traceback (most recent call last): 2025-12-04T11:20:45.2425857Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda 2025-12-04T11:20:45.2426773Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 3) 2025-12-04T11:20:45.2427619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2428397Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2429230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2430122Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2430593Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2430859Z 2025-12-04T11:20:45.2430970Z Expected 3 but got 6. 
2025-12-04T11:20:45.2431349Z Absolute difference: 3 2025-12-04T11:20:45.2431641Z Relative difference: 1.0 2025-12-04T11:20:45.2431855Z 2025-12-04T11:20:45.2432072Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2433369Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2434464Z 2025-12-04T11:20:45.2434749Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2435374Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2435865Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.2436619Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.2437524Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2437992Z graph_break [] 2025-12-04T11:20:45.2438416Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2439998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.2441464Z if out == self.unknown_value: 2025-12-04T11:20:45.2442407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2443387Z warnings.warn( 2025-12-04T11:20:45.2444284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2445252Z warnings.warn( 2025-12-04T11:20:45.2445628Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2446116Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.2446564Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2447445Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.2448215Z graph_break [] 2025-12-04T11:20:45.2448591Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2449681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2450637Z warnings.warn( 2025-12-04T11:20:45.2451527Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2452500Z warnings.warn( 2025-12-04T11:20:45.2452803Z =================================== FAILURES =================================== 2025-12-04T11:20:45.2453644Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.2454437Z Traceback (most recent call last): 2025-12-04T11:20:45.2455226Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 204, in test_int8_woq_mm_concat_cuda 2025-12-04T11:20:45.2456129Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 3) 2025-12-04T11:20:45.2457059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2457834Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2458672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2459654Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2460139Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2460391Z 2025-12-04T11:20:45.2460515Z Expected 3 but got 6. 2025-12-04T11:20:45.2460800Z Absolute difference: 3 2025-12-04T11:20:45.2461099Z Relative difference: 1.0 2025-12-04T11:20:45.2461290Z 2025-12-04T11:20:45.2461520Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2462858Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2463930Z 2025-12-04T11:20:45.2464199Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2464830Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2465313Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.2466052Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.2466995Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2467471Z graph_break [] 2025-12-04T11:20:45.2467847Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:20:45.2469408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.2470869Z if out == self.unknown_value: 2025-12-04T11:20:45.2472041Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2473013Z warnings.warn( 2025-12-04T11:20:45.2473894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2474856Z warnings.warn( 2025-12-04T11:20:45.2475246Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2475726Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.2476163Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2477059Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.2477824Z graph_break [] 2025-12-04T11:20:45.2478189Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2479275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2480244Z warnings.warn( 2025-12-04T11:20:45.2481125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2482073Z warnings.warn( 2025-12-04T11:20:45.2482456Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2482938Z stats [('calls_captured', 36)] 2025-12-04T11:20:45.2483375Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2484265Z inductor [('pattern_matcher_nodes', 36), ('woq_matcher_nodes', 24), ('pattern_matcher_count', 18), ('woq_matcher_count', 6), ('fxgraph_cache_miss', 2)] 2025-12-04T11:20:45.2485030Z graph_break [] 2025-12-04T11:20:45.2485403Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2486644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2487621Z warnings.warn( 2025-12-04T11:20:45.2488505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2489467Z warnings.warn( 2025-12-04T11:20:45.2490454Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3ba19b390afd5854.xml - 2025-12-04T11:20:45.2491647Z =========================== short test summary info ============================ 2025-12-04T11:20:45.2492936Z FAILED [0.8255s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2494025Z 2025-12-04T11:20:45.2494150Z Expected 3 but got 6. 
2025-12-04T11:20:45.2494433Z Absolute difference: 3 2025-12-04T11:20:45.2494734Z Relative difference: 1.0 2025-12-04T11:20:45.2494990Z 2025-12-04T11:20:45.2495215Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2496572Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2497649Z 2025-12-04T11:20:45.2497919Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2498508Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.2499043Z ================== 1 failed, 13 deselected, 2 rerun in 21.84s ================== 2025-12-04T11:20:45.2499491Z Got exit code 1 2025-12-04T11:20:45.2500514Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2501906Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:20:45.2502906Z W1204 11:01:08.865000 86869 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.2504155Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4ad317a243ecdd30.xml 2025-12-04T11:20:45.2505135Z ============================= test session starts ============================== 2025-12-04T11:20:45.2505792Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.2506402Z cachedir: .pytest_cache 2025-12-04T11:20:45.2507125Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 
2025-12-04T11:20:45.2507909Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.2508274Z configfile: pytest.ini 2025-12-04T11:20:45.2509010Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.2509910Z collecting ... collected 58 items / 1 deselected / 57 selected 2025-12-04T11:20:45.2510395Z stepcurrent: skipping 1 already run items. 2025-12-04T11:20:45.2510795Z Running 13 items in this shard 2025-12-04T11:20:45.2511003Z 2025-12-04T11:20:45.2511889Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.9275s] [ 7%] 2025-12-04T11:20:45.2513750Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4960s] [ 7%] 2025-12-04T11:20:45.2515634Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.4879s] [ 7%] 2025-12-04T11:20:45.2516567Z 2025-12-04T11:20:45.2516712Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.2517517Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.2518341Z Traceback (most recent call last): 2025-12-04T11:20:45.2519081Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.2519963Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.2520793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2521570Z return 
super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2522401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2523326Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2523804Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2524055Z 2025-12-04T11:20:45.2524168Z Expected 1 but got 2. 2025-12-04T11:20:45.2524461Z Absolute difference: 1 2025-12-04T11:20:45.2524769Z Relative difference: 1.0 2025-12-04T11:20:45.2524959Z 2025-12-04T11:20:45.2525184Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2526435Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2527487Z 2025-12-04T11:20:45.2527758Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2528393Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2528876Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2529975Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2531247Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2531723Z graph_break [] 2025-12-04T11:20:45.2532087Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2533176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2534146Z warnings.warn( 2025-12-04T11:20:45.2535036Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2535997Z warnings.warn( 2025-12-04T11:20:45.2536773Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.2537552Z Traceback (most recent call last): 2025-12-04T11:20:45.2538315Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.2539190Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.2540024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2540795Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2541616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2542504Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2543072Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2543327Z 2025-12-04T11:20:45.2543450Z Expected 1 but got 2. 
2025-12-04T11:20:45.2543733Z Absolute difference: 1 2025-12-04T11:20:45.2544039Z Relative difference: 1.0 2025-12-04T11:20:45.2544233Z 2025-12-04T11:20:45.2544467Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2545733Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2546807Z 2025-12-04T11:20:45.2547075Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2547708Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2548189Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2549283Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2550585Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2551058Z graph_break [] 2025-12-04T11:20:45.2551436Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2552525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2553499Z warnings.warn( 2025-12-04T11:20:45.2554384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2555350Z warnings.warn( 2025-12-04T11:20:45.2555720Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2556205Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2556650Z aot_autograd [('total', 2), 
('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2557891Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2559015Z graph_break [] 2025-12-04T11:20:45.2559400Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2560491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2561446Z warnings.warn( 2025-12-04T11:20:45.2562333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2563297Z warnings.warn( 2025-12-04T11:20:45.2563616Z =================================== FAILURES =================================== 2025-12-04T11:20:45.2564411Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.2565186Z Traceback (most recent call last): 2025-12-04T11:20:45.2565942Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.2566814Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.2567642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2568404Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2569239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2570180Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2570659Z 
AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2570914Z 2025-12-04T11:20:45.2571244Z Expected 1 but got 2. 2025-12-04T11:20:45.2571547Z Absolute difference: 1 2025-12-04T11:20:45.2571835Z Relative difference: 1.0 2025-12-04T11:20:45.2572042Z 2025-12-04T11:20:45.2572261Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2573610Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2574651Z 2025-12-04T11:20:45.2574920Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2575561Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2576052Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2577252Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2578566Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2579045Z graph_break [] 2025-12-04T11:20:45.2579430Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2580530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2581494Z warnings.warn( 2025-12-04T11:20:45.2582390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2583363Z warnings.warn( 2025-12-04T11:20:45.2583736Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:20:45.2584230Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2584685Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2585955Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2587075Z graph_break [] 2025-12-04T11:20:45.2587453Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2588539Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2589512Z warnings.warn( 2025-12-04T11:20:45.2590385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2591355Z warnings.warn( 2025-12-04T11:20:45.2591735Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2592207Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2592654Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2593908Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2595037Z graph_break [] 2025-12-04T11:20:45.2595398Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2596486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2597460Z warnings.warn( 
2025-12-04T11:20:45.2598458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2599412Z warnings.warn( 2025-12-04T11:20:45.2600430Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4ad317a243ecdd30.xml - 2025-12-04T11:20:45.2601580Z =========================== short test summary info ============================ 2025-12-04T11:20:45.2602887Z FAILED [0.4879s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2603949Z 2025-12-04T11:20:45.2604057Z Expected 1 but got 2. 2025-12-04T11:20:45.2604356Z Absolute difference: 1 2025-12-04T11:20:45.2604658Z Relative difference: 1.0 2025-12-04T11:20:45.2604851Z 2025-12-04T11:20:45.2605074Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2606332Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2607419Z 2025-12-04T11:20:45.2607685Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2608281Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.2608797Z =================== 1 failed, 1 deselected, 2 rerun in 4.94s =================== 2025-12-04T11:20:45.2609240Z Got exit code 1 2025-12-04T11:20:45.2609515Z Retrying single test... 
2025-12-04T11:20:45.2610153Z W1204 11:01:29.485000 87046 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.2611387Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f482798b2b39d897.xml 2025-12-04T11:20:45.2612357Z ============================= test session starts ============================== 2025-12-04T11:20:45.2613025Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.2613632Z cachedir: .pytest_cache 2025-12-04T11:20:45.2614334Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.2615123Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.2615481Z configfile: pytest.ini 2025-12-04T11:20:45.2616201Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.2617202Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.2618556Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2619793Z Running 1 items in this shard 2025-12-04T11:20:45.2620005Z 2025-12-04T11:20:45.2621282Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 11:01:35.407570069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2622714Z 2025-12-04T11:20:45.2623231Z [W1204 11:01:51.707034883 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2623901Z 2025-12-04T11:20:45.2624421Z [W1204 11:01:51.707301033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2625082Z 2025-12-04T11:20:45.2625711Z [W1204 11:01:51.714839285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2626364Z 2025-12-04T11:20:45.2626889Z [W1204 11:01:51.715605529 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2627535Z 2025-12-04T11:20:45.2628060Z [W1204 11:01:51.715807172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2628739Z 2025-12-04T11:20:45.2629253Z [W1204 11:01:51.722963564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2629921Z 2025-12-04T11:20:45.2630432Z [W1204 11:01:51.723807891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2631094Z 2025-12-04T11:20:45.2631611Z [W1204 11:01:51.723999328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2632292Z 2025-12-04T11:20:45.2632814Z [W1204 11:01:51.863930710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2633463Z 2025-12-04T11:20:45.2633986Z [W1204 11:01:51.865728842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2634641Z 2025-12-04T11:20:45.2635149Z [W1204 11:01:51.865954245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2635808Z 2025-12-04T11:20:45.2636322Z [W1204 11:01:51.869979635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2636981Z 2025-12-04T11:20:45.2637497Z [W1204 11:01:51.870677380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2638149Z 2025-12-04T11:20:45.2638674Z [W1204 11:01:51.870889043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2639325Z 2025-12-04T11:20:45.2639852Z [W1204 11:01:51.877011000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2640502Z 2025-12-04T11:20:45.2641015Z [W1204 11:01:51.877669983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2641680Z 2025-12-04T11:20:45.2642191Z [W1204 11:01:51.877868518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2642848Z 2025-12-04T11:20:45.2642982Z ('RERUN', {'yellow': True}) [20.2481s] [100%] 2025-12-04T11:20:45.2644531Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 11:01:51.315864040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2645942Z 2025-12-04T11:20:45.2646467Z [W1204 11:01:51.316660570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2647126Z 2025-12-04T11:20:45.2647637Z [W1204 11:01:51.316876389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2648296Z 2025-12-04T11:20:45.2648812Z [W1204 11:01:51.321075483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2649481Z 2025-12-04T11:20:45.2649995Z [W1204 11:01:51.321745384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2650720Z 2025-12-04T11:20:45.2651249Z [W1204 11:01:51.321946566 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2651896Z 2025-12-04T11:20:45.2652423Z [W1204 11:01:51.328122107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2653108Z 2025-12-04T11:20:45.2653617Z [W1204 11:01:51.328788702 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2654276Z 2025-12-04T11:20:45.2654787Z [W1204 11:01:51.328983861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2655446Z 2025-12-04T11:20:45.2655955Z [W1204 11:01:52.420327224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2656702Z 2025-12-04T11:20:45.2657218Z [W1204 11:01:52.421121224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2657928Z 2025-12-04T11:20:45.2658454Z [W1204 11:01:52.421334786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2659104Z 2025-12-04T11:20:45.2659631Z [W1204 11:01:52.425326827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2660282Z 2025-12-04T11:20:45.2660791Z [W1204 11:01:52.425981144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2661449Z 2025-12-04T11:20:45.2661964Z [W1204 11:01:52.426178311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2662624Z 2025-12-04T11:20:45.2663141Z [W1204 11:01:52.432313637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2663801Z 2025-12-04T11:20:45.2664325Z [W1204 11:01:52.433164404 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2664974Z 2025-12-04T11:20:45.2665500Z [W1204 11:01:52.433363808 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2666153Z 2025-12-04T11:20:45.2666286Z ('RERUN', {'yellow': True}) [0.5140s] [100%] 2025-12-04T11:20:45.2667830Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 11:01:52.805435355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2669243Z 2025-12-04T11:20:45.2669763Z [W1204 11:01:52.806207540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2670420Z 2025-12-04T11:20:45.2671132Z [W1204 11:01:52.806413604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2671801Z 2025-12-04T11:20:45.2672335Z [W1204 11:01:52.810626479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2672989Z 2025-12-04T11:20:45.2673504Z [W1204 11:01:52.811296141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2674174Z 2025-12-04T11:20:45.2674687Z [W1204 11:01:52.811495451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2675351Z 2025-12-04T11:20:45.2676012Z [W1204 11:01:52.817699856 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2676668Z 2025-12-04T11:20:45.2677199Z [W1204 11:01:52.818343233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2677847Z 2025-12-04T11:20:45.2678374Z [W1204 11:01:52.818538282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2679070Z 2025-12-04T11:20:45.2679583Z [W1204 11:01:52.908621467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2680253Z 2025-12-04T11:20:45.2680762Z [W1204 11:01:52.909405090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2681424Z 2025-12-04T11:20:45.2681940Z [W1204 11:01:52.909621243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2682637Z 2025-12-04T11:20:45.2683166Z [W1204 11:01:52.913643458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2683817Z 2025-12-04T11:20:45.2684343Z [W1204 11:01:52.914315629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2684994Z 2025-12-04T11:20:45.2685506Z [W1204 11:01:52.914523982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2686164Z 2025-12-04T11:20:45.2686673Z [W1204 11:01:52.920627462 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2687334Z 2025-12-04T11:20:45.2687854Z [W1204 11:01:52.921487344 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2688520Z 2025-12-04T11:20:45.2689028Z [W1204 11:01:52.921690413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2689678Z 2025-12-04T11:20:45.2689802Z FAILED [0.4896s] [100%] 2025-12-04T11:20:45.2689983Z 2025-12-04T11:20:45.2690124Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.2690931Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.2691712Z Traceback (most recent call last): 2025-12-04T11:20:45.2692465Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.2693333Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.2694178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2694955Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2695778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2696775Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2697261Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2697513Z 2025-12-04T11:20:45.2697636Z Expected 1 but got 2. 
2025-12-04T11:20:45.2697919Z Absolute difference: 1 2025-12-04T11:20:45.2698226Z Relative difference: 1.0 2025-12-04T11:20:45.2698417Z 2025-12-04T11:20:45.2698645Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2699914Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2700958Z 2025-12-04T11:20:45.2701314Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2701951Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2702433Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2703531Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2704835Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2705307Z graph_break [] 2025-12-04T11:20:45.2705687Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2707240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.2708766Z if out == self.unknown_value: 2025-12-04T11:20:45.2709718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2710699Z warnings.warn( 2025-12-04T11:20:45.2711580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2712555Z warnings.warn( 2025-12-04T11:20:45.2713234Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.2714009Z Traceback (most recent call last): 2025-12-04T11:20:45.2714745Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.2715630Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.2716459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2717218Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2718059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2718954Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2719431Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2719683Z 2025-12-04T11:20:45.2719791Z Expected 1 but got 2. 
2025-12-04T11:20:45.2720087Z Absolute difference: 1 2025-12-04T11:20:45.2720385Z Relative difference: 1.0 2025-12-04T11:20:45.2720577Z 2025-12-04T11:20:45.2720791Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2722061Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2723113Z 2025-12-04T11:20:45.2723381Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2724014Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2724483Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2725591Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2726852Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2727326Z graph_break [] 2025-12-04T11:20:45.2727690Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2729337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.2730817Z if out == self.unknown_value: 2025-12-04T11:20:45.2731763Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2732750Z warnings.warn( 2025-12-04T11:20:45.2733643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2734610Z warnings.warn( 2025-12-04T11:20:45.2734994Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2735465Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2735922Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2737272Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2738452Z graph_break [] 2025-12-04T11:20:45.2738821Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2739919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2740891Z warnings.warn( 2025-12-04T11:20:45.2741767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2742737Z warnings.warn( 2025-12-04T11:20:45.2743054Z =================================== FAILURES =================================== 2025-12-04T11:20:45.2743869Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.2744635Z 
Traceback (most recent call last): 2025-12-04T11:20:45.2745389Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.2746266Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.2747086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2747850Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2748690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2749577Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2750048Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2750315Z 2025-12-04T11:20:45.2750430Z Expected 1 but got 2. 2025-12-04T11:20:45.2750728Z Absolute difference: 1 2025-12-04T11:20:45.2751022Z Relative difference: 1.0 2025-12-04T11:20:45.2751231Z 2025-12-04T11:20:45.2751448Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2752715Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2753758Z 2025-12-04T11:20:45.2754043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2754668Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2755157Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2756347Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2757606Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:20:45.2758069Z graph_break [] 2025-12-04T11:20:45.2758455Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2760025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.2761530Z if out == self.unknown_value: 2025-12-04T11:20:45.2762465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2763434Z warnings.warn( 2025-12-04T11:20:45.2764329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2765332Z warnings.warn( 2025-12-04T11:20:45.2765711Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2766190Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2766635Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2767880Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2769019Z graph_break [] 2025-12-04T11:20:45.2769393Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2770492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2771685Z warnings.warn( 2025-12-04T11:20:45.2772586Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2773552Z warnings.warn( 2025-12-04T11:20:45.2773937Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2774406Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2774861Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2776112Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2777319Z graph_break [] 2025-12-04T11:20:45.2777700Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2778797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2779780Z warnings.warn( 2025-12-04T11:20:45.2780656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2781619Z warnings.warn( 2025-12-04T11:20:45.2782628Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f482798b2b39d897.xml - 2025-12-04T11:20:45.2783779Z =========================== short test summary info ============================ 2025-12-04T11:20:45.2785026Z FAILED [0.4896s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.2786097Z 2025-12-04T11:20:45.2786206Z Expected 1 but got 2. 2025-12-04T11:20:45.2786661Z Absolute difference: 1 2025-12-04T11:20:45.2786976Z Relative difference: 1.0 2025-12-04T11:20:45.2787172Z 2025-12-04T11:20:45.2787390Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2788659Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2789740Z 2025-12-04T11:20:45.2790023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2790619Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.2791136Z ================== 1 failed, 13 deselected, 2 rerun in 21.29s ================== 2025-12-04T11:20:45.2791582Z Got exit code 1 2025-12-04T11:20:45.2791853Z Retrying single test... 2025-12-04T11:20:45.2792483Z W1204 11:02:04.500000 87228 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.2793777Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cbe2514f89eef609.xml 2025-12-04T11:20:45.2794746Z ============================= test session starts ============================== 2025-12-04T11:20:45.2795409Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.2795999Z cachedir: .pytest_cache 2025-12-04T11:20:45.2796711Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.2796838Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.2796966Z configfile: pytest.ini 2025-12-04T11:20:45.2797509Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, 
subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.2797748Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.2798735Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2798855Z Running 1 items in this shard 2025-12-04T11:20:45.2798860Z 2025-12-04T11:20:45.2800150Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 11:02:10.475432435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2800159Z 2025-12-04T11:20:45.2800679Z [W1204 11:02:25.913062298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2800685Z 2025-12-04T11:20:45.2801218Z [W1204 11:02:25.913332120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2801226Z 2025-12-04T11:20:45.2801734Z [W1204 11:02:25.921073822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2801739Z 2025-12-04T11:20:45.2802264Z [W1204 11:02:25.921834099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2802271Z 2025-12-04T11:20:45.2802778Z [W1204 11:02:25.922032885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2802784Z 2025-12-04T11:20:45.2803302Z [W1204 11:02:25.929145907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2803307Z 2025-12-04T11:20:45.2803889Z [W1204 11:02:25.929846701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2803897Z 2025-12-04T11:20:45.2804408Z [W1204 11:02:25.930061895 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2804425Z 2025-12-04T11:20:45.2804934Z [W1204 11:02:25.069481247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2804990Z 2025-12-04T11:20:45.2805496Z [W1204 11:02:25.071301433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2805501Z 2025-12-04T11:20:45.2806020Z [W1204 11:02:25.071524197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2806025Z 2025-12-04T11:20:45.2806534Z [W1204 11:02:25.075580976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2806539Z 2025-12-04T11:20:45.2807095Z [W1204 11:02:25.076271302 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2807100Z 2025-12-04T11:20:45.2807609Z [W1204 11:02:25.076472418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2807616Z 2025-12-04T11:20:45.2808141Z [W1204 11:02:25.082714100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2808146Z 2025-12-04T11:20:45.2808651Z [W1204 11:02:25.083438610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2808656Z 2025-12-04T11:20:45.2809177Z [W1204 11:02:25.083637913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2809187Z 2025-12-04T11:20:45.2809321Z ('RERUN', {'yellow': True}) [19.4091s] [100%] 2025-12-04T11:20:45.2810595Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 11:02:26.524022184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2810603Z 2025-12-04T11:20:45.2811128Z [W1204 11:02:26.524800070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2811133Z 2025-12-04T11:20:45.2811644Z [W1204 11:02:26.525001316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2811651Z 2025-12-04T11:20:45.2812172Z [W1204 11:02:26.529088579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2812181Z 2025-12-04T11:20:45.2812687Z [W1204 11:02:26.529729386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2812694Z 2025-12-04T11:20:45.2813210Z [W1204 11:02:26.529923118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2813218Z 2025-12-04T11:20:45.2813728Z [W1204 11:02:26.536209937 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2813733Z 2025-12-04T11:20:45.2814257Z [W1204 11:02:26.536875498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2814262Z 2025-12-04T11:20:45.2814770Z [W1204 11:02:26.537066228 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2814775Z 2025-12-04T11:20:45.2815343Z [W1204 11:02:26.627409384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2815364Z 2025-12-04T11:20:45.2815876Z [W1204 11:02:26.628201914 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2815881Z 2025-12-04T11:20:45.2816500Z [W1204 11:02:26.628411871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2816506Z 2025-12-04T11:20:45.2817030Z [W1204 11:02:26.632494374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2817035Z 2025-12-04T11:20:45.2817540Z [W1204 11:02:26.633189785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2817545Z 2025-12-04T11:20:45.2818075Z [W1204 11:02:26.633390987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2818114Z 2025-12-04T11:20:45.2818625Z [W1204 11:02:26.639513848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2818630Z 2025-12-04T11:20:45.2819151Z [W1204 11:02:26.640389148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2819158Z 2025-12-04T11:20:45.2819668Z [W1204 11:02:26.640609341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2819673Z 2025-12-04T11:20:45.2819819Z ('RERUN', {'yellow': True}) [0.5167s] [100%] 2025-12-04T11:20:45.2821103Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 11:02:26.016255890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2821112Z 2025-12-04T11:20:45.2821622Z [W1204 11:02:26.017039891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2821640Z 2025-12-04T11:20:45.2822149Z [W1204 11:02:26.017245077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2822156Z 2025-12-04T11:20:45.2822669Z [W1204 11:02:26.021500090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2822674Z 2025-12-04T11:20:45.2823191Z [W1204 11:02:26.022167493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2823197Z 2025-12-04T11:20:45.2823707Z [W1204 11:02:26.022362808 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2823715Z 2025-12-04T11:20:45.2824241Z [W1204 11:02:26.028572834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2824246Z 2025-12-04T11:20:45.2824755Z [W1204 11:02:26.029229997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2824762Z 2025-12-04T11:20:45.2825283Z [W1204 11:02:26.029419944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2825288Z 2025-12-04T11:20:45.2825794Z [W1204 11:02:26.120061872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2825799Z 2025-12-04T11:20:45.2826377Z [W1204 11:02:26.120868006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2826385Z 2025-12-04T11:20:45.2826892Z [W1204 11:02:26.121083580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2826897Z 2025-12-04T11:20:45.2827404Z [W1204 11:02:26.125088773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2827440Z 2025-12-04T11:20:45.2827959Z [W1204 11:02:26.125762105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2827965Z 2025-12-04T11:20:45.2828473Z [W1204 11:02:26.125967572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2828478Z 2025-12-04T11:20:45.2829005Z [W1204 11:02:26.132112530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2829039Z 2025-12-04T11:20:45.2829550Z [W1204 11:02:26.132988808 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2829555Z 2025-12-04T11:20:45.2830085Z [W1204 11:02:26.133186719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2830092Z 2025-12-04T11:20:45.2830196Z FAILED [0.4932s] [100%] 2025-12-04T11:20:45.2830201Z 2025-12-04T11:20:45.2830361Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.2830870Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.2830998Z Traceback (most recent call last): 2025-12-04T11:20:45.2831543Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.2831778Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.2832245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2832429Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2832971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2833197Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2833334Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2833339Z 2025-12-04T11:20:45.2833448Z Expected 1 but got 2. 
2025-12-04T11:20:45.2833574Z Absolute difference: 1 2025-12-04T11:20:45.2833690Z Relative difference: 1.0 2025-12-04T11:20:45.2833695Z 2025-12-04T11:20:45.2833911Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2834839Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2834847Z 2025-12-04T11:20:45.2835118Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2835356Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2835481Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2836375Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2836620Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2836724Z graph_break [] 2025-12-04T11:20:45.2836959Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2838233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.2838359Z if out == self.unknown_value: 2025-12-04T11:20:45.2839102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2839237Z warnings.warn( 2025-12-04T11:20:45.2839969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2840071Z warnings.warn( 2025-12-04T11:20:45.2840582Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.2840727Z Traceback (most recent call last): 2025-12-04T11:20:45.2841240Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.2841523Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.2841979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2842149Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2842697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2842905Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2843039Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2843045Z 2025-12-04T11:20:45.2843165Z Expected 1 but got 2. 
2025-12-04T11:20:45.2843275Z Absolute difference: 1 2025-12-04T11:20:45.2843398Z Relative difference: 1.0 2025-12-04T11:20:45.2843403Z 2025-12-04T11:20:45.2843624Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2844532Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2844539Z 2025-12-04T11:20:45.2844821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2845048Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2845180Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2846068Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2846298Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2846411Z graph_break [] 2025-12-04T11:20:45.2846632Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2847858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.2847980Z if out == self.unknown_value: 2025-12-04T11:20:45.2848705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2848822Z warnings.warn( 2025-12-04T11:20:45.2849543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2849646Z warnings.warn( 2025-12-04T11:20:45.2849964Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2850085Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2850331Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2851222Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2851353Z graph_break [] 2025-12-04T11:20:45.2851582Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2852309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2852426Z warnings.warn( 2025-12-04T11:20:45.2853145Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2853252Z warnings.warn( 2025-12-04T11:20:45.2853449Z =================================== FAILURES =================================== 2025-12-04T11:20:45.2853960Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.2854087Z 
Traceback (most recent call last): 2025-12-04T11:20:45.2854613Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.2854845Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.2855316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2855482Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2856017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2856244Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2856463Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2856471Z 2025-12-04T11:20:45.2856592Z Expected 1 but got 2. 2025-12-04T11:20:45.2856705Z Absolute difference: 1 2025-12-04T11:20:45.2856815Z Relative difference: 1.0 2025-12-04T11:20:45.2856821Z 2025-12-04T11:20:45.2857048Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2857959Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2857964Z 2025-12-04T11:20:45.2858246Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2858466Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2858583Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2859491Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2859721Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:20:45.2859823Z graph_break [] 2025-12-04T11:20:45.2860057Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2861262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.2861393Z if out == self.unknown_value: 2025-12-04T11:20:45.2862117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2862296Z warnings.warn( 2025-12-04T11:20:45.2863036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2863138Z warnings.warn( 2025-12-04T11:20:45.2863369Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2863520Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2863748Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2864646Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2864748Z graph_break [] 2025-12-04T11:20:45.2864965Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2865713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2865849Z warnings.warn( 2025-12-04T11:20:45.2866579Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2866682Z warnings.warn( 2025-12-04T11:20:45.2866898Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2867027Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2867255Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2868155Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2868256Z graph_break [] 2025-12-04T11:20:45.2868479Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2869215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2869317Z warnings.warn( 2025-12-04T11:20:45.2870031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2870149Z warnings.warn( 2025-12-04T11:20:45.2871174Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cbe2514f89eef609.xml - 2025-12-04T11:20:45.2871372Z =========================== short test summary info ============================ 2025-12-04T11:20:45.2872325Z FAILED [0.4932s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.2872333Z 2025-12-04T11:20:45.2872459Z Expected 1 but got 2. 2025-12-04T11:20:45.2872569Z Absolute difference: 1 2025-12-04T11:20:45.2872682Z Relative difference: 1.0 2025-12-04T11:20:45.2872687Z 2025-12-04T11:20:45.2872921Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2873834Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2873840Z 2025-12-04T11:20:45.2874124Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2874304Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.2874504Z ================== 1 failed, 13 deselected, 2 rerun in 20.45s ================== 2025-12-04T11:20:45.2874741Z Got exit code 1 2025-12-04T11:20:45.2875569Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.2875982Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:20:45.2876494Z W1204 11:02:38.585000 87410 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.2877151Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3707d31910126ebf.xml 2025-12-04T11:20:45.2877333Z ============================= test session starts ============================== 2025-12-04T11:20:45.2877685Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.2877798Z cachedir: .pytest_cache 2025-12-04T11:20:45.2878341Z hypothesis profile 'pytorch_ci' -> database=None, 
max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.2878514Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.2878641Z configfile: pytest.ini 2025-12-04T11:20:45.2879180Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.2879399Z collecting ... collected 58 items / 2 deselected / 56 selected 2025-12-04T11:20:45.2879556Z stepcurrent: skipping 2 already run items. 2025-12-04T11:20:45.2879674Z Running 12 items in this shard 2025-12-04T11:20:45.2879680Z 2025-12-04T11:20:45.2880903Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 W1204 11:02:44.250000 87410 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.2881054Z ('RERUN', {'yellow': True}) [4.0054s] [ 8%] 2025-12-04T11:20:45.2881918Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5437s] [ 8%] 2025-12-04T11:20:45.2882708Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5503s] [ 8%] 2025-12-04T11:20:45.2882716Z 2025-12-04T11:20:45.2882859Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.2883382Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.2883508Z Traceback (most recent call last): 2025-12-04T11:20:45.2884018Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.2884269Z 
self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.2884732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2884898Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2885451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2885665Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2885814Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2885820Z 2025-12-04T11:20:45.2885929Z Expected 1 but got 0. 2025-12-04T11:20:45.2886042Z Absolute difference: 1 2025-12-04T11:20:45.2886172Z Relative difference: 1.0 2025-12-04T11:20:45.2886177Z 2025-12-04T11:20:45.2886395Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2887381Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.2887390Z 2025-12-04T11:20:45.2887665Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2887886Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2888048Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2888748Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2888989Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2889090Z graph_break [] 2025-12-04T11:20:45.2889215Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.2889452Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2890189Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2890332Z warnings.warn( 2025-12-04T11:20:45.2891064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2891171Z warnings.warn( 2025-12-04T11:20:45.2891694Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.2891819Z Traceback (most recent call last): 2025-12-04T11:20:45.2892331Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.2892579Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.2893044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2893210Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2893761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2893970Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2894115Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2894123Z 2025-12-04T11:20:45.2894230Z Expected 1 but got 0. 
2025-12-04T11:20:45.2894338Z Absolute difference: 1 2025-12-04T11:20:45.2894464Z Relative difference: 1.0 2025-12-04T11:20:45.2894469Z 2025-12-04T11:20:45.2894686Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2895608Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.2895614Z 2025-12-04T11:20:45.2895888Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2896110Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2896245Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2897020Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2897269Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2897371Z graph_break [] 2025-12-04T11:20:45.2897494Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.2897731Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2898462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2898567Z warnings.warn( 2025-12-04T11:20:45.2900261Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2900381Z warnings.warn( 2025-12-04T11:20:45.2900619Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2900741Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2901008Z aot_autograd [('total', 
2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2901717Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2901821Z graph_break [] 2025-12-04T11:20:45.2901946Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.2902178Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2902907Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2903069Z warnings.warn( 2025-12-04T11:20:45.2903784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2903889Z warnings.warn( 2025-12-04T11:20:45.2904058Z =================================== FAILURES =================================== 2025-12-04T11:20:45.2904571Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.2904711Z Traceback (most recent call last): 2025-12-04T11:20:45.2905222Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.2905458Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.2905939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2906107Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2906642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2906869Z raise error_metas.pop()[0].to_error( # type: ignore[index] 
2025-12-04T11:20:45.2907005Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2907011Z 2025-12-04T11:20:45.2907131Z Expected 1 but got 0. 2025-12-04T11:20:45.2907242Z Absolute difference: 1 2025-12-04T11:20:45.2907354Z Relative difference: 1.0 2025-12-04T11:20:45.2907361Z 2025-12-04T11:20:45.2907590Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2908505Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.2908511Z 2025-12-04T11:20:45.2908799Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2909020Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2909141Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2909849Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2910081Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2910181Z graph_break [] 2025-12-04T11:20:45.2910316Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.2910532Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2911340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2911444Z warnings.warn( 2025-12-04T11:20:45.2912171Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2912291Z warnings.warn( 2025-12-04T11:20:45.2912508Z ----------------------------- Captured stdout 
call ----------------------------- 2025-12-04T11:20:45.2912658Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2912904Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2913596Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2913713Z graph_break [] 2025-12-04T11:20:45.2913835Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.2914052Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2914793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2914928Z warnings.warn( 2025-12-04T11:20:45.2915658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2915763Z warnings.warn( 2025-12-04T11:20:45.2915980Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2916113Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2916341Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2917042Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2917159Z graph_break [] 2025-12-04T11:20:45.2917286Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.2917515Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2918241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 
2025-12-04T11:20:45.2918344Z warnings.warn( 2025-12-04T11:20:45.2919080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2919181Z warnings.warn( 2025-12-04T11:20:45.2920016Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3707d31910126ebf.xml - 2025-12-04T11:20:45.2920207Z =========================== short test summary info ============================ 2025-12-04T11:20:45.2921158Z FAILED [0.5503s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2921166Z 2025-12-04T11:20:45.2921286Z Expected 1 but got 0. 2025-12-04T11:20:45.2921395Z Absolute difference: 1 2025-12-04T11:20:45.2921506Z Relative difference: 1.0 2025-12-04T11:20:45.2921524Z 2025-12-04T11:20:45.2921746Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2922649Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.2922655Z 2025-12-04T11:20:45.2922936Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2923116Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.2923377Z =================== 1 failed, 2 deselected, 2 rerun in 5.13s =================== 2025-12-04T11:20:45.2923496Z Got exit code 1 2025-12-04T11:20:45.2923606Z Retrying single test... 
2025-12-04T11:20:45.2924065Z W1204 11:02:58.886000 87587 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.2924733Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dedaec5daecec784.xml 2025-12-04T11:20:45.2924934Z ============================= test session starts ============================== 2025-12-04T11:20:45.2925299Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.2925410Z cachedir: .pytest_cache 2025-12-04T11:20:45.2925943Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.2926072Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.2926187Z configfile: pytest.ini 2025-12-04T11:20:45.2926742Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.2927013Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.2927999Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.2928133Z Running 1 items in this shard 2025-12-04T11:20:45.2928139Z 2025-12-04T11:20:45.2929415Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:03:04.867016663 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2929422Z 2025-12-04T11:20:45.2929956Z [W1204 11:03:20.684596446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2929964Z 2025-12-04T11:20:45.2930478Z [W1204 11:03:20.684856249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2930484Z 2025-12-04T11:20:45.2931007Z [W1204 11:03:20.693826002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2931015Z 2025-12-04T11:20:45.2931522Z [W1204 11:03:20.694540265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2931527Z 2025-12-04T11:20:45.2932045Z [W1204 11:03:20.694731773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2932050Z 2025-12-04T11:20:45.2932561Z [W1204 11:03:20.703014190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2932568Z 2025-12-04T11:20:45.2933087Z [W1204 11:03:20.703663002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2933092Z 2025-12-04T11:20:45.2933602Z [W1204 11:03:20.703852153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2933610Z 2025-12-04T11:20:45.2934075Z W1204 11:03:20.425000 87587 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.2934598Z [W1204 11:03:20.901339401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2934603Z 2025-12-04T11:20:45.2935183Z [W1204 11:03:20.903131457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2935189Z 2025-12-04T11:20:45.2935714Z [W1204 11:03:20.903352137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2935719Z 2025-12-04T11:20:45.2936229Z [W1204 11:03:20.908778546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2936264Z 2025-12-04T11:20:45.2936872Z [W1204 11:03:20.909471360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2936879Z 2025-12-04T11:20:45.2937387Z [W1204 11:03:20.909683967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2937392Z 2025-12-04T11:20:45.2937915Z [W1204 11:03:20.917269454 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2937926Z 2025-12-04T11:20:45.2938432Z [W1204 11:03:20.917986983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2938497Z 2025-12-04T11:20:45.2939006Z [W1204 11:03:20.918199072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2939029Z 2025-12-04T11:20:45.2939165Z ('RERUN', {'yellow': True}) [19.8774s] [100%] 2025-12-04T11:20:45.2940437Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:03:20.376636447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2940443Z 2025-12-04T11:20:45.2940971Z [W1204 11:03:20.377396283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2940981Z 2025-12-04T11:20:45.2941490Z [W1204 11:03:20.377607821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2941498Z 2025-12-04T11:20:45.2942016Z [W1204 11:03:20.383054833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2942023Z 2025-12-04T11:20:45.2942535Z [W1204 11:03:20.383716494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2942539Z 2025-12-04T11:20:45.2943062Z [W1204 11:03:20.383913795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2943067Z 2025-12-04T11:20:45.2943573Z [W1204 11:03:21.391417901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2943577Z 2025-12-04T11:20:45.2944092Z [W1204 11:03:21.392070147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2944113Z 2025-12-04T11:20:45.2944623Z [W1204 11:03:21.392262362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2944628Z 2025-12-04T11:20:45.2945139Z [W1204 11:03:21.504280689 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2945144Z 2025-12-04T11:20:45.2945664Z [W1204 11:03:21.505075056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2945670Z 2025-12-04T11:20:45.2946178Z [W1204 11:03:21.505287256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2946184Z 2025-12-04T11:20:45.2946769Z [W1204 11:03:21.510671171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2946777Z 2025-12-04T11:20:45.2947288Z [W1204 11:03:21.511336982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2947293Z 2025-12-04T11:20:45.2947814Z [W1204 11:03:21.511535560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2947853Z 2025-12-04T11:20:45.2948362Z [W1204 11:03:21.519093013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2948367Z 2025-12-04T11:20:45.2948889Z [W1204 11:03:21.519734683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2948894Z 2025-12-04T11:20:45.2949407Z [W1204 11:03:21.519931880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2949446Z 2025-12-04T11:20:45.2949579Z ('RERUN', {'yellow': True}) [0.5614s] [100%] 2025-12-04T11:20:45.2950858Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:03:21.915511863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2950867Z 2025-12-04T11:20:45.2951380Z [W1204 11:03:21.916272016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2951385Z 2025-12-04T11:20:45.2951907Z [W1204 11:03:21.916472848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2951912Z 2025-12-04T11:20:45.2952426Z [W1204 11:03:21.921954581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2952432Z 2025-12-04T11:20:45.2952950Z [W1204 11:03:21.922587647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2952955Z 2025-12-04T11:20:45.2953466Z [W1204 11:03:21.922779947 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2953474Z 2025-12-04T11:20:45.2953994Z [W1204 11:03:21.930226060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2953999Z 2025-12-04T11:20:45.2954509Z [W1204 11:03:21.930868017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2954514Z 2025-12-04T11:20:45.2955040Z [W1204 11:03:21.931057118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2955047Z 2025-12-04T11:20:45.2955554Z [W1204 11:03:21.047529530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2955559Z 2025-12-04T11:20:45.2956066Z [W1204 11:03:21.048299181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2956073Z 2025-12-04T11:20:45.2956591Z [W1204 11:03:21.048502969 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2956596Z 2025-12-04T11:20:45.2957106Z [W1204 11:03:21.054095047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2957111Z 2025-12-04T11:20:45.2957693Z [W1204 11:03:21.054753017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2957702Z 2025-12-04T11:20:45.2958212Z [W1204 11:03:21.054950545 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2958217Z 2025-12-04T11:20:45.2958736Z [W1204 11:03:21.062502445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2958770Z 2025-12-04T11:20:45.2959278Z [W1204 11:03:21.063162059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.2959283Z 2025-12-04T11:20:45.2959801Z [W1204 11:03:21.063361165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.2959806Z 2025-12-04T11:20:45.2959910Z FAILED [0.5428s] [100%] 2025-12-04T11:20:45.2959916Z 2025-12-04T11:20:45.2960070Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.2960627Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.2960753Z Traceback (most recent call last): 2025-12-04T11:20:45.2961284Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.2961520Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.2961987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2962168Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2962707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2962916Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2963068Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2963075Z 2025-12-04T11:20:45.2963181Z Expected 1 but got 0. 
2025-12-04T11:20:45.2963300Z Absolute difference: 1 2025-12-04T11:20:45.2963412Z Relative difference: 1.0 2025-12-04T11:20:45.2963417Z 2025-12-04T11:20:45.2963634Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2964563Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.2964571Z 2025-12-04T11:20:45.2964843Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2965077Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2965194Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2965901Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2966148Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2966249Z graph_break [] 2025-12-04T11:20:45.2966372Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.2966607Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2967822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.2967966Z if out == self.unknown_value: 2025-12-04T11:20:45.2968693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2968798Z warnings.warn( 2025-12-04T11:20:45.2969594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2969704Z warnings.warn( 2025-12-04T11:20:45.2970229Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.2970356Z Traceback (most recent call last): 2025-12-04T11:20:45.2970900Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.2971418Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.2971882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2972066Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2972609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2972818Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2973052Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2973058Z 2025-12-04T11:20:45.2973166Z Expected 1 but got 0. 
2025-12-04T11:20:45.2973277Z Absolute difference: 1 2025-12-04T11:20:45.2973406Z Relative difference: 1.0 2025-12-04T11:20:45.2973413Z 2025-12-04T11:20:45.2973634Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2974558Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.2974563Z 2025-12-04T11:20:45.2974835Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2975058Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2975198Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2975899Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2976149Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2976253Z graph_break [] 2025-12-04T11:20:45.2976444Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.2976683Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2977889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.2978024Z if out == self.unknown_value: 2025-12-04T11:20:45.2978754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2978859Z warnings.warn( 2025-12-04T11:20:45.2979597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2979703Z warnings.warn( 2025-12-04T11:20:45.2979925Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2980065Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2980296Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2981004Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2981107Z graph_break [] 2025-12-04T11:20:45.2981229Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.2981570Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2982292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2982397Z warnings.warn( 2025-12-04T11:20:45.2983125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2983287Z warnings.warn( 2025-12-04T11:20:45.2983448Z =================================== FAILURES =================================== 2025-12-04T11:20:45.2983959Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 
2025-12-04T11:20:45.2984082Z Traceback (most recent call last): 2025-12-04T11:20:45.2984606Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.2984845Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.2985346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.2985512Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.2986050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.2986276Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.2986409Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.2986415Z 2025-12-04T11:20:45.2986522Z Expected 1 but got 0. 2025-12-04T11:20:45.2986646Z Absolute difference: 1 2025-12-04T11:20:45.2986757Z Relative difference: 1.0 2025-12-04T11:20:45.2986763Z 2025-12-04T11:20:45.2986995Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.2987910Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.2987918Z 2025-12-04T11:20:45.2988188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.2988422Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2988539Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2989250Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2989481Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 
2025-12-04T11:20:45.2989582Z graph_break [] 2025-12-04T11:20:45.2989720Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.2989939Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2991148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.2991282Z if out == self.unknown_value: 2025-12-04T11:20:45.2992002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2992120Z warnings.warn( 2025-12-04T11:20:45.2992839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2992942Z warnings.warn( 2025-12-04T11:20:45.2993174Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2993290Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2993536Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2994304Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2994411Z graph_break [] 2025-12-04T11:20:45.2994547Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.2994762Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2995515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2995629Z warnings.warn( 
2025-12-04T11:20:45.2996347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2996462Z warnings.warn( 2025-12-04T11:20:45.2996677Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.2996798Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.2997075Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.2997770Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.2997883Z graph_break [] 2025-12-04T11:20:45.2998008Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.2998223Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.2998960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2999064Z warnings.warn( 2025-12-04T11:20:45.2999777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.2999899Z warnings.warn( 2025-12-04T11:20:45.3000746Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dedaec5daecec784.xml - 2025-12-04T11:20:45.3000935Z =========================== short test summary info ============================ 2025-12-04T11:20:45.3001882Z FAILED [0.5428s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.3001890Z 2025-12-04T11:20:45.3001997Z Expected 1 but got 0. 2025-12-04T11:20:45.3002120Z Absolute difference: 1 2025-12-04T11:20:45.3002234Z Relative difference: 1.0 2025-12-04T11:20:45.3002239Z 2025-12-04T11:20:45.3002466Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3003375Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3003383Z 2025-12-04T11:20:45.3003651Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3003847Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.3004049Z ================== 1 failed, 13 deselected, 2 rerun in 21.01s ================== 2025-12-04T11:20:45.3004166Z Got exit code 1 2025-12-04T11:20:45.3004272Z Retrying single test... 2025-12-04T11:20:45.3004718Z W1204 11:03:33.460000 87769 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.3005390Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2f4f0e9c4ac682e4.xml 2025-12-04T11:20:45.3005623Z ============================= test session starts ============================== 2025-12-04T11:20:45.3005977Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.3006105Z cachedir: .pytest_cache 2025-12-04T11:20:45.3006623Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.3006761Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.3006904Z configfile: pytest.ini 2025-12-04T11:20:45.3007445Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, 
subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.3007675Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.3008664Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3008799Z Running 1 items in this shard 2025-12-04T11:20:45.3008805Z 2025-12-04T11:20:45.3010112Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:03:39.443560793 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3010118Z 2025-12-04T11:20:45.3010637Z [W1204 11:03:55.685063911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3010654Z 2025-12-04T11:20:45.3011166Z [W1204 11:03:55.685333822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3011171Z 2025-12-04T11:20:45.3011682Z [W1204 11:03:55.694546155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3011687Z 2025-12-04T11:20:45.3012215Z [W1204 11:03:55.695305986 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3012223Z 2025-12-04T11:20:45.3012732Z [W1204 11:03:55.695499539 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3012737Z 2025-12-04T11:20:45.3013264Z [W1204 11:03:55.704021790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3013270Z 2025-12-04T11:20:45.3013780Z [W1204 11:03:55.704725026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3013785Z 2025-12-04T11:20:45.3014302Z [W1204 11:03:55.704911622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3014307Z 2025-12-04T11:20:45.3014773Z W1204 11:03:55.428000 87769 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.3015297Z [W1204 11:03:55.907822037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3015302Z 2025-12-04T11:20:45.3015814Z [W1204 11:03:55.909628467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3015821Z 2025-12-04T11:20:45.3016403Z [W1204 11:03:55.909846753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3016410Z 2025-12-04T11:20:45.3016929Z [W1204 11:03:55.915407062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3016934Z 2025-12-04T11:20:45.3017509Z [W1204 11:03:55.916141120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3017518Z 2025-12-04T11:20:45.3018043Z [W1204 11:03:55.916349107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3018049Z 2025-12-04T11:20:45.3018554Z [W1204 11:03:55.924069211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3018588Z 2025-12-04T11:20:45.3019107Z [W1204 11:03:55.924775219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3019112Z 2025-12-04T11:20:45.3019619Z [W1204 11:03:55.924972639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3019625Z 2025-12-04T11:20:45.3019774Z ('RERUN', {'yellow': True}) [20.3057s] [100%] 2025-12-04T11:20:45.3021053Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:03:56.392670584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3021088Z 2025-12-04T11:20:45.3021598Z [W1204 11:03:56.393484189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3021620Z 2025-12-04T11:20:45.3022130Z [W1204 11:03:56.393717535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3022135Z 2025-12-04T11:20:45.3022643Z [W1204 11:03:56.399406441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3022649Z 2025-12-04T11:20:45.3023176Z [W1204 11:03:56.400144655 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3023181Z 2025-12-04T11:20:45.3023694Z [W1204 11:03:56.400357246 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3023699Z 2025-12-04T11:20:45.3024219Z [W1204 11:03:56.408172706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3024226Z 2025-12-04T11:20:45.3024733Z [W1204 11:03:56.408878116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3024738Z 2025-12-04T11:20:45.3025262Z [W1204 11:03:56.409075174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3025267Z 2025-12-04T11:20:45.3025772Z [W1204 11:03:56.523846091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3025781Z 2025-12-04T11:20:45.3026304Z [W1204 11:03:56.524667495 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3026311Z 2025-12-04T11:20:45.3026821Z [W1204 11:03:56.524883891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3026828Z 2025-12-04T11:20:45.3027336Z [W1204 11:03:56.530437729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3027353Z 2025-12-04T11:20:45.3027863Z [W1204 11:03:56.531165776 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3027868Z 2025-12-04T11:20:45.3028376Z [W1204 11:03:56.531373305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3028381Z 2025-12-04T11:20:45.3028988Z [W1204 11:03:56.539267769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3028996Z 2025-12-04T11:20:45.3029511Z [W1204 11:03:56.539964477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3029516Z 2025-12-04T11:20:45.3030067Z [W1204 11:03:56.540189222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3030073Z 2025-12-04T11:20:45.3030205Z ('RERUN', {'yellow': True}) [0.5748s] [100%] 2025-12-04T11:20:45.3031493Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:03:56.937947528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3031498Z 2025-12-04T11:20:45.3032013Z [W1204 11:03:56.938761270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3032048Z 2025-12-04T11:20:45.3032557Z [W1204 11:03:56.938977802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3032576Z 2025-12-04T11:20:45.3033090Z [W1204 11:03:56.944681370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3033095Z 2025-12-04T11:20:45.3033605Z [W1204 11:03:56.945403725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3033610Z 2025-12-04T11:20:45.3034130Z [W1204 11:03:56.945610005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3034135Z 2025-12-04T11:20:45.3034648Z [W1204 11:03:56.953419964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3034655Z 2025-12-04T11:20:45.3035177Z [W1204 11:03:56.954122726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3035182Z 2025-12-04T11:20:45.3035693Z [W1204 11:03:56.954326995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3035701Z 2025-12-04T11:20:45.3036220Z [W1204 11:03:56.073975858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3036225Z 2025-12-04T11:20:45.3036730Z [W1204 11:03:56.074795984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3036735Z 2025-12-04T11:20:45.3037252Z [W1204 11:03:56.075011810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3037261Z 2025-12-04T11:20:45.3060517Z [W1204 11:03:56.080807175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3060526Z 2025-12-04T11:20:45.3061042Z [W1204 11:03:56.081536538 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3061061Z 2025-12-04T11:20:45.3061577Z [W1204 11:03:56.081743133 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3061587Z 2025-12-04T11:20:45.3062096Z [W1204 11:03:56.089449771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3062101Z 2025-12-04T11:20:45.3062807Z [W1204 11:03:56.090178144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3062817Z 2025-12-04T11:20:45.3063324Z [W1204 11:03:56.090390316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3063329Z 2025-12-04T11:20:45.3063445Z FAILED [0.5487s] [100%] 2025-12-04T11:20:45.3063451Z 2025-12-04T11:20:45.3063638Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.3064150Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.3064290Z Traceback (most recent call last): 2025-12-04T11:20:45.3064804Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3065031Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3065510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3065718Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3066266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3066471Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3066607Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3066613Z 2025-12-04T11:20:45.3066729Z Expected 1 but got 0. 
2025-12-04T11:20:45.3066835Z Absolute difference: 1 2025-12-04T11:20:45.3066956Z Relative difference: 1.0 2025-12-04T11:20:45.3066962Z 2025-12-04T11:20:45.3067176Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3068087Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3068094Z 2025-12-04T11:20:45.3068378Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3068601Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3068729Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3069429Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3069659Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3069774Z graph_break [] 2025-12-04T11:20:45.3069892Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.3070110Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3071693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3071813Z if out == self.unknown_value: 2025-12-04T11:20:45.3072549Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3072652Z warnings.warn( 2025-12-04T11:20:45.3073373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3073492Z warnings.warn( 2025-12-04T11:20:45.3074002Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.3074138Z Traceback (most recent call last): 2025-12-04T11:20:45.3074640Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3074993Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3075470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3075631Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3076160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3076422Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3076552Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3076558Z 2025-12-04T11:20:45.3076679Z Expected 1 but got 0. 
2025-12-04T11:20:45.3076784Z Absolute difference: 1 2025-12-04T11:20:45.3076891Z Relative difference: 1.0 2025-12-04T11:20:45.3076897Z 2025-12-04T11:20:45.3077121Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3078028Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3078090Z 2025-12-04T11:20:45.3078368Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3078588Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3078709Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3079420Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3079646Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3079741Z graph_break [] 2025-12-04T11:20:45.3079879Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.3080092Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3081325Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3081443Z if out == self.unknown_value: 2025-12-04T11:20:45.3082161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3082278Z warnings.warn( 2025-12-04T11:20:45.3082989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3083096Z warnings.warn( 2025-12-04T11:20:45.3083311Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3083425Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3083669Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3084358Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3084454Z graph_break [] 2025-12-04T11:20:45.3084585Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.3084797Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3085531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3085628Z warnings.warn( 2025-12-04T11:20:45.3086334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3086443Z warnings.warn( 2025-12-04T11:20:45.3086669Z =================================== FAILURES =================================== 2025-12-04T11:20:45.3087188Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 
2025-12-04T11:20:45.3087309Z Traceback (most recent call last): 2025-12-04T11:20:45.3087813Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3088103Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3088563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3088727Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3089269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3089472Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3089618Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3089624Z 2025-12-04T11:20:45.3089760Z Expected 1 but got 0. 2025-12-04T11:20:45.3089865Z Absolute difference: 1 2025-12-04T11:20:45.3089988Z Relative difference: 1.0 2025-12-04T11:20:45.3089993Z 2025-12-04T11:20:45.3090206Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3091124Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3091133Z 2025-12-04T11:20:45.3091400Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3091617Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3091743Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3092440Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3092679Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 
2025-12-04T11:20:45.3092776Z graph_break [] 2025-12-04T11:20:45.3092896Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.3093123Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3094332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.3094448Z if out == self.unknown_value: 2025-12-04T11:20:45.3095188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3095287Z warnings.warn( 2025-12-04T11:20:45.3096019Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3096121Z warnings.warn( 2025-12-04T11:20:45.3096413Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3096547Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3096771Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3097463Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3097578Z graph_break [] 2025-12-04T11:20:45.3097698Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.3097926Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3098715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3098818Z warnings.warn( 
2025-12-04T11:20:45.3099546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3099645Z warnings.warn( 2025-12-04T11:20:45.3099870Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3100018Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3100242Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3100946Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3101041Z graph_break [] 2025-12-04T11:20:45.3101165Z aten_mm_info [('aten.mm_24_72_1024', 2)] 2025-12-04T11:20:45.3101394Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3102111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3102250Z warnings.warn( 2025-12-04T11:20:45.3102965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3103066Z warnings.warn( 2025-12-04T11:20:45.3103915Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2f4f0e9c4ac682e4.xml - 2025-12-04T11:20:45.3104089Z =========================== short test summary info ============================ 2025-12-04T11:20:45.3105051Z FAILED [0.5487s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.3105059Z 2025-12-04T11:20:45.3105167Z Expected 1 but got 0. 2025-12-04T11:20:45.3105274Z Absolute difference: 1 2025-12-04T11:20:45.3105392Z Relative difference: 1.0 2025-12-04T11:20:45.3105397Z 2025-12-04T11:20:45.3105608Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3106517Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3106525Z 2025-12-04T11:20:45.3106793Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3106969Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.3107178Z ================== 1 failed, 13 deselected, 2 rerun in 21.46s ================== 2025-12-04T11:20:45.3107277Z Got exit code 1 2025-12-04T11:20:45.3108098Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3108521Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:20:45.3108959Z W1204 11:04:08.727000 87951 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.3109627Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-580d25229e34cb07.xml 2025-12-04T11:20:45.3109788Z ============================= test session starts ============================== 2025-12-04T11:20:45.3110137Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.3110256Z cachedir: .pytest_cache 2025-12-04T11:20:45.3110838Z hypothesis profile 'pytorch_ci' -> database=None, 
max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.3110981Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.3111090Z configfile: pytest.ini 2025-12-04T11:20:45.3111631Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.3111856Z collecting ... collected 58 items / 3 deselected / 55 selected 2025-12-04T11:20:45.3112033Z stepcurrent: skipping 3 already run items. 2025-12-04T11:20:45.3112149Z Running 11 items in this shard 2025-12-04T11:20:45.3112164Z 2025-12-04T11:20:45.3113037Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.2622s] [ 9%] 2025-12-04T11:20:45.3113906Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5567s] [ 9%] 2025-12-04T11:20:45.3114726Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.5499s] [ 9%] 2025-12-04T11:20:45.3114732Z 2025-12-04T11:20:45.3114874Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.3115392Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3115515Z Traceback (most recent call last): 2025-12-04T11:20:45.3116026Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3116264Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3116727Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3116909Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3117452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3117655Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3117798Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3117806Z 2025-12-04T11:20:45.3117910Z Expected 1 but got 2. 2025-12-04T11:20:45.3118018Z Absolute difference: 1 2025-12-04T11:20:45.3118141Z Relative difference: 1.0 2025-12-04T11:20:45.3118147Z 2025-12-04T11:20:45.3118358Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3119276Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3119282Z 2025-12-04T11:20:45.3119552Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3119772Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3119903Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3120787Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3121031Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3121132Z graph_break [] 2025-12-04T11:20:45.3121350Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3122095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 
does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3122199Z warnings.warn( 2025-12-04T11:20:45.3122995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3123101Z warnings.warn( 2025-12-04T11:20:45.3123613Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3123749Z Traceback (most recent call last): 2025-12-04T11:20:45.3124292Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3124526Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3125000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3125163Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3125719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3125928Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3126111Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3126116Z 2025-12-04T11:20:45.3126239Z Expected 1 but got 2. 
2025-12-04T11:20:45.3126348Z Absolute difference: 1 2025-12-04T11:20:45.3126461Z Relative difference: 1.0 2025-12-04T11:20:45.3126480Z 2025-12-04T11:20:45.3126701Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3127613Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3127619Z 2025-12-04T11:20:45.3127905Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3128127Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3128250Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3129158Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3129394Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3129511Z graph_break [] 2025-12-04T11:20:45.3129731Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3130462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3130582Z warnings.warn( 2025-12-04T11:20:45.3131304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3131423Z warnings.warn( 2025-12-04T11:20:45.3131648Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3131768Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3132013Z aot_autograd [('total', 2), 
('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3132904Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3133009Z graph_break [] 2025-12-04T11:20:45.3133242Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3133965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3134082Z warnings.warn( 2025-12-04T11:20:45.3134858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3134962Z warnings.warn( 2025-12-04T11:20:45.3135127Z =================================== FAILURES =================================== 2025-12-04T11:20:45.3135645Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3135819Z Traceback (most recent call last): 2025-12-04T11:20:45.3136418Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3136655Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3137129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3137293Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3137833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3138132Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3138266Z 
AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3138272Z 2025-12-04T11:20:45.3138398Z Expected 1 but got 2. 2025-12-04T11:20:45.3138507Z Absolute difference: 1 2025-12-04T11:20:45.3138619Z Relative difference: 1.0 2025-12-04T11:20:45.3138627Z 2025-12-04T11:20:45.3138858Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3139770Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3139775Z 2025-12-04T11:20:45.3140056Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3140275Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3140396Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3141298Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3141525Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3141641Z graph_break [] 2025-12-04T11:20:45.3141858Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3142588Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3142706Z warnings.warn( 2025-12-04T11:20:45.3143426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3143538Z warnings.warn( 2025-12-04T11:20:45.3143773Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:20:45.3143892Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3144139Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3145028Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3145131Z graph_break [] 2025-12-04T11:20:45.3145363Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3146087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3146206Z warnings.warn( 2025-12-04T11:20:45.3146988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3147095Z warnings.warn( 2025-12-04T11:20:45.3147324Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3147440Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3147668Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3148601Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3148700Z graph_break [] 2025-12-04T11:20:45.3148930Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3149656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3149759Z warnings.warn( 
2025-12-04T11:20:45.3150524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3150625Z warnings.warn( 2025-12-04T11:20:45.3151481Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-580d25229e34cb07.xml - 2025-12-04T11:20:45.3151657Z =========================== short test summary info ============================ 2025-12-04T11:20:45.3152598Z FAILED [0.5499s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3152604Z 2025-12-04T11:20:45.3152725Z Expected 1 but got 2. 2025-12-04T11:20:45.3152833Z Absolute difference: 1 2025-12-04T11:20:45.3152961Z Relative difference: 1.0 2025-12-04T11:20:45.3152969Z 2025-12-04T11:20:45.3153189Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3154104Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3154112Z 2025-12-04T11:20:45.3154394Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3154575Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.3154785Z =================== 1 failed, 3 deselected, 2 rerun in 5.40s =================== 2025-12-04T11:20:45.3154886Z Got exit code 1 2025-12-04T11:20:45.3154996Z Retrying single test... 
2025-12-04T11:20:45.3155453Z W1204 11:04:29.537000 88155 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.3156119Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9d15e1ab064c4537.xml 2025-12-04T11:20:45.3156288Z ============================= test session starts ============================== 2025-12-04T11:20:45.3156653Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.3156767Z cachedir: .pytest_cache 2025-12-04T11:20:45.3157303Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.3157430Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.3157541Z configfile: pytest.ini 2025-12-04T11:20:45.3158098Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.3158317Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.3159377Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3159509Z Running 1 items in this shard 2025-12-04T11:20:45.3159513Z 2025-12-04T11:20:45.3160790Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:04:35.858736407 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3160826Z 2025-12-04T11:20:45.3161361Z [W1204 11:04:51.683103562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3161366Z 2025-12-04T11:20:45.3161882Z [W1204 11:04:51.683367710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3161888Z 2025-12-04T11:20:45.3162442Z [W1204 11:04:51.691012787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3162447Z 2025-12-04T11:20:45.3162956Z [W1204 11:04:51.691822743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3162963Z 2025-12-04T11:20:45.3163482Z [W1204 11:04:51.692030482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3163487Z 2025-12-04T11:20:45.3163996Z [W1204 11:04:51.699425928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3164001Z 2025-12-04T11:20:45.3164526Z [W1204 11:04:51.700193773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3164535Z 2025-12-04T11:20:45.3165044Z [W1204 11:04:51.700394860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3165051Z 2025-12-04T11:20:45.3165560Z [W1204 11:04:51.843491251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3165582Z 2025-12-04T11:20:45.3166089Z [W1204 11:04:51.845281897 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3166094Z 2025-12-04T11:20:45.3166599Z [W1204 11:04:51.845498916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3166605Z 2025-12-04T11:20:45.3167129Z [W1204 11:04:51.849609588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3167134Z 2025-12-04T11:20:45.3167645Z [W1204 11:04:51.850352583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3167652Z 2025-12-04T11:20:45.3168172Z [W1204 11:04:51.850566443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3168177Z 2025-12-04T11:20:45.3168688Z [W1204 11:04:51.856782331 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3168693Z 2025-12-04T11:20:45.3169215Z [W1204 11:04:51.857470901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3169219Z 2025-12-04T11:20:45.3169724Z [W1204 11:04:51.857668962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3169729Z 2025-12-04T11:20:45.3169940Z ('RERUN', {'yellow': True}) [20.1321s] [100%] 2025-12-04T11:20:45.3171575Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:04:51.350105120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3171584Z 2025-12-04T11:20:45.3172195Z [W1204 11:04:51.350865873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3172215Z 2025-12-04T11:20:45.3172725Z [W1204 11:04:51.351076777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3172730Z 2025-12-04T11:20:45.3173239Z [W1204 11:04:51.355233539 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3173244Z 2025-12-04T11:20:45.3173777Z [W1204 11:04:51.355920646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3173832Z 2025-12-04T11:20:45.3174342Z [W1204 11:04:51.356121170 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3174347Z 2025-12-04T11:20:45.3174865Z [W1204 11:04:51.362545131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3174873Z 2025-12-04T11:20:45.3175381Z [W1204 11:04:51.363227009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3175386Z 2025-12-04T11:20:45.3175907Z [W1204 11:04:51.363418628 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3175911Z 2025-12-04T11:20:45.3176487Z [W1204 11:04:52.455401128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3176496Z 2025-12-04T11:20:45.3177004Z [W1204 11:04:52.456215113 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3177028Z 2025-12-04T11:20:45.3177533Z [W1204 11:04:52.456432566 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3177540Z 2025-12-04T11:20:45.3178048Z [W1204 11:04:52.460499790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3178053Z 2025-12-04T11:20:45.3178576Z [W1204 11:04:52.461203277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3178581Z 2025-12-04T11:20:45.3179091Z [W1204 11:04:52.461406752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3179098Z 2025-12-04T11:20:45.3179615Z [W1204 11:04:52.467579445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3179620Z 2025-12-04T11:20:45.3180125Z [W1204 11:04:52.468438257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3180132Z 2025-12-04T11:20:45.3180652Z [W1204 11:04:52.468649896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3180657Z 2025-12-04T11:20:45.3180788Z ('RERUN', {'yellow': True}) [0.5698s] [100%] 2025-12-04T11:20:45.3182173Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:04:52.894208619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3182182Z 2025-12-04T11:20:45.3182694Z [W1204 11:04:52.895141600 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3182699Z 2025-12-04T11:20:45.3183208Z [W1204 11:04:52.895373057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3183262Z 2025-12-04T11:20:45.3183770Z [W1204 11:04:52.900263332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3183775Z 2025-12-04T11:20:45.3184279Z [W1204 11:04:52.901198844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3184284Z 2025-12-04T11:20:45.3184815Z [W1204 11:04:52.901428089 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3184862Z 2025-12-04T11:20:45.3185369Z [W1204 11:04:52.908479111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3185374Z 2025-12-04T11:20:45.3185894Z [W1204 11:04:52.909382756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3185902Z 2025-12-04T11:20:45.3186411Z [W1204 11:04:52.909589672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3186417Z 2025-12-04T11:20:45.3186936Z [W1204 11:04:52.006060038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3186941Z 2025-12-04T11:20:45.3187457Z [W1204 11:04:52.006898471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3187463Z 2025-12-04T11:20:45.3187990Z [W1204 11:04:52.007124222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3187995Z 2025-12-04T11:20:45.3188505Z [W1204 11:04:52.011364865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3188513Z 2025-12-04T11:20:45.3189024Z [W1204 11:04:52.012104331 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3189029Z 2025-12-04T11:20:45.3189548Z [W1204 11:04:52.012325294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3189553Z 2025-12-04T11:20:45.3190066Z [W1204 11:04:52.018618183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3190075Z 2025-12-04T11:20:45.3190600Z [W1204 11:04:52.019574716 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3190607Z 2025-12-04T11:20:45.3191114Z [W1204 11:04:52.019780781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3191122Z 2025-12-04T11:20:45.3191243Z FAILED [0.5502s] [100%] 2025-12-04T11:20:45.3191248Z 2025-12-04T11:20:45.3191395Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.3191913Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3192054Z Traceback (most recent call last): 2025-12-04T11:20:45.3192567Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3192884Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3193354Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3193521Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3194073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3194312Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3194462Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3194468Z 2025-12-04T11:20:45.3194577Z Expected 1 but got 2. 
2025-12-04T11:20:45.3194685Z Absolute difference: 1 2025-12-04T11:20:45.3194809Z Relative difference: 1.0 2025-12-04T11:20:45.3194814Z 2025-12-04T11:20:45.3195029Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3195948Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3195998Z 2025-12-04T11:20:45.3196269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3196492Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3196625Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3197514Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3197744Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3197873Z graph_break [] 2025-12-04T11:20:45.3198092Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3199321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3199445Z if out == self.unknown_value: 2025-12-04T11:20:45.3200171Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3200294Z warnings.warn( 2025-12-04T11:20:45.3201010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3201127Z warnings.warn( 2025-12-04T11:20:45.3201638Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3201762Z Traceback (most recent call last): 2025-12-04T11:20:45.3202295Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3202534Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3202994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3203177Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3203718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3203941Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3204076Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3204081Z 2025-12-04T11:20:45.3204189Z Expected 1 but got 2. 
2025-12-04T11:20:45.3204316Z Absolute difference: 1 2025-12-04T11:20:45.3204429Z Relative difference: 1.0 2025-12-04T11:20:45.3204434Z 2025-12-04T11:20:45.3204736Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3205650Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3205658Z 2025-12-04T11:20:45.3205931Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3206204Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3206325Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3207224Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3207453Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3207554Z graph_break [] 2025-12-04T11:20:45.3207794Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3209043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3209174Z if out == self.unknown_value: 2025-12-04T11:20:45.3209907Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3210010Z warnings.warn( 2025-12-04T11:20:45.3210740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3210843Z warnings.warn( 2025-12-04T11:20:45.3211063Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3211198Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3211425Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3212326Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3212426Z graph_break [] 2025-12-04T11:20:45.3212640Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3213379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3213483Z warnings.warn( 2025-12-04T11:20:45.3214210Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3214311Z warnings.warn( 2025-12-04T11:20:45.3214462Z =================================== FAILURES =================================== 2025-12-04T11:20:45.3214990Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3215114Z 
Traceback (most recent call last): 2025-12-04T11:20:45.3215622Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3215869Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3216400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3216583Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3217116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3217447Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3217599Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3217607Z 2025-12-04T11:20:45.3217713Z Expected 1 but got 2. 2025-12-04T11:20:45.3217821Z Absolute difference: 1 2025-12-04T11:20:45.3217946Z Relative difference: 1.0 2025-12-04T11:20:45.3217951Z 2025-12-04T11:20:45.3218166Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3219127Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3219133Z 2025-12-04T11:20:45.3219404Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3219622Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3219755Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3220646Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3220919Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:20:45.3221019Z graph_break [] 2025-12-04T11:20:45.3221238Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3222461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.3222580Z if out == self.unknown_value: 2025-12-04T11:20:45.3223313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3223416Z warnings.warn( 2025-12-04T11:20:45.3224138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3224257Z warnings.warn( 2025-12-04T11:20:45.3224474Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3224592Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3224836Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3225726Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3225839Z graph_break [] 2025-12-04T11:20:45.3226054Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3226782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3226901Z warnings.warn( 2025-12-04T11:20:45.3227613Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3227727Z warnings.warn( 2025-12-04T11:20:45.3227944Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3228060Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3228302Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3229187Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3229297Z graph_break [] 2025-12-04T11:20:45.3229578Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3230307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3230421Z warnings.warn( 2025-12-04T11:20:45.3231139Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3231272Z warnings.warn( 2025-12-04T11:20:45.3232125Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9d15e1ab064c4537.xml - 2025-12-04T11:20:45.3232299Z =========================== short test summary info ============================ 2025-12-04T11:20:45.3233268Z FAILED [0.5502s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.3233306Z 2025-12-04T11:20:45.3233415Z Expected 1 but got 2. 2025-12-04T11:20:45.3233525Z Absolute difference: 1 2025-12-04T11:20:45.3233653Z Relative difference: 1.0 2025-12-04T11:20:45.3233658Z 2025-12-04T11:20:45.3233876Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3234796Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3234802Z 2025-12-04T11:20:45.3235070Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3235250Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.3235464Z ================== 1 failed, 13 deselected, 2 rerun in 21.29s ================== 2025-12-04T11:20:45.3235566Z Got exit code 1 2025-12-04T11:20:45.3235691Z Retrying single test... 2025-12-04T11:20:45.3236138Z W1204 11:05:04.505000 88364 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.3236798Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6d909bcc6975bf8.xml 2025-12-04T11:20:45.3236979Z ============================= test session starts ============================== 2025-12-04T11:20:45.3237330Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.3237439Z cachedir: .pytest_cache 2025-12-04T11:20:45.3237971Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.3238097Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.3238217Z configfile: pytest.ini 2025-12-04T11:20:45.3238765Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, 
subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.3238988Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.3239994Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3240111Z Running 1 items in this shard 2025-12-04T11:20:45.3240116Z 2025-12-04T11:20:45.3241408Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:05:10.780546421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3241415Z 2025-12-04T11:20:45.3241996Z [W1204 11:05:26.919146891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3242003Z 2025-12-04T11:20:45.3242528Z [W1204 11:05:26.919416751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3242534Z 2025-12-04T11:20:45.3243043Z [W1204 11:05:26.926934322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3243078Z 2025-12-04T11:20:45.3243604Z [W1204 11:05:26.927758136 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3243609Z 2025-12-04T11:20:45.3244120Z [W1204 11:05:26.927957863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3244125Z 2025-12-04T11:20:45.3244632Z [W1204 11:05:26.935334671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3244642Z 2025-12-04T11:20:45.3245162Z [W1204 11:05:26.936054982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3245199Z 2025-12-04T11:20:45.3245711Z [W1204 11:05:26.936244522 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3245718Z 2025-12-04T11:20:45.3246239Z [W1204 11:05:26.076376665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3246244Z 2025-12-04T11:20:45.3246752Z [W1204 11:05:26.078188203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3246757Z 2025-12-04T11:20:45.3247279Z [W1204 11:05:26.078405879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3247284Z 2025-12-04T11:20:45.3247797Z [W1204 11:05:26.082496328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3247805Z 2025-12-04T11:20:45.3248325Z [W1204 11:05:26.083195765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3248330Z 2025-12-04T11:20:45.3248843Z [W1204 11:05:26.083399460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3248848Z 2025-12-04T11:20:45.3249353Z [W1204 11:05:26.089525159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3249370Z 2025-12-04T11:20:45.3249881Z [W1204 11:05:26.090218626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3249886Z 2025-12-04T11:20:45.3250396Z [W1204 11:05:26.090423240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3250402Z 2025-12-04T11:20:45.3250547Z ('RERUN', {'yellow': True}) [20.4168s] [100%] 2025-12-04T11:20:45.3251827Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:05:27.580710909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3251836Z 2025-12-04T11:20:45.3252359Z [W1204 11:05:27.581482248 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3252364Z 2025-12-04T11:20:45.3252872Z [W1204 11:05:27.581691283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3252877Z 2025-12-04T11:20:45.3253461Z [W1204 11:05:27.585824300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3253469Z 2025-12-04T11:20:45.3253978Z [W1204 11:05:27.586490821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3253982Z 2025-12-04T11:20:45.3254530Z [W1204 11:05:27.586688165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3254535Z 2025-12-04T11:20:45.3255041Z [W1204 11:05:27.592993223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3255045Z 2025-12-04T11:20:45.3255550Z [W1204 11:05:27.593641707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3255566Z 2025-12-04T11:20:45.3256078Z [W1204 11:05:27.593831802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3256114Z 2025-12-04T11:20:45.3256704Z [W1204 11:05:27.685560711 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3256710Z 2025-12-04T11:20:45.3257231Z [W1204 11:05:27.686377967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3257239Z 2025-12-04T11:20:45.3257748Z [W1204 11:05:27.686599367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3257753Z 2025-12-04T11:20:45.3258277Z [W1204 11:05:27.690708846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3258282Z 2025-12-04T11:20:45.3258795Z [W1204 11:05:27.691421343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3258801Z 2025-12-04T11:20:45.3259319Z [W1204 11:05:27.691628754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3259324Z 2025-12-04T11:20:45.3259828Z [W1204 11:05:27.697848049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3259835Z 2025-12-04T11:20:45.3260357Z [W1204 11:05:27.698742869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3260362Z 2025-12-04T11:20:45.3260870Z [W1204 11:05:27.698943962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3260875Z 2025-12-04T11:20:45.3261009Z ('RERUN', {'yellow': True}) [0.5696s] [100%] 2025-12-04T11:20:45.3262314Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:05:27.125813059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3262323Z 2025-12-04T11:20:45.3262832Z [W1204 11:05:27.126623300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3262839Z 2025-12-04T11:20:45.3263363Z [W1204 11:05:27.126837387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3263368Z 2025-12-04T11:20:45.3263880Z [W1204 11:05:27.131139909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3263886Z 2025-12-04T11:20:45.3264489Z [W1204 11:05:27.131860597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3264497Z 2025-12-04T11:20:45.3265005Z [W1204 11:05:27.132062746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3265009Z 2025-12-04T11:20:45.3265531Z [W1204 11:05:27.138366932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3265574Z 2025-12-04T11:20:45.3266082Z [W1204 11:05:27.139072284 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3266087Z 2025-12-04T11:20:45.3266593Z [W1204 11:05:27.139269389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3266611Z 2025-12-04T11:20:45.3267122Z [W1204 11:05:27.235306040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3267159Z 2025-12-04T11:20:45.3267671Z [W1204 11:05:27.236087841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3267676Z 2025-12-04T11:20:45.3268197Z [W1204 11:05:27.236295211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3268205Z 2025-12-04T11:20:45.3268713Z [W1204 11:05:27.240288160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3268718Z 2025-12-04T11:20:45.3269237Z [W1204 11:05:27.240971303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3269242Z 2025-12-04T11:20:45.3269757Z [W1204 11:05:27.241169477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3269762Z 2025-12-04T11:20:45.3270283Z [W1204 11:05:27.247311814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3270287Z 2025-12-04T11:20:45.3270795Z [W1204 11:05:27.248238418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3270801Z 2025-12-04T11:20:45.3271684Z [W1204 11:05:27.248442563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3271691Z 2025-12-04T11:20:45.3271798Z FAILED [0.5469s] [100%] 2025-12-04T11:20:45.3271802Z 2025-12-04T11:20:45.3271948Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.3272475Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3272609Z Traceback (most recent call last): 2025-12-04T11:20:45.3273139Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3273377Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3273841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3274021Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3274559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3274765Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3274913Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3274918Z 2025-12-04T11:20:45.3275028Z Expected 1 but got 2. 
2025-12-04T11:20:45.3275150Z Absolute difference: 1 2025-12-04T11:20:45.3275263Z Relative difference: 1.0 2025-12-04T11:20:45.3275390Z 2025-12-04T11:20:45.3275610Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3276542Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3276548Z 2025-12-04T11:20:45.3276864Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3277103Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3277224Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3278125Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3278370Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3278478Z graph_break [] 2025-12-04T11:20:45.3278698Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3279967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3280089Z if out == self.unknown_value: 2025-12-04T11:20:45.3280829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3280937Z warnings.warn( 2025-12-04T11:20:45.3281658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3281779Z warnings.warn( 2025-12-04T11:20:45.3282299Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3282441Z Traceback (most recent call last): 2025-12-04T11:20:45.3282959Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3283193Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3283671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3283839Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3284389Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3284597Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3284734Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3284739Z 2025-12-04T11:20:45.3284865Z Expected 1 but got 2. 
2025-12-04T11:20:45.3284980Z Absolute difference: 1 2025-12-04T11:20:45.3285100Z Relative difference: 1.0 2025-12-04T11:20:45.3285105Z 2025-12-04T11:20:45.3285334Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3286252Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3286260Z 2025-12-04T11:20:45.3286546Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3286768Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3286887Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3287851Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3288081Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3288199Z graph_break [] 2025-12-04T11:20:45.3288420Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3289627Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3289791Z if out == self.unknown_value: 2025-12-04T11:20:45.3290514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3290632Z warnings.warn( 2025-12-04T11:20:45.3291354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3291455Z warnings.warn( 2025-12-04T11:20:45.3291715Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3291832Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3292059Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3292952Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3293054Z graph_break [] 2025-12-04T11:20:45.3293281Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3294008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3294111Z warnings.warn( 2025-12-04T11:20:45.3294850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3294955Z warnings.warn( 2025-12-04T11:20:45.3295120Z =================================== FAILURES =================================== 2025-12-04T11:20:45.3295632Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3295761Z 
Traceback (most recent call last): 2025-12-04T11:20:45.3296361Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3296596Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3297056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3297239Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3297779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3298002Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3298136Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3298142Z 2025-12-04T11:20:45.3298248Z Expected 1 but got 2. 2025-12-04T11:20:45.3298376Z Absolute difference: 1 2025-12-04T11:20:45.3298488Z Relative difference: 1.0 2025-12-04T11:20:45.3298493Z 2025-12-04T11:20:45.3298725Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3299635Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3299641Z 2025-12-04T11:20:45.3299908Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3300219Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3300341Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3301240Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3301498Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:20:45.3301598Z graph_break [] 2025-12-04T11:20:45.3301826Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3303029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.3303162Z if out == self.unknown_value: 2025-12-04T11:20:45.3303892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3304048Z warnings.warn( 2025-12-04T11:20:45.3304779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3304883Z warnings.warn( 2025-12-04T11:20:45.3305101Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3305230Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3305458Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3306357Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3306456Z graph_break [] 2025-12-04T11:20:45.3306677Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3307414Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3307519Z warnings.warn( 2025-12-04T11:20:45.3308255Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3308361Z warnings.warn( 2025-12-04T11:20:45.3308579Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3308710Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3308940Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3309831Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3309946Z graph_break [] 2025-12-04T11:20:45.3310163Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3310901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3311004Z warnings.warn( 2025-12-04T11:20:45.3311720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3311835Z warnings.warn( 2025-12-04T11:20:45.3312669Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6d909bcc6975bf8.xml - 2025-12-04T11:20:45.3312914Z =========================== short test summary info ============================ 2025-12-04T11:20:45.3313871Z FAILED [0.5469s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.3313877Z 2025-12-04T11:20:45.3313984Z Expected 1 but got 2. 2025-12-04T11:20:45.3314110Z Absolute difference: 1 2025-12-04T11:20:45.3314252Z Relative difference: 1.0 2025-12-04T11:20:45.3314257Z 2025-12-04T11:20:45.3314486Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3315391Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3315397Z 2025-12-04T11:20:45.3315663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3315865Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.3316096Z ================== 1 failed, 13 deselected, 2 rerun in 21.57s ================== 2025-12-04T11:20:45.3316196Z Got exit code 1 2025-12-04T11:20:45.3317034Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3317445Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:20:45.3317904Z W1204 11:05:40.038000 88573 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.3318559Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0a612698d44183a1.xml 2025-12-04T11:20:45.3318728Z ============================= test session starts ============================== 2025-12-04T11:20:45.3319092Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.3319208Z cachedir: .pytest_cache 2025-12-04T11:20:45.3319741Z hypothesis profile 'pytorch_ci' -> database=None, 
max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.3319868Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.3319979Z configfile: pytest.ini 2025-12-04T11:20:45.3320532Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.3320747Z collecting ... collected 58 items / 4 deselected / 54 selected 2025-12-04T11:20:45.3320906Z stepcurrent: skipping 4 already run items. 2025-12-04T11:20:45.3321024Z Running 10 items in this shard 2025-12-04T11:20:45.3321029Z 2025-12-04T11:20:45.3321910Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.2482s] [ 10%] 2025-12-04T11:20:45.3322787Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5534s] [ 10%] 2025-12-04T11:20:45.3323563Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.5550s] [ 10%] 2025-12-04T11:20:45.3323572Z 2025-12-04T11:20:45.3323726Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.3324237Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3324361Z Traceback (most recent call last): 2025-12-04T11:20:45.3324885Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3325180Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3325658Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3325825Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3326364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3326616Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3326749Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3326754Z 2025-12-04T11:20:45.3326862Z Expected 1 but got 2. 2025-12-04T11:20:45.3326983Z Absolute difference: 1 2025-12-04T11:20:45.3327093Z Relative difference: 1.0 2025-12-04T11:20:45.3327098Z 2025-12-04T11:20:45.3327325Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3328236Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3328293Z 2025-12-04T11:20:45.3328563Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3328798Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3328919Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3329819Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3330046Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3330147Z graph_break [] 2025-12-04T11:20:45.3330379Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3331115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 
does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3331239Z warnings.warn( 2025-12-04T11:20:45.3331963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3332066Z warnings.warn( 2025-12-04T11:20:45.3332588Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3332714Z Traceback (most recent call last): 2025-12-04T11:20:45.3333221Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3333464Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3333926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3334108Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3334642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3334851Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3334998Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3335007Z 2025-12-04T11:20:45.3335113Z Expected 1 but got 2. 
2025-12-04T11:20:45.3335234Z Absolute difference: 1 2025-12-04T11:20:45.3335345Z Relative difference: 1.0 2025-12-04T11:20:45.3335350Z 2025-12-04T11:20:45.3335565Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3336561Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3336567Z 2025-12-04T11:20:45.3336910Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3337146Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3337265Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3338149Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3338426Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3338527Z graph_break [] 2025-12-04T11:20:45.3338745Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3339487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3339605Z warnings.warn( 2025-12-04T11:20:45.3340340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3340476Z warnings.warn( 2025-12-04T11:20:45.3340694Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3340828Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3341060Z aot_autograd [('total', 2), 
('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3341960Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3342062Z graph_break [] 2025-12-04T11:20:45.3342278Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3343027Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3343133Z warnings.warn( 2025-12-04T11:20:45.3343849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3343966Z warnings.warn( 2025-12-04T11:20:45.3344119Z =================================== FAILURES =================================== 2025-12-04T11:20:45.3344648Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3344776Z Traceback (most recent call last): 2025-12-04T11:20:45.3345287Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3345535Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3345999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3346181Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3346719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3346928Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3347083Z 
AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3347088Z 2025-12-04T11:20:45.3347196Z Expected 1 but got 2. 2025-12-04T11:20:45.3347307Z Absolute difference: 1 2025-12-04T11:20:45.3347440Z Relative difference: 1.0 2025-12-04T11:20:45.3347445Z 2025-12-04T11:20:45.3347662Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3348600Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3348673Z 2025-12-04T11:20:45.3348945Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3349167Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3349301Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3350185Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3350480Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3350583Z graph_break [] 2025-12-04T11:20:45.3350802Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3351545Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3351654Z warnings.warn( 2025-12-04T11:20:45.3352371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3352522Z warnings.warn( 2025-12-04T11:20:45.3352741Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:20:45.3352874Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3353104Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3353989Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3354104Z graph_break [] 2025-12-04T11:20:45.3354322Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3355063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3355168Z warnings.warn( 2025-12-04T11:20:45.3355890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3356002Z warnings.warn( 2025-12-04T11:20:45.3356220Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3356334Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3356574Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3357458Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3357570Z graph_break [] 2025-12-04T11:20:45.3357789Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3358514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3358630Z warnings.warn( 
2025-12-04T11:20:45.3359351Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3359470Z warnings.warn( 2025-12-04T11:20:45.3360307Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0a612698d44183a1.xml - 2025-12-04T11:20:45.3360482Z =========================== short test summary info ============================ 2025-12-04T11:20:45.3361508Z FAILED [0.5550s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3361517Z 2025-12-04T11:20:45.3361625Z Expected 1 but got 2. 2025-12-04T11:20:45.3361747Z Absolute difference: 1 2025-12-04T11:20:45.3361859Z Relative difference: 1.0 2025-12-04T11:20:45.3361864Z 2025-12-04T11:20:45.3362082Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3363042Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3363048Z 2025-12-04T11:20:45.3363316Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3363510Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.3363709Z =================== 1 failed, 4 deselected, 2 rerun in 5.39s =================== 2025-12-04T11:20:45.3363815Z Got exit code 1 2025-12-04T11:20:45.3363940Z Retrying single test... 
2025-12-04T11:20:45.3364414Z W1204 11:06:00.979000 88777 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.3365074Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-90f2ceb88314c75a.xml 2025-12-04T11:20:45.3365258Z ============================= test session starts ============================== 2025-12-04T11:20:45.3365608Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.3365732Z cachedir: .pytest_cache 2025-12-04T11:20:45.3366251Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.3366378Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.3366503Z configfile: pytest.ini 2025-12-04T11:20:45.3367048Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.3367269Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.3368273Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3368392Z Running 1 items in this shard 2025-12-04T11:20:45.3368398Z 2025-12-04T11:20:45.3369700Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:06:06.376765421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3369707Z 2025-12-04T11:20:45.3370231Z [W1204 11:06:22.087818799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3370239Z 2025-12-04T11:20:45.3370763Z [W1204 11:06:22.088083682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3370769Z 2025-12-04T11:20:45.3371638Z [W1204 11:06:22.095489169 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3371650Z 2025-12-04T11:20:45.3372177Z [W1204 11:06:22.096224079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3372182Z 2025-12-04T11:20:45.3372693Z [W1204 11:06:22.096417919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3372698Z 2025-12-04T11:20:45.3373352Z [W1204 11:06:22.103545531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3373358Z 2025-12-04T11:20:45.3373872Z [W1204 11:06:22.104204659 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3373877Z 2025-12-04T11:20:45.3374384Z [W1204 11:06:22.104392332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3374445Z 2025-12-04T11:20:45.3374959Z [W1204 11:06:22.240459661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3374964Z 2025-12-04T11:20:45.3375473Z [W1204 11:06:22.242177774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3375478Z 2025-12-04T11:20:45.3376004Z [W1204 11:06:22.242391340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3376014Z 2025-12-04T11:20:45.3376591Z [W1204 11:06:22.246359559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3376658Z 2025-12-04T11:20:45.3377183Z [W1204 11:06:22.247009057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3377190Z 2025-12-04T11:20:45.3377696Z [W1204 11:06:22.247210558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3377701Z 2025-12-04T11:20:45.3378223Z [W1204 11:06:22.253320201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3378228Z 2025-12-04T11:20:45.3378740Z [W1204 11:06:22.253961494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3378745Z 2025-12-04T11:20:45.3379256Z [W1204 11:06:22.254158790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3379279Z 2025-12-04T11:20:45.3379416Z ('RERUN', {'yellow': True}) [20.0260s] [100%] 2025-12-04T11:20:45.3380703Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:06:23.735296598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3380711Z 2025-12-04T11:20:45.3381246Z [W1204 11:06:23.736032306 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3381252Z 2025-12-04T11:20:45.3381760Z [W1204 11:06:23.736235670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3381769Z 2025-12-04T11:20:45.3382292Z [W1204 11:06:23.740357723 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3382299Z 2025-12-04T11:20:45.3382810Z [W1204 11:06:23.741001988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3382817Z 2025-12-04T11:20:45.3383341Z [W1204 11:06:23.741195318 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3383346Z 2025-12-04T11:20:45.3383853Z [W1204 11:06:23.747260128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3383858Z 2025-12-04T11:20:45.3384379Z [W1204 11:06:23.747875854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3384384Z 2025-12-04T11:20:45.3384960Z [W1204 11:06:23.748066506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3384968Z 2025-12-04T11:20:45.3385481Z [W1204 11:06:23.837438193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3385499Z 2025-12-04T11:20:45.3386033Z [W1204 11:06:23.838228476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3386039Z 2025-12-04T11:20:45.3386543Z [W1204 11:06:23.838451428 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3386548Z 2025-12-04T11:20:45.3387067Z [W1204 11:06:23.842526009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3387072Z 2025-12-04T11:20:45.3387582Z [W1204 11:06:23.843200175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3387620Z 2025-12-04T11:20:45.3388142Z [W1204 11:06:23.843401431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3388147Z 2025-12-04T11:20:45.3388658Z [W1204 11:06:23.849503892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3388666Z 2025-12-04T11:20:45.3389186Z [W1204 11:06:23.850367333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3389191Z 2025-12-04T11:20:45.3389700Z [W1204 11:06:23.850572535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3389704Z 2025-12-04T11:20:45.3389839Z ('RERUN', {'yellow': True}) [0.5571s] [100%] 2025-12-04T11:20:45.3391129Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:06:23.272493034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3391137Z 2025-12-04T11:20:45.3391644Z [W1204 11:06:23.273237062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3391651Z 2025-12-04T11:20:45.3392172Z [W1204 11:06:23.273436227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3392177Z 2025-12-04T11:20:45.3392681Z [W1204 11:06:23.277489708 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3392687Z 2025-12-04T11:20:45.3393213Z [W1204 11:06:23.278108572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3393220Z 2025-12-04T11:20:45.3393725Z [W1204 11:06:23.278302370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3393730Z 2025-12-04T11:20:45.3394248Z [W1204 11:06:23.284477487 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3394254Z 2025-12-04T11:20:45.3394763Z [W1204 11:06:23.285109703 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3394768Z 2025-12-04T11:20:45.3395285Z [W1204 11:06:23.285299795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3395290Z 2025-12-04T11:20:45.3395878Z [W1204 11:06:23.378206063 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3395886Z 2025-12-04T11:20:45.3396395Z [W1204 11:06:23.379033322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3396414Z 2025-12-04T11:20:45.3396924Z [W1204 11:06:23.379251880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3396959Z 2025-12-04T11:20:45.3397468Z [W1204 11:06:23.383683921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3397474Z 2025-12-04T11:20:45.3397997Z [W1204 11:06:23.384395564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3398002Z 2025-12-04T11:20:45.3398514Z [W1204 11:06:23.384609607 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3398551Z 2025-12-04T11:20:45.3399070Z [W1204 11:06:24.390931891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3399075Z 2025-12-04T11:20:45.3399586Z [W1204 11:06:24.391651945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3399594Z 2025-12-04T11:20:45.3400115Z [W1204 11:06:24.391855758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3400120Z 2025-12-04T11:20:45.3400224Z FAILED [0.5398s] [100%] 2025-12-04T11:20:45.3400229Z 2025-12-04T11:20:45.3400372Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.3400899Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3401029Z Traceback (most recent call last): 2025-12-04T11:20:45.3401560Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3401793Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3402260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3402441Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3402980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3403201Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3403338Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3403343Z 2025-12-04T11:20:45.3403450Z Expected 1 but got 2. 
2025-12-04T11:20:45.3403575Z Absolute difference: 1 2025-12-04T11:20:45.3403694Z Relative difference: 1.0 2025-12-04T11:20:45.3403698Z 2025-12-04T11:20:45.3403918Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3404847Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3404855Z 2025-12-04T11:20:45.3405127Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3405362Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3405479Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3406371Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3406704Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3406807Z graph_break [] 2025-12-04T11:20:45.3407043Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3408262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3408413Z if out == self.unknown_value: 2025-12-04T11:20:45.3409150Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3409255Z warnings.warn( 2025-12-04T11:20:45.3409987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3410091Z warnings.warn( 2025-12-04T11:20:45.3410609Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3410788Z Traceback (most recent call last): 2025-12-04T11:20:45.3411303Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3411536Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3412010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3412175Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3412725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3412932Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3413065Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3413070Z 2025-12-04T11:20:45.3413195Z Expected 1 but got 2. 
2025-12-04T11:20:45.3413306Z Absolute difference: 1 2025-12-04T11:20:45.3413431Z Relative difference: 1.0 2025-12-04T11:20:45.3413436Z 2025-12-04T11:20:45.3413652Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3414567Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3414575Z 2025-12-04T11:20:45.3414861Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3415085Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3415217Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3416113Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3416419Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3416543Z graph_break [] 2025-12-04T11:20:45.3416763Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3417988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3418112Z if out == self.unknown_value: 2025-12-04T11:20:45.3418837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3418957Z warnings.warn( 2025-12-04T11:20:45.3419774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3419882Z warnings.warn( 2025-12-04T11:20:45.3420116Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3420235Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3420479Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3421404Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3421505Z graph_break [] 2025-12-04T11:20:45.3421742Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3422468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3422586Z warnings.warn( 2025-12-04T11:20:45.3423308Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3423446Z warnings.warn( 2025-12-04T11:20:45.3423607Z =================================== FAILURES =================================== 2025-12-04T11:20:45.3424123Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3424253Z 
Traceback (most recent call last): 2025-12-04T11:20:45.3424780Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3425010Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3425487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3425657Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3426195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3426420Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3426558Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3426563Z 2025-12-04T11:20:45.3426682Z Expected 1 but got 2. 2025-12-04T11:20:45.3426800Z Absolute difference: 1 2025-12-04T11:20:45.3426911Z Relative difference: 1.0 2025-12-04T11:20:45.3426916Z 2025-12-04T11:20:45.3427144Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3428056Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3428062Z 2025-12-04T11:20:45.3428334Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3428570Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3428690Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3429584Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3429814Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:20:45.3429912Z graph_break [] 2025-12-04T11:20:45.3430142Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3431345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.3431476Z if out == self.unknown_value: 2025-12-04T11:20:45.3432273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3432378Z warnings.warn( 2025-12-04T11:20:45.3433109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3433240Z warnings.warn( 2025-12-04T11:20:45.3433472Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3433592Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3433821Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3434724Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3434830Z graph_break [] 2025-12-04T11:20:45.3435081Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3435821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3435924Z warnings.warn( 2025-12-04T11:20:45.3436658Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3436760Z warnings.warn( 2025-12-04T11:20:45.3436978Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3437110Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3437339Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3438243Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3438345Z graph_break [] 2025-12-04T11:20:45.3438560Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3439297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3439402Z warnings.warn( 2025-12-04T11:20:45.3440117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3440233Z warnings.warn( 2025-12-04T11:20:45.3441067Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-90f2ceb88314c75a.xml - 2025-12-04T11:20:45.3441261Z =========================== short test summary info ============================ 2025-12-04T11:20:45.3442215Z FAILED [0.5398s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.3442220Z 2025-12-04T11:20:45.3442341Z Expected 1 but got 2. 2025-12-04T11:20:45.3442454Z Absolute difference: 1 2025-12-04T11:20:45.3442568Z Relative difference: 1.0 2025-12-04T11:20:45.3442573Z 2025-12-04T11:20:45.3442807Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3443715Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3443720Z 2025-12-04T11:20:45.3443989Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3444266Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.3444470Z ================== 1 failed, 13 deselected, 2 rerun in 21.16s ================== 2025-12-04T11:20:45.3444584Z Got exit code 1 2025-12-04T11:20:45.3444694Z Retrying single test... 2025-12-04T11:20:45.3445141Z W1204 11:06:35.878000 88986 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.3445841Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9644b19a5203c0ee.xml 2025-12-04T11:20:45.3446011Z ============================= test session starts ============================== 2025-12-04T11:20:45.3446377Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.3446488Z cachedir: .pytest_cache 2025-12-04T11:20:45.3447015Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.3447195Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.3447306Z configfile: pytest.ini 2025-12-04T11:20:45.3447851Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, 
subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.3448084Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.3449078Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3449209Z Running 1 items in this shard 2025-12-04T11:20:45.3449214Z 2025-12-04T11:20:45.3450500Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:06:41.143577932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3450508Z 2025-12-04T11:20:45.3451041Z [W1204 11:06:57.491570727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3451046Z 2025-12-04T11:20:45.3451557Z [W1204 11:06:57.491841112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3451565Z 2025-12-04T11:20:45.3452074Z [W1204 11:06:57.499345615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3452092Z 2025-12-04T11:20:45.3452601Z [W1204 11:06:57.500152176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3452607Z 2025-12-04T11:20:45.3453114Z [W1204 11:06:57.500360440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3453120Z 2025-12-04T11:20:45.3453640Z [W1204 11:06:57.507655334 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3453646Z 2025-12-04T11:20:45.3454150Z [W1204 11:06:57.508356087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3454158Z 2025-12-04T11:20:45.3454680Z [W1204 11:06:57.508562966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3454685Z 2025-12-04T11:20:45.3455189Z [W1204 11:06:57.646212449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3455194Z 2025-12-04T11:20:45.3455773Z [W1204 11:06:57.647973657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3455779Z 2025-12-04T11:20:45.3456361Z [W1204 11:06:57.648188518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3456367Z 2025-12-04T11:20:45.3456892Z [W1204 11:06:57.652239608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3456936Z 2025-12-04T11:20:45.3457443Z [W1204 11:06:57.652965452 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3457448Z 2025-12-04T11:20:45.3457954Z [W1204 11:06:57.653168708 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3457974Z 2025-12-04T11:20:45.3458479Z [W1204 11:06:57.659298996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3458489Z 2025-12-04T11:20:45.3458996Z [W1204 11:06:57.659964432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3459032Z 2025-12-04T11:20:45.3459552Z [W1204 11:06:57.660186746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3459559Z 2025-12-04T11:20:45.3459693Z ('RERUN', {'yellow': True}) [19.6324s] [100%] 2025-12-04T11:20:45.3460983Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:06:57.156649865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3460989Z 2025-12-04T11:20:45.3461498Z [W1204 11:06:57.157441711 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3461507Z 2025-12-04T11:20:45.3462033Z [W1204 11:06:57.157653505 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3462040Z 2025-12-04T11:20:45.3462545Z [W1204 11:06:57.161925863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3462552Z 2025-12-04T11:20:45.3463059Z [W1204 11:06:57.162606988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3463076Z 2025-12-04T11:20:45.3463586Z [W1204 11:06:57.162806247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3463591Z 2025-12-04T11:20:45.3464098Z [W1204 11:06:57.169069830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3464103Z 2025-12-04T11:20:45.3464624Z [W1204 11:06:57.169726148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3464631Z 2025-12-04T11:20:45.3465138Z [W1204 11:06:57.169919758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3465142Z 2025-12-04T11:20:45.3465668Z [W1204 11:06:57.259894073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3465673Z 2025-12-04T11:20:45.3466179Z [W1204 11:06:57.260717715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3466184Z 2025-12-04T11:20:45.3466701Z [W1204 11:06:57.260934182 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3466706Z 2025-12-04T11:20:45.3467283Z [W1204 11:06:57.264928029 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3467291Z 2025-12-04T11:20:45.3467811Z [W1204 11:06:57.265583236 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3467815Z 2025-12-04T11:20:45.3468321Z [W1204 11:06:57.265782299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3468357Z 2025-12-04T11:20:45.3468866Z [W1204 11:06:57.271977896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3468883Z 2025-12-04T11:20:45.3469390Z [W1204 11:06:57.272834749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3469395Z 2025-12-04T11:20:45.3469906Z [W1204 11:06:57.273032791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3469940Z 2025-12-04T11:20:45.3470084Z ('RERUN', {'yellow': True}) [0.5733s] [100%] 2025-12-04T11:20:45.3471768Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:06:58.701941684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3471780Z 2025-12-04T11:20:45.3472308Z [W1204 11:06:58.702738583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3472313Z 2025-12-04T11:20:45.3472822Z [W1204 11:06:58.702956311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3472826Z 2025-12-04T11:20:45.3473351Z [W1204 11:06:58.707098368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3473358Z 2025-12-04T11:20:45.3473863Z [W1204 11:06:58.707781943 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3473868Z 2025-12-04T11:20:45.3474393Z [W1204 11:06:58.707988164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3474400Z 2025-12-04T11:20:45.3474905Z [W1204 11:06:58.714324880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3474910Z 2025-12-04T11:20:45.3475416Z [W1204 11:06:58.715006697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3475422Z 2025-12-04T11:20:45.3475948Z [W1204 11:06:58.715203624 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3475956Z 2025-12-04T11:20:45.3476465Z [W1204 11:06:58.809890180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3476470Z 2025-12-04T11:20:45.3476993Z [W1204 11:06:58.810733420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3477000Z 2025-12-04T11:20:45.3477511Z [W1204 11:06:58.810958579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3477517Z 2025-12-04T11:20:45.3478035Z [W1204 11:06:58.814946738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3478040Z 2025-12-04T11:20:45.3478680Z [W1204 11:06:58.815617626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3478688Z 2025-12-04T11:20:45.3479215Z [W1204 11:06:58.815819598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3479220Z 2025-12-04T11:20:45.3479727Z [W1204 11:06:58.822002787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3479774Z 2025-12-04T11:20:45.3480286Z [W1204 11:06:58.822861662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3480304Z 2025-12-04T11:20:45.3480811Z [W1204 11:06:58.823060882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3480816Z 2025-12-04T11:20:45.3480923Z FAILED [0.5485s] [100%] 2025-12-04T11:20:45.3480928Z 2025-12-04T11:20:45.3481098Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.3481674Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3481813Z Traceback (most recent call last): 2025-12-04T11:20:45.3482335Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3482571Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3483049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3483215Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3483749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3483971Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3484111Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3484119Z 2025-12-04T11:20:45.3484241Z Expected 1 but got 2. 
2025-12-04T11:20:45.3484351Z Absolute difference: 1 2025-12-04T11:20:45.3484465Z Relative difference: 1.0 2025-12-04T11:20:45.3484470Z 2025-12-04T11:20:45.3484698Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3485603Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3485611Z 2025-12-04T11:20:45.3485893Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3486115Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3486232Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3487139Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3487371Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3487485Z graph_break [] 2025-12-04T11:20:45.3487705Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3488912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3489042Z if out == self.unknown_value: 2025-12-04T11:20:45.3489770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3489873Z warnings.warn( 2025-12-04T11:20:45.3490689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3490796Z warnings.warn( 2025-12-04T11:20:45.3491318Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3491445Z Traceback (most recent call last): 2025-12-04T11:20:45.3491987Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3492234Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3492695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3492876Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3493416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3493625Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3493805Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3493811Z 2025-12-04T11:20:45.3493918Z Expected 1 but got 2. 
2025-12-04T11:20:45.3494028Z Absolute difference: 1 2025-12-04T11:20:45.3494158Z Relative difference: 1.0 2025-12-04T11:20:45.3494163Z 2025-12-04T11:20:45.3494383Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3495310Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3495316Z 2025-12-04T11:20:45.3495587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3495809Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3495944Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3496905Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3497153Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3497255Z graph_break [] 2025-12-04T11:20:45.3497476Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3498704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3498827Z if out == self.unknown_value: 2025-12-04T11:20:45.3499570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3499676Z warnings.warn( 2025-12-04T11:20:45.3500395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3500514Z warnings.warn( 2025-12-04T11:20:45.3500736Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3500855Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3501096Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3501980Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3502097Z graph_break [] 2025-12-04T11:20:45.3502316Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3503105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3503222Z warnings.warn( 2025-12-04T11:20:45.3503941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3504091Z warnings.warn( 2025-12-04T11:20:45.3504242Z =================================== FAILURES =================================== 2025-12-04T11:20:45.3504754Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3504896Z 
Traceback (most recent call last): 2025-12-04T11:20:45.3505404Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3505649Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3506108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3506304Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3506853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3507063Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3507193Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3507200Z 2025-12-04T11:20:45.3507322Z Expected 1 but got 2. 2025-12-04T11:20:45.3507430Z Absolute difference: 1 2025-12-04T11:20:45.3507554Z Relative difference: 1.0 2025-12-04T11:20:45.3507559Z 2025-12-04T11:20:45.3507775Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3508690Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3508698Z 2025-12-04T11:20:45.3508979Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3509199Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3509333Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3510224Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3510452Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:20:45.3510565Z graph_break [] 2025-12-04T11:20:45.3510783Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3512003Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.3512128Z if out == self.unknown_value: 2025-12-04T11:20:45.3512849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3512966Z warnings.warn( 2025-12-04T11:20:45.3513684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3513786Z warnings.warn( 2025-12-04T11:20:45.3514017Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3514133Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3514380Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3515333Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3515438Z graph_break [] 2025-12-04T11:20:45.3515667Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3516390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3516534Z warnings.warn( 2025-12-04T11:20:45.3517251Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3517352Z warnings.warn( 2025-12-04T11:20:45.3517581Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3517698Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3517930Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3518874Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3518973Z graph_break [] 2025-12-04T11:20:45.3519204Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3519930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3520030Z warnings.warn( 2025-12-04T11:20:45.3520759Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3520859Z warnings.warn( 2025-12-04T11:20:45.3521708Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9644b19a5203c0ee.xml - 2025-12-04T11:20:45.3521885Z =========================== short test summary info ============================ 2025-12-04T11:20:45.3522834Z FAILED [0.5485s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.3522842Z 2025-12-04T11:20:45.3522964Z Expected 1 but got 2. 2025-12-04T11:20:45.3523073Z Absolute difference: 1 2025-12-04T11:20:45.3523199Z Relative difference: 1.0 2025-12-04T11:20:45.3523204Z 2025-12-04T11:20:45.3523425Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3524339Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3524346Z 2025-12-04T11:20:45.3524629Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3524811Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.3525023Z ================== 1 failed, 13 deselected, 2 rerun in 20.79s ================== 2025-12-04T11:20:45.3525128Z Got exit code 1 2025-12-04T11:20:45.3525958Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3526384Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:20:45.3526835Z W1204 11:07:10.337000 89195 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.3527569Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a6f999e52eb1904.xml 2025-12-04T11:20:45.3527740Z ============================= test session starts ============================== 2025-12-04T11:20:45.3528092Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.3528218Z cachedir: .pytest_cache 2025-12-04T11:20:45.3528786Z hypothesis profile 'pytorch_ci' -> database=None, 
max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.3528911Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.3529039Z configfile: pytest.ini 2025-12-04T11:20:45.3529579Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.3529808Z collecting ... collected 58 items / 5 deselected / 53 selected 2025-12-04T11:20:45.3529949Z stepcurrent: skipping 5 already run items. 2025-12-04T11:20:45.3530068Z Running 9 items in this shard 2025-12-04T11:20:45.3530105Z 2025-12-04T11:20:45.3530996Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.3412s] [ 11%] 2025-12-04T11:20:45.3531869Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.8933s] [ 11%] 2025-12-04T11:20:45.3532672Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.8957s] [ 11%] 2025-12-04T11:20:45.3532678Z 2025-12-04T11:20:45.3532822Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.3533338Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3533476Z Traceback (most recent call last): 2025-12-04T11:20:45.3533989Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3534233Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3534692Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3534858Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3535410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3535618Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3535765Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3535770Z 2025-12-04T11:20:45.3535878Z Expected 1 but got 2. 2025-12-04T11:20:45.3535988Z Absolute difference: 1 2025-12-04T11:20:45.3536116Z Relative difference: 1.0 2025-12-04T11:20:45.3536123Z 2025-12-04T11:20:45.3536417Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3537337Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3537360Z 2025-12-04T11:20:45.3537631Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3537853Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3537985Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3538515Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3538743Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3538927Z graph_break [] 2025-12-04T11:20:45.3539147Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3539894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 
2025-12-04T11:20:45.3539995Z warnings.warn( 2025-12-04T11:20:45.3540743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3540862Z warnings.warn( 2025-12-04T11:20:45.3541378Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3541500Z Traceback (most recent call last): 2025-12-04T11:20:45.3542018Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3542256Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3542755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3542920Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3543455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3543677Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3543812Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3543817Z 2025-12-04T11:20:45.3543938Z Expected 1 but got 2. 
2025-12-04T11:20:45.3544048Z Absolute difference: 1 2025-12-04T11:20:45.3544160Z Relative difference: 1.0 2025-12-04T11:20:45.3544165Z 2025-12-04T11:20:45.3544395Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3545317Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3545325Z 2025-12-04T11:20:45.3545609Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3545830Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3545949Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3546496Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3546727Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3546827Z graph_break [] 2025-12-04T11:20:45.3547058Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3547790Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3547910Z warnings.warn( 2025-12-04T11:20:45.3548631Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3548733Z warnings.warn( 2025-12-04T11:20:45.3548963Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3549082Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3549310Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3549849Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3549949Z graph_break [] 2025-12-04T11:20:45.3550177Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3550970Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3551076Z warnings.warn( 2025-12-04T11:20:45.3551807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3551909Z warnings.warn( 2025-12-04T11:20:45.3552091Z =================================== FAILURES =================================== 2025-12-04T11:20:45.3552621Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3552744Z Traceback (most recent call last): 2025-12-04T11:20:45.3553271Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3553503Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3553965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3554177Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3554715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3554942Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3555079Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3555085Z 2025-12-04T11:20:45.3555192Z Expected 1 but got 2. 
2025-12-04T11:20:45.3555316Z Absolute difference: 1 2025-12-04T11:20:45.3555427Z Relative difference: 1.0 2025-12-04T11:20:45.3555432Z 2025-12-04T11:20:45.3555646Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3556585Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3556591Z 2025-12-04T11:20:45.3556864Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3557100Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3557219Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3557745Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3557992Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3558094Z graph_break [] 2025-12-04T11:20:45.3558328Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3559057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3559161Z warnings.warn( 2025-12-04T11:20:45.3559894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3560002Z warnings.warn( 2025-12-04T11:20:45.3560221Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3560355Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3560593Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3561141Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3561245Z graph_break [] 2025-12-04T11:20:45.3561461Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3562201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3562366Z warnings.warn( 2025-12-04T11:20:45.3563102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3563206Z warnings.warn( 2025-12-04T11:20:45.3563420Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3563587Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3563816Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3564343Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3564457Z graph_break [] 2025-12-04T11:20:45.3564675Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3565411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3565555Z warnings.warn( 2025-12-04T11:20:45.3566268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3566381Z warnings.warn( 2025-12-04T11:20:45.3567214Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a6f999e52eb1904.xml - 2025-12-04T11:20:45.3567409Z =========================== short test summary info ============================ 2025-12-04T11:20:45.3568358Z FAILED [0.8957s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3568364Z 2025-12-04T11:20:45.3568474Z Expected 1 but got 2. 2025-12-04T11:20:45.3568600Z Absolute difference: 1 2025-12-04T11:20:45.3568716Z Relative difference: 1.0 2025-12-04T11:20:45.3568723Z 2025-12-04T11:20:45.3568953Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3569864Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3569872Z 2025-12-04T11:20:45.3570140Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3570331Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.3570528Z =================== 1 failed, 5 deselected, 2 rerun in 6.16s =================== 2025-12-04T11:20:45.3570629Z Got exit code 1 2025-12-04T11:20:45.3570750Z Retrying single test... 
2025-12-04T11:20:45.3571530Z W1204 11:07:31.151000 89372 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.3572203Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-547414903ca204e9.xml 2025-12-04T11:20:45.3572373Z ============================= test session starts ============================== 2025-12-04T11:20:45.3572722Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.3572850Z cachedir: .pytest_cache 2025-12-04T11:20:45.3573375Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.3573516Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.3573625Z configfile: pytest.ini 2025-12-04T11:20:45.3574171Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.3574556Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.3575562Z stepcurrent: skipping 5 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3575682Z Running 1 items in this shard 2025-12-04T11:20:45.3575703Z 2025-12-04T11:20:45.3577055Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:07:35.611505898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3577111Z 2025-12-04T11:20:45.3577632Z [W1204 11:07:51.542888694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3577651Z 2025-12-04T11:20:45.3578166Z [W1204 11:07:51.543149450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3578216Z 2025-12-04T11:20:45.3578728Z [W1204 11:07:51.550598929 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3578733Z 2025-12-04T11:20:45.3579256Z [W1204 11:07:51.551419010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3579264Z 2025-12-04T11:20:45.3579777Z [W1204 11:07:51.551615413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3579782Z 2025-12-04T11:20:45.3580305Z [W1204 11:07:51.558867861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3580310Z 2025-12-04T11:20:45.3580824Z [W1204 11:07:51.559648206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3580829Z 2025-12-04T11:20:45.3581353Z [W1204 11:07:51.559847222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3581357Z 2025-12-04T11:20:45.3581864Z [W1204 11:07:53.562425796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3581872Z 2025-12-04T11:20:45.3582393Z [W1204 11:07:53.564202215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3582398Z 2025-12-04T11:20:45.3582906Z [W1204 11:07:53.564431693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3582911Z 2025-12-04T11:20:45.3583418Z [W1204 11:07:53.568497211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3583426Z 2025-12-04T11:20:45.3583949Z [W1204 11:07:53.569209727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3583956Z 2025-12-04T11:20:45.3584462Z [W1204 11:07:53.569413525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3584469Z 2025-12-04T11:20:45.3584989Z [W1204 11:07:53.575625034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3584994Z 2025-12-04T11:20:45.3585503Z [W1204 11:07:53.576332632 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3585508Z 2025-12-04T11:20:45.3586028Z [W1204 11:07:53.576544452 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3586033Z 2025-12-04T11:20:45.3586227Z ('RERUN', {'yellow': True}) [20.2628s] [100%] 2025-12-04T11:20:45.3587535Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:07:54.416233067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3587572Z 2025-12-04T11:20:45.3588084Z [W1204 11:07:54.417051198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3588089Z 2025-12-04T11:20:45.3588594Z [W1204 11:07:54.417264930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3588611Z 2025-12-04T11:20:45.3589120Z [W1204 11:07:54.421342055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3589129Z 2025-12-04T11:20:45.3589636Z [W1204 11:07:54.422217987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3589669Z 2025-12-04T11:20:45.3590190Z [W1204 11:07:54.422418389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3590197Z 2025-12-04T11:20:45.3590707Z [W1204 11:07:54.428624018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3590711Z 2025-12-04T11:20:45.3591234Z [W1204 11:07:54.429314103 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3591238Z 2025-12-04T11:20:45.3591746Z [W1204 11:07:54.429506896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3591751Z 2025-12-04T11:20:45.3592276Z [W1204 11:07:54.521090188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3592283Z 2025-12-04T11:20:45.3592792Z [W1204 11:07:54.521900368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3592797Z 2025-12-04T11:20:45.3593322Z [W1204 11:07:54.522117599 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3593327Z 2025-12-04T11:20:45.3593840Z [W1204 11:07:54.526209633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3593845Z 2025-12-04T11:20:45.3594349Z [W1204 11:07:54.526902162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3594367Z 2025-12-04T11:20:45.3594878Z [W1204 11:07:54.527100942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3594884Z 2025-12-04T11:20:45.3595394Z [W1204 11:07:54.533343262 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3595398Z 2025-12-04T11:20:45.3595918Z [W1204 11:07:54.534283037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3595925Z 2025-12-04T11:20:45.3596432Z [W1204 11:07:54.534483535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3596437Z 2025-12-04T11:20:45.3596580Z ('RERUN', {'yellow': True}) [0.9181s] [100%] 2025-12-04T11:20:45.3597921Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:07:54.314101059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3597929Z 2025-12-04T11:20:45.3598455Z [W1204 11:07:54.314902345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3598460Z 2025-12-04T11:20:45.3598971Z [W1204 11:07:54.315103523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3599007Z 2025-12-04T11:20:45.3599531Z [W1204 11:07:54.319066398 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3599536Z 2025-12-04T11:20:45.3600044Z [W1204 11:07:54.319718292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3600050Z 2025-12-04T11:20:45.3600564Z [W1204 11:07:54.319909131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3600601Z 2025-12-04T11:20:45.3601121Z [W1204 11:07:54.326016425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3601126Z 2025-12-04T11:20:45.3601631Z [W1204 11:07:54.326674109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3601638Z 2025-12-04T11:20:45.3602159Z [W1204 11:07:54.326861191 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3602164Z 2025-12-04T11:20:45.3602672Z [W1204 11:07:55.416622721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3602677Z 2025-12-04T11:20:45.3603199Z [W1204 11:07:55.417412597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3603206Z 2025-12-04T11:20:45.3603714Z [W1204 11:07:55.417619963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3603719Z 2025-12-04T11:20:45.3604240Z [W1204 11:07:55.421646380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3604247Z 2025-12-04T11:20:45.3604753Z [W1204 11:07:55.422323954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3604758Z 2025-12-04T11:20:45.3605268Z [W1204 11:07:55.422523436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3605285Z 2025-12-04T11:20:45.3605797Z [W1204 11:07:55.428638163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3605804Z 2025-12-04T11:20:45.3606311Z [W1204 11:07:55.429479410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3606316Z 2025-12-04T11:20:45.3606831Z [W1204 11:07:55.429676683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3606838Z 2025-12-04T11:20:45.3606941Z FAILED [0.8927s] [100%] 2025-12-04T11:20:45.3606946Z 2025-12-04T11:20:45.3607100Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.3607617Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3607741Z Traceback (most recent call last): 2025-12-04T11:20:45.3608327Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3608562Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3609042Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3609205Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3609744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3609999Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3610134Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3610139Z 2025-12-04T11:20:45.3610258Z Expected 1 but got 2. 
2025-12-04T11:20:45.3610366Z Absolute difference: 1 2025-12-04T11:20:45.3610477Z Relative difference: 1.0 2025-12-04T11:20:45.3610482Z 2025-12-04T11:20:45.3610710Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3611633Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3611670Z 2025-12-04T11:20:45.3611943Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3612180Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3612299Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3612841Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3613070Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3613170Z graph_break [] 2025-12-04T11:20:45.3613401Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3614618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3614750Z if out == self.unknown_value: 2025-12-04T11:20:45.3615474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3615583Z warnings.warn( 2025-12-04T11:20:45.3616389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3616496Z warnings.warn( 2025-12-04T11:20:45.3617016Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3617156Z Traceback (most recent call last): 2025-12-04T11:20:45.3617668Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3617917Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3618374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3618539Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3619092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3619299Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3619451Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3619456Z 2025-12-04T11:20:45.3619564Z Expected 1 but got 2. 
2025-12-04T11:20:45.3619673Z Absolute difference: 1 2025-12-04T11:20:45.3619801Z Relative difference: 1.0 2025-12-04T11:20:45.3619806Z 2025-12-04T11:20:45.3620026Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3621033Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3621056Z 2025-12-04T11:20:45.3621329Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3621550Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3621716Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3622246Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3622474Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3622588Z graph_break [] 2025-12-04T11:20:45.3622802Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3624020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3624177Z if out == self.unknown_value: 2025-12-04T11:20:45.3624896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3625016Z warnings.warn( 2025-12-04T11:20:45.3625736Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3625851Z warnings.warn( 2025-12-04T11:20:45.3626069Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3626187Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3626433Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3626960Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3627060Z graph_break [] 2025-12-04T11:20:45.3627641Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3628478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3628622Z warnings.warn( 2025-12-04T11:20:45.3629444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3629586Z warnings.warn( 2025-12-04T11:20:45.3629749Z =================================== FAILURES =================================== 2025-12-04T11:20:45.3630420Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3630586Z Traceback (most recent call last): 2025-12-04T11:20:45.3631142Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3631468Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3631968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3632234Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3632823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3633067Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3633304Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3633310Z 2025-12-04T11:20:45.3633452Z Expected 1 but got 2. 2025-12-04T11:20:45.3633688Z Absolute difference: 1 2025-12-04T11:20:45.3633885Z Relative difference: 1.0 2025-12-04T11:20:45.3633891Z 2025-12-04T11:20:45.3634169Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3635187Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3635224Z 2025-12-04T11:20:45.3635532Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3635841Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3635975Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3636579Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3636931Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3637069Z graph_break [] 2025-12-04T11:20:45.3637360Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:20:45.3638657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.3638794Z if out == self.unknown_value: 2025-12-04T11:20:45.3639697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3639842Z warnings.warn( 2025-12-04T11:20:45.3640599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3640785Z warnings.warn( 2025-12-04T11:20:45.3641046Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3641282Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3641569Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3642135Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3642325Z graph_break [] 2025-12-04T11:20:45.3642582Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3643386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3643569Z warnings.warn( 2025-12-04T11:20:45.3644347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3644534Z warnings.warn( 2025-12-04T11:20:45.3644790Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3644941Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3645245Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3645843Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3646056Z graph_break [] 2025-12-04T11:20:45.3646311Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3647068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3647267Z warnings.warn( 2025-12-04T11:20:45.3648066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3648309Z warnings.warn( 2025-12-04T11:20:45.3649181Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-547414903ca204e9.xml - 2025-12-04T11:20:45.3649391Z =========================== short test summary info ============================ 2025-12-04T11:20:45.3650478Z FAILED [0.8927s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3650486Z 2025-12-04T11:20:45.3650632Z Expected 1 but got 2. 
2025-12-04T11:20:45.3650842Z Absolute difference: 1 2025-12-04T11:20:45.3651012Z Relative difference: 1.0 2025-12-04T11:20:45.3651018Z 2025-12-04T11:20:45.3651277Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3652296Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3652340Z 2025-12-04T11:20:45.3652648Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3652888Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.3653165Z ================== 1 failed, 13 deselected, 2 rerun in 22.11s ================== 2025-12-04T11:20:45.3653321Z Got exit code 1 2025-12-04T11:20:45.3653536Z Retrying single test... 2025-12-04T11:20:45.3654021Z W1204 11:08:06.780000 89554 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.3654762Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-41f0f199b083e6d2.xml 2025-12-04T11:20:45.3654948Z ============================= test session starts ============================== 2025-12-04T11:20:45.3655375Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.3655604Z cachedir: .pytest_cache 2025-12-04T11:20:45.3656167Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.3656479Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.3656629Z configfile: pytest.ini 2025-12-04T11:20:45.3657188Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T11:20:45.3657564Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.3658611Z stepcurrent: skipping 5 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3658767Z Running 1 items in this shard 2025-12-04T11:20:45.3658822Z 2025-12-04T11:20:45.3660144Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:08:10.243397218 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3660153Z 2025-12-04T11:20:45.3660710Z [W1204 11:08:26.993787869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3660742Z 2025-12-04T11:20:45.3661337Z [W1204 11:08:26.994072693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3661344Z 2025-12-04T11:20:45.3661992Z [W1204 11:08:26.001587353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3661998Z 2025-12-04T11:20:45.3662593Z [W1204 11:08:26.002347959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3662598Z 2025-12-04T11:20:45.3663144Z [W1204 11:08:26.002547272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3663182Z 2025-12-04T11:20:45.3663777Z [W1204 11:08:26.009701318 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3663782Z 2025-12-04T11:20:45.3664315Z [W1204 11:08:26.010488744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3664320Z 2025-12-04T11:20:45.3664973Z [W1204 11:08:26.010694260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3664984Z 2025-12-04T11:20:45.3665530Z [W1204 11:08:28.017672412 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3665565Z 2025-12-04T11:20:45.3666162Z [W1204 11:08:28.019424886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3666170Z 2025-12-04T11:20:45.3666715Z [W1204 11:08:28.019642223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3666720Z 2025-12-04T11:20:45.3667275Z [W1204 11:08:28.023893815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3667306Z 2025-12-04T11:20:45.3667890Z [W1204 11:08:28.024658586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3667895Z 2025-12-04T11:20:45.3668460Z [W1204 11:08:28.024871034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3668467Z 2025-12-04T11:20:45.3669061Z [W1204 11:08:28.031158266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3669067Z 2025-12-04T11:20:45.3669613Z [W1204 11:08:28.031887418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3669618Z 2025-12-04T11:20:45.3670221Z [W1204 11:08:28.032091799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3670226Z 2025-12-04T11:20:45.3670374Z ('RERUN', {'yellow': True}) [20.0741s] [100%] 2025-12-04T11:20:45.3672195Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:08:29.893022120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3672206Z 2025-12-04T11:20:45.3672760Z [W1204 11:08:29.893848526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3672768Z 2025-12-04T11:20:45.3673361Z [W1204 11:08:29.894062370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3673366Z 2025-12-04T11:20:45.3673921Z [W1204 11:08:29.898277431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3673926Z 2025-12-04T11:20:45.3674471Z [W1204 11:08:29.899185675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3674503Z 2025-12-04T11:20:45.3675215Z [W1204 11:08:29.899389102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3675224Z 2025-12-04T11:20:45.3675788Z [W1204 11:08:29.905728040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3675843Z 2025-12-04T11:20:45.3676441Z [W1204 11:08:29.906459251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3676447Z 2025-12-04T11:20:45.3676991Z [W1204 11:08:29.906659857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3676997Z 2025-12-04T11:20:45.3677591Z [W1204 11:08:29.000988940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3677597Z 2025-12-04T11:20:45.3678122Z [W1204 11:08:29.001828087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3678188Z 2025-12-04T11:20:45.3678847Z [W1204 11:08:29.002053371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3678852Z 2025-12-04T11:20:45.3679408Z [W1204 11:08:29.006248342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3679417Z 2025-12-04T11:20:45.3680010Z [W1204 11:08:29.006984842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3680015Z 2025-12-04T11:20:45.3680558Z [W1204 11:08:29.007199666 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3680563Z 2025-12-04T11:20:45.3681140Z [W1204 11:08:29.013583811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3681148Z 2025-12-04T11:20:45.3681746Z [W1204 11:08:29.014574444 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3681752Z 2025-12-04T11:20:45.3682324Z [W1204 11:08:29.014783817 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3682380Z 2025-12-04T11:20:45.3682550Z ('RERUN', {'yellow': True}) [0.9434s] [100%] 2025-12-04T11:20:45.3683870Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:08:30.799798371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3683877Z 2025-12-04T11:20:45.3684472Z [W1204 11:08:30.800641898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3684480Z 2025-12-04T11:20:45.3685000Z [W1204 11:08:30.800858234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3685005Z 2025-12-04T11:20:45.3685670Z [W1204 11:08:30.804933634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3685678Z 2025-12-04T11:20:45.3686221Z [W1204 11:08:30.805606551 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3686226Z 2025-12-04T11:20:45.3686825Z [W1204 11:08:30.805804789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3686831Z 2025-12-04T11:20:45.3687436Z [W1204 11:08:30.811985885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3687444Z 2025-12-04T11:20:45.3688017Z [W1204 11:08:30.812657803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3688022Z 2025-12-04T11:20:45.3688612Z [W1204 11:08:30.812852916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3688647Z 2025-12-04T11:20:45.3689208Z [W1204 11:08:30.901570427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3689261Z 2025-12-04T11:20:45.3689802Z [W1204 11:08:30.902319220 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3689807Z 2025-12-04T11:20:45.3690359Z [W1204 11:08:30.902527550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3690392Z 2025-12-04T11:20:45.3690997Z [W1204 11:08:30.906516533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3691002Z 2025-12-04T11:20:45.3691524Z [W1204 11:08:30.907168882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3691532Z 2025-12-04T11:20:45.3692176Z [W1204 11:08:30.907369815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3692182Z 2025-12-04T11:20:45.3692726Z [W1204 11:08:30.913520642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3692731Z 2025-12-04T11:20:45.3693328Z [W1204 11:08:30.914374260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3693333Z 2025-12-04T11:20:45.3693890Z [W1204 11:08:30.914574983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3693896Z 2025-12-04T11:20:45.3694061Z FAILED [0.8960s] [100%] 2025-12-04T11:20:45.3694066Z 2025-12-04T11:20:45.3694282Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.3694856Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3695067Z Traceback (most recent call last): 2025-12-04T11:20:45.3695620Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3695946Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3696519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3696771Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3697413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3697658Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3697925Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3697984Z 2025-12-04T11:20:45.3698131Z Expected 1 but got 2. 
2025-12-04T11:20:45.3698254Z Absolute difference: 1 2025-12-04T11:20:45.3698507Z Relative difference: 1.0 2025-12-04T11:20:45.3698512Z 2025-12-04T11:20:45.3698767Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3699720Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3699788Z 2025-12-04T11:20:45.3700170Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3700438Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3700660Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3701250Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3701565Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3701754Z graph_break [] 2025-12-04T11:20:45.3702015Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3703283Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3703482Z if out == self.unknown_value: 2025-12-04T11:20:45.3704273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3704498Z warnings.warn( 2025-12-04T11:20:45.3705254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3705448Z warnings.warn( 2025-12-04T11:20:45.3705981Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3706184Z Traceback (most recent call last): 2025-12-04T11:20:45.3706877Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3707147Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3707700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3707907Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3718594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3718925Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3719098Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3719106Z 2025-12-04T11:20:45.3719220Z Expected 1 but got 2. 
2025-12-04T11:20:45.3719328Z Absolute difference: 1 2025-12-04T11:20:45.3719455Z Relative difference: 1.0 2025-12-04T11:20:45.3719460Z 2025-12-04T11:20:45.3719685Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3720625Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3720647Z 2025-12-04T11:20:45.3720924Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3721151Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3721283Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3721815Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3722049Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3722166Z graph_break [] 2025-12-04T11:20:45.3722386Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3723613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3723903Z if out == self.unknown_value: 2025-12-04T11:20:45.3724639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3724759Z warnings.warn( 2025-12-04T11:20:45.3725482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3725638Z warnings.warn( 2025-12-04T11:20:45.3725862Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3725978Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3726223Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3726757Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3726863Z graph_break [] 2025-12-04T11:20:45.3727097Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3727866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3727982Z warnings.warn( 2025-12-04T11:20:45.3728702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3728807Z warnings.warn( 2025-12-04T11:20:45.3728969Z =================================== FAILURES =================================== 2025-12-04T11:20:45.3729485Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.3729626Z Traceback (most recent call last): 2025-12-04T11:20:45.3730145Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3730381Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3730852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3731016Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3731553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3731770Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3731905Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3731910Z 2025-12-04T11:20:45.3732031Z Expected 1 but got 2. 2025-12-04T11:20:45.3732140Z Absolute difference: 1 2025-12-04T11:20:45.3732252Z Relative difference: 1.0 2025-12-04T11:20:45.3732257Z 2025-12-04T11:20:45.3732488Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3733410Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3733420Z 2025-12-04T11:20:45.3733702Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3733924Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3734044Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3734592Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3734825Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3734927Z graph_break [] 2025-12-04T11:20:45.3735162Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:20:45.3736548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.3736692Z if out == self.unknown_value: 2025-12-04T11:20:45.3737423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3737563Z warnings.warn( 2025-12-04T11:20:45.3738299Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3738402Z warnings.warn( 2025-12-04T11:20:45.3738635Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3738753Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3738989Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3739567Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3739668Z graph_break [] 2025-12-04T11:20:45.3739885Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3740632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3740739Z warnings.warn( 2025-12-04T11:20:45.3741470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3741573Z warnings.warn( 2025-12-04T11:20:45.3741787Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3741914Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3742146Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3742680Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.3742796Z graph_break [] 2025-12-04T11:20:45.3743012Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3743746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3743850Z warnings.warn( 2025-12-04T11:20:45.3744568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3744683Z warnings.warn( 2025-12-04T11:20:45.3745526Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-41f0f199b083e6d2.xml - 2025-12-04T11:20:45.3745716Z =========================== short test summary info ============================ 2025-12-04T11:20:45.3746673Z FAILED [0.8960s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3746682Z 2025-12-04T11:20:45.3746789Z Expected 1 but got 2. 
2025-12-04T11:20:45.3746915Z Absolute difference: 1 2025-12-04T11:20:45.3747027Z Relative difference: 1.0 2025-12-04T11:20:45.3747033Z 2025-12-04T11:20:45.3747265Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3748182Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3748248Z 2025-12-04T11:20:45.3748521Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3748718Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.3748919Z ================== 1 failed, 13 deselected, 2 rerun in 21.95s ================== 2025-12-04T11:20:45.3749034Z Got exit code 1 2025-12-04T11:20:45.3749897Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.3750310Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:20:45.3750770Z W1204 11:08:42.220000 89736 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.3751432Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-438f9d52209526cc.xml 2025-12-04T11:20:45.3751644Z ============================= test session starts ============================== 2025-12-04T11:20:45.3751998Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.3752109Z cachedir: .pytest_cache 2025-12-04T11:20:45.3752644Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 
2025-12-04T11:20:45.3752774Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.3752883Z configfile: pytest.ini 2025-12-04T11:20:45.3753437Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.3753655Z collecting ... collected 58 items / 6 deselected / 52 selected 2025-12-04T11:20:45.3753819Z stepcurrent: skipping 6 already run items. 2025-12-04T11:20:45.3753935Z Running 8 items in this shard 2025-12-04T11:20:45.3753946Z 2025-12-04T11:20:45.3754804Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.8908s] [ 12%] 2025-12-04T11:20:45.3755669Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4674s] [ 12%] 2025-12-04T11:20:45.3756440Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4687s] [ 12%] 2025-12-04T11:20:45.3756446Z 2025-12-04T11:20:45.3756601Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.3757104Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.3757234Z Traceback (most recent call last): 2025-12-04T11:20:45.3757756Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3757993Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3758469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3758641Z return 
super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3759178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3759396Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3759533Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3759538Z 2025-12-04T11:20:45.3759663Z Expected 1 but got 2. 2025-12-04T11:20:45.3759773Z Absolute difference: 1 2025-12-04T11:20:45.3759886Z Relative difference: 1.0 2025-12-04T11:20:45.3759891Z 2025-12-04T11:20:45.3760190Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3761094Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3761100Z 2025-12-04T11:20:45.3761382Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3761635Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3761754Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3762658Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3762886Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3762990Z graph_break [] 2025-12-04T11:20:45.3763223Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3763991Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3764107Z warnings.warn( 2025-12-04T11:20:45.3764827Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3764935Z warnings.warn( 2025-12-04T11:20:45.3765451Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.3765575Z Traceback (most recent call last): 2025-12-04T11:20:45.3766094Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3766334Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3766797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3766975Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3767515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3767724Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3767871Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3767876Z 2025-12-04T11:20:45.3767984Z Expected 1 but got 2. 
2025-12-04T11:20:45.3768112Z Absolute difference: 1 2025-12-04T11:20:45.3768223Z Relative difference: 1.0 2025-12-04T11:20:45.3768228Z 2025-12-04T11:20:45.3768447Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3769375Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3769384Z 2025-12-04T11:20:45.3769653Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3769892Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3770016Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3770906Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3771480Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3771595Z graph_break [] 2025-12-04T11:20:45.3771816Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3772715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3772828Z warnings.warn( 2025-12-04T11:20:45.3773569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3773672Z warnings.warn( 2025-12-04T11:20:45.3773938Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3774071Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3774301Z aot_autograd [('total', 2), 
('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3775201Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3775301Z graph_break [] 2025-12-04T11:20:45.3775525Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3776377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3776481Z warnings.warn( 2025-12-04T11:20:45.3777199Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3777318Z warnings.warn( 2025-12-04T11:20:45.3777466Z =================================== FAILURES =================================== 2025-12-04T11:20:45.3777987Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.3778120Z Traceback (most recent call last): 2025-12-04T11:20:45.3778632Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3778884Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3779346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3779526Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3780062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3780272Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3780420Z 
AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3780425Z 2025-12-04T11:20:45.3780533Z Expected 1 but got 2. 2025-12-04T11:20:45.3780643Z Absolute difference: 1 2025-12-04T11:20:45.3780769Z Relative difference: 1.0 2025-12-04T11:20:45.3780774Z 2025-12-04T11:20:45.3780989Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3781912Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3781920Z 2025-12-04T11:20:45.3782190Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3782411Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3782545Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3783433Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3783676Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3783775Z graph_break [] 2025-12-04T11:20:45.3783993Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3784809Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3784915Z warnings.warn( 2025-12-04T11:20:45.3785649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3785780Z warnings.warn( 2025-12-04T11:20:45.3785999Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:20:45.3786132Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3786364Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3787249Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3787364Z graph_break [] 2025-12-04T11:20:45.3787588Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3788355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3788456Z warnings.warn( 2025-12-04T11:20:45.3789171Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3789289Z warnings.warn( 2025-12-04T11:20:45.3789505Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3789624Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3789866Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3790758Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3790873Z graph_break [] 2025-12-04T11:20:45.3791091Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3791815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3791934Z warnings.warn( 
2025-12-04T11:20:45.3792655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3792772Z warnings.warn( 2025-12-04T11:20:45.3793607Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-438f9d52209526cc.xml - 2025-12-04T11:20:45.3793784Z =========================== short test summary info ============================ 2025-12-04T11:20:45.3794744Z FAILED [0.4687s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3794753Z 2025-12-04T11:20:45.3794860Z Expected 1 but got 2. 2025-12-04T11:20:45.3794983Z Absolute difference: 1 2025-12-04T11:20:45.3795097Z Relative difference: 1.0 2025-12-04T11:20:45.3795102Z 2025-12-04T11:20:45.3795319Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3796230Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3796236Z 2025-12-04T11:20:45.3796503Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3796764Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.3796964Z =================== 1 failed, 6 deselected, 2 rerun in 4.86s =================== 2025-12-04T11:20:45.3797066Z Got exit code 1 2025-12-04T11:20:45.3797186Z Retrying single test... 
2025-12-04T11:20:45.3797632Z W1204 11:09:02.718000 89905 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.3798296Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98df9406c6e0faf3.xml 2025-12-04T11:20:45.3798506Z ============================= test session starts ============================== 2025-12-04T11:20:45.3798856Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.3798982Z cachedir: .pytest_cache 2025-12-04T11:20:45.3799506Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.3799639Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.3799800Z configfile: pytest.ini 2025-12-04T11:20:45.3800342Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.3800578Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.3801557Z stepcurrent: skipping 6 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3801678Z Running 1 items in this shard 2025-12-04T11:20:45.3801683Z 2025-12-04T11:20:45.3802967Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:09:08.632398257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3802978Z 2025-12-04T11:20:45.3803500Z [W1204 11:09:24.522037128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3803508Z 2025-12-04T11:20:45.3804039Z [W1204 11:09:24.522301560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3804046Z 2025-12-04T11:20:45.3804556Z [W1204 11:09:24.529822132 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3804562Z 2025-12-04T11:20:45.3805087Z [W1204 11:09:24.530634636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3805092Z 2025-12-04T11:20:45.3805599Z [W1204 11:09:24.530843137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3805604Z 2025-12-04T11:20:45.3806127Z [W1204 11:09:24.538004044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3806134Z 2025-12-04T11:20:45.3806641Z [W1204 11:09:24.538870586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3806645Z 2025-12-04T11:20:45.3807156Z [W1204 11:09:24.539067005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3807174Z 2025-12-04T11:20:45.3807684Z [W1204 11:09:24.678692615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3807689Z 2025-12-04T11:20:45.3808195Z [W1204 11:09:24.680545050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3808199Z 2025-12-04T11:20:45.3808778Z [W1204 11:09:24.680775509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3808786Z 2025-12-04T11:20:45.3809293Z [W1204 11:09:24.684866570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3809298Z 2025-12-04T11:20:45.3809819Z [W1204 11:09:24.685542987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3809859Z 2025-12-04T11:20:45.3810366Z [W1204 11:09:24.685743610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3810371Z 2025-12-04T11:20:45.3810886Z [W1204 11:09:24.691988774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3810890Z 2025-12-04T11:20:45.3811405Z [W1204 11:09:24.692669621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3811455Z 2025-12-04T11:20:45.3811977Z [W1204 11:09:24.692869115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3811982Z 2025-12-04T11:20:45.3812115Z ('RERUN', {'yellow': True}) [19.8058s] [100%] 2025-12-04T11:20:45.3813383Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:09:24.099325879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3813389Z 2025-12-04T11:20:45.3813913Z [W1204 11:09:24.100114023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3813917Z 2025-12-04T11:20:45.3814428Z [W1204 11:09:24.100326655 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3814435Z 2025-12-04T11:20:45.3814957Z [W1204 11:09:24.104376219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3814962Z 2025-12-04T11:20:45.3815469Z [W1204 11:09:24.105017510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3815477Z 2025-12-04T11:20:45.3816002Z [W1204 11:09:24.105210957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3816008Z 2025-12-04T11:20:45.3816594Z [W1204 11:09:24.111434917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3816600Z 2025-12-04T11:20:45.3817127Z [W1204 11:09:24.112066514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3817134Z 2025-12-04T11:20:45.3817646Z [W1204 11:09:24.112256475 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3817651Z 2025-12-04T11:20:45.3818168Z [W1204 11:09:24.200773142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3818191Z 2025-12-04T11:20:45.3818700Z [W1204 11:09:24.201563106 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3818705Z 2025-12-04T11:20:45.3819214Z [W1204 11:09:24.201775354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3819218Z 2025-12-04T11:20:45.3819812Z [W1204 11:09:24.205792632 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3819820Z 2025-12-04T11:20:45.3820327Z [W1204 11:09:24.206450292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3820332Z 2025-12-04T11:20:45.3820853Z [W1204 11:09:24.206645326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3820888Z 2025-12-04T11:20:45.3821399Z [W1204 11:09:24.212802721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3821404Z 2025-12-04T11:20:45.3821927Z [W1204 11:09:24.213647513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3821931Z 2025-12-04T11:20:45.3822446Z [W1204 11:09:24.213846026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3822451Z 2025-12-04T11:20:45.3822631Z ('RERUN', {'yellow': True}) [0.4819s] [100%] 2025-12-04T11:20:45.3823899Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:09:25.557866919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3823908Z 2025-12-04T11:20:45.3824420Z [W1204 11:09:25.558592953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3824441Z 2025-12-04T11:20:45.3824955Z [W1204 11:09:25.558793269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3824960Z 2025-12-04T11:20:45.3825473Z [W1204 11:09:25.562866254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3825481Z 2025-12-04T11:20:45.3826002Z [W1204 11:09:25.563500657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3826007Z 2025-12-04T11:20:45.3826514Z [W1204 11:09:25.563693482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3826521Z 2025-12-04T11:20:45.3827041Z [W1204 11:09:25.569831369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3827046Z 2025-12-04T11:20:45.3827555Z [W1204 11:09:25.570534864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3827560Z 2025-12-04T11:20:45.3828085Z [W1204 11:09:25.570729093 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3828091Z 2025-12-04T11:20:45.3828599Z [W1204 11:09:25.660974806 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3828603Z 2025-12-04T11:20:45.3829123Z [W1204 11:09:25.661768144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3829130Z 2025-12-04T11:20:45.3829640Z [W1204 11:09:25.661985667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3829645Z 2025-12-04T11:20:45.3830149Z [W1204 11:09:25.666027162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3830154Z 2025-12-04T11:20:45.3830673Z [W1204 11:09:25.666709729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3830744Z 2025-12-04T11:20:45.3831254Z [W1204 11:09:25.666918706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3831261Z 2025-12-04T11:20:45.3831784Z [W1204 11:09:25.673107486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3831818Z 2025-12-04T11:20:45.3832324Z [W1204 11:09:25.673969783 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3832328Z 2025-12-04T11:20:45.3832847Z [W1204 11:09:25.674170448 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3832851Z 2025-12-04T11:20:45.3832956Z FAILED [0.4573s] [100%] 2025-12-04T11:20:45.3832961Z 2025-12-04T11:20:45.3833115Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.3833622Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.3833775Z Traceback (most recent call last): 2025-12-04T11:20:45.3834299Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3834535Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3835001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3835177Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3835713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3835932Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3836065Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3836071Z 2025-12-04T11:20:45.3836181Z Expected 1 but got 2. 
2025-12-04T11:20:45.3836304Z Absolute difference: 1 2025-12-04T11:20:45.3836416Z Relative difference: 1.0 2025-12-04T11:20:45.3836420Z 2025-12-04T11:20:45.3836636Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3837545Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3837554Z 2025-12-04T11:20:45.3837824Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3838057Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3838175Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3839071Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3839313Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3839412Z graph_break [] 2025-12-04T11:20:45.3839643Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3840851Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3840973Z if out == self.unknown_value: 2025-12-04T11:20:45.3841712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3841814Z warnings.warn( 2025-12-04T11:20:45.3842605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3842713Z warnings.warn( 2025-12-04T11:20:45.3843220Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.3843360Z Traceback (most recent call last): 2025-12-04T11:20:45.3843865Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3844146Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3844606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3844773Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3845325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3845538Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3845673Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3845707Z 2025-12-04T11:20:45.3845828Z Expected 1 but got 2. 
2025-12-04T11:20:45.3845936Z Absolute difference: 1 2025-12-04T11:20:45.3846060Z Relative difference: 1.0 2025-12-04T11:20:45.3846065Z 2025-12-04T11:20:45.3846282Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3847189Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3847195Z 2025-12-04T11:20:45.3847478Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3847697Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3847828Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3848716Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3848946Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3849058Z graph_break [] 2025-12-04T11:20:45.3849275Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3850502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3850620Z if out == self.unknown_value: 2025-12-04T11:20:45.3851347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3851464Z warnings.warn( 2025-12-04T11:20:45.3852190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3852296Z warnings.warn( 2025-12-04T11:20:45.3852528Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3852647Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3852893Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3853778Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3853878Z graph_break [] 2025-12-04T11:20:45.3854108Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3854899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3855019Z warnings.warn( 2025-12-04T11:20:45.3855738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3855840Z warnings.warn( 2025-12-04T11:20:45.3856003Z =================================== FAILURES =================================== 2025-12-04T11:20:45.3856640Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.3856794Z 
Traceback (most recent call last): 2025-12-04T11:20:45.3857310Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3857544Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3858024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3858844Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3859384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3859608Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3859748Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3859754Z 2025-12-04T11:20:45.3859877Z Expected 1 but got 2. 2025-12-04T11:20:45.3859988Z Absolute difference: 1 2025-12-04T11:20:45.3860102Z Relative difference: 1.0 2025-12-04T11:20:45.3860107Z 2025-12-04T11:20:45.3860339Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3861239Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3861250Z 2025-12-04T11:20:45.3861535Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3861761Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3861880Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3862787Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3863021Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:20:45.3863135Z graph_break [] 2025-12-04T11:20:45.3863355Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3864566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.3864704Z if out == self.unknown_value: 2025-12-04T11:20:45.3865429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3865534Z warnings.warn( 2025-12-04T11:20:45.3866266Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3866373Z warnings.warn( 2025-12-04T11:20:45.3866608Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3866726Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3866956Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3867942Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3868045Z graph_break [] 2025-12-04T11:20:45.3868276Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3868999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3869174Z warnings.warn( 2025-12-04T11:20:45.3869905Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3870007Z warnings.warn( 2025-12-04T11:20:45.3870222Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3870352Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3870580Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3871822Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3872092Z graph_break [] 2025-12-04T11:20:45.3872310Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3873053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3873156Z warnings.warn( 2025-12-04T11:20:45.3873884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3873985Z warnings.warn( 2025-12-04T11:20:45.3874826Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98df9406c6e0faf3.xml - 2025-12-04T11:20:45.3875016Z =========================== short test summary info ============================ 2025-12-04T11:20:45.3875950Z FAILED [0.4573s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.3875958Z 2025-12-04T11:20:45.3876081Z Expected 1 but got 2. 2025-12-04T11:20:45.3876188Z Absolute difference: 1 2025-12-04T11:20:45.3876301Z Relative difference: 1.0 2025-12-04T11:20:45.3876307Z 2025-12-04T11:20:45.3876541Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3877441Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3877447Z 2025-12-04T11:20:45.3877734Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3877916Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.3878115Z ================== 1 failed, 13 deselected, 2 rerun in 20.78s ================== 2025-12-04T11:20:45.3878231Z Got exit code 1 2025-12-04T11:20:45.3878341Z Retrying single test... 2025-12-04T11:20:45.3878790Z W1204 11:09:36.960000 90079 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.3879467Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f706cf73cc88a5b8.xml 2025-12-04T11:20:45.3879633Z ============================= test session starts ============================== 2025-12-04T11:20:45.3879998Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.3880108Z cachedir: .pytest_cache 2025-12-04T11:20:45.3880732Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.3880877Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.3880990Z configfile: pytest.ini 2025-12-04T11:20:45.3881534Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, 
subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.3881816Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.3882794Z stepcurrent: skipping 6 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3882925Z Running 1 items in this shard 2025-12-04T11:20:45.3882930Z 2025-12-04T11:20:45.3884198Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:09:42.817963690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3884235Z 2025-12-04T11:20:45.3884768Z [W1204 11:09:58.589273741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3884775Z 2025-12-04T11:20:45.3885289Z [W1204 11:09:58.589530563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3885294Z 2025-12-04T11:20:45.3885817Z [W1204 11:09:58.596767050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3885822Z 2025-12-04T11:20:45.3886331Z [W1204 11:09:58.597478187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3886337Z 2025-12-04T11:20:45.3886849Z [W1204 11:09:58.597667138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3886868Z 2025-12-04T11:20:45.3887377Z [W1204 11:09:58.604592264 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3887382Z 2025-12-04T11:20:45.3887891Z [W1204 11:09:58.605395222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3887896Z 2025-12-04T11:20:45.3888417Z [W1204 11:09:58.605584410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3888422Z 2025-12-04T11:20:45.3888927Z [W1204 11:09:58.737322492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3888932Z 2025-12-04T11:20:45.3889454Z [W1204 11:09:58.739102308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3889462Z 2025-12-04T11:20:45.3889968Z [W1204 11:09:58.739313618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3889973Z 2025-12-04T11:20:45.3890490Z [W1204 11:09:58.743320274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3890497Z 2025-12-04T11:20:45.3891006Z [W1204 11:09:58.744004994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3891011Z 2025-12-04T11:20:45.3891529Z [W1204 11:09:58.744197384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3891534Z 2025-12-04T11:20:45.3892104Z [W1204 11:09:58.750287548 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3892112Z 2025-12-04T11:20:45.3892619Z [W1204 11:09:58.750920945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3892624Z 2025-12-04T11:20:45.3893140Z [W1204 11:09:58.751112652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3893175Z 2025-12-04T11:20:45.3893310Z ('RERUN', {'yellow': True}) [19.6417s] [100%] 2025-12-04T11:20:45.3894584Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:09:58.147624160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3894590Z 2025-12-04T11:20:45.3895108Z [W1204 11:09:58.148368731 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3895142Z 2025-12-04T11:20:45.3895669Z [W1204 11:09:58.148596740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3895675Z 2025-12-04T11:20:45.3896183Z [W1204 11:09:58.152644818 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3896190Z 2025-12-04T11:20:45.3896786Z [W1204 11:09:58.153281573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3896792Z 2025-12-04T11:20:45.3897304Z [W1204 11:09:58.153472488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3897309Z 2025-12-04T11:20:45.3897826Z [W1204 11:09:58.159508290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3897848Z 2025-12-04T11:20:45.3898357Z [W1204 11:09:58.160188026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3898362Z 2025-12-04T11:20:45.3898873Z [W1204 11:09:58.160375446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3898880Z 2025-12-04T11:20:45.3899404Z [W1204 11:09:58.247722610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3899409Z 2025-12-04T11:20:45.3899916Z [W1204 11:09:58.248476474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3899921Z 2025-12-04T11:20:45.3900449Z [W1204 11:09:58.248717039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3900456Z 2025-12-04T11:20:45.3900966Z [W1204 11:09:58.252625098 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3900971Z 2025-12-04T11:20:45.3901494Z [W1204 11:09:58.253287563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3901501Z 2025-12-04T11:20:45.3902011Z [W1204 11:09:58.253483574 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3902016Z 2025-12-04T11:20:45.3902531Z [W1204 11:09:58.259437688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3902535Z 2025-12-04T11:20:45.3903125Z [W1204 11:09:58.260259849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3903131Z 2025-12-04T11:20:45.3903640Z [W1204 11:09:58.260455271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3903658Z 2025-12-04T11:20:45.3903790Z ('RERUN', {'yellow': True}) [0.4701s] [100%] 2025-12-04T11:20:45.3905053Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:09:59.588684942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3905091Z 2025-12-04T11:20:45.3905611Z [W1204 11:09:59.589410498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3905616Z 2025-12-04T11:20:45.3906132Z [W1204 11:09:59.589607469 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3906138Z 2025-12-04T11:20:45.3906696Z [W1204 11:09:59.593556166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3906702Z 2025-12-04T11:20:45.3907211Z [W1204 11:09:59.594191774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3907219Z 2025-12-04T11:20:45.3907739Z [W1204 11:09:59.594378065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3907743Z 2025-12-04T11:20:45.3908252Z [W1204 11:09:59.600336498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3908257Z 2025-12-04T11:20:45.3908763Z [W1204 11:09:59.600984188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3908785Z 2025-12-04T11:20:45.3909289Z [W1204 11:09:59.601171953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3909296Z 2025-12-04T11:20:45.3909799Z [W1204 11:09:59.687311302 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3909806Z 2025-12-04T11:20:45.3910323Z [W1204 11:09:59.688080550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3910328Z 2025-12-04T11:20:45.3910834Z [W1204 11:09:59.688288618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3910839Z 2025-12-04T11:20:45.3911359Z [W1204 11:09:59.692247355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3911364Z 2025-12-04T11:20:45.3911875Z [W1204 11:09:59.692950519 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3911883Z 2025-12-04T11:20:45.3912405Z [W1204 11:09:59.693151974 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3912410Z 2025-12-04T11:20:45.3912921Z [W1204 11:09:59.699168630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.3912926Z 2025-12-04T11:20:45.3913447Z [W1204 11:09:59.700040014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3913452Z 2025-12-04T11:20:45.3913957Z [W1204 11:09:59.700243755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.3913962Z 2025-12-04T11:20:45.3914125Z FAILED [0.4386s] [100%] 2025-12-04T11:20:45.3914131Z 2025-12-04T11:20:45.3914291Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.3914790Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.3914929Z Traceback (most recent call last): 2025-12-04T11:20:45.3915440Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3915703Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3916181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3916349Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3916895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3917106Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3917300Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3917306Z 2025-12-04T11:20:45.3917424Z Expected 1 but got 2. 
2025-12-04T11:20:45.3917532Z Absolute difference: 1 2025-12-04T11:20:45.3917642Z Relative difference: 1.0 2025-12-04T11:20:45.3917646Z 2025-12-04T11:20:45.3917876Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3918780Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3918787Z 2025-12-04T11:20:45.3919070Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3919289Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3919405Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3920308Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3920538Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3920650Z graph_break [] 2025-12-04T11:20:45.3920869Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3922074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3922209Z if out == self.unknown_value: 2025-12-04T11:20:45.3922931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3923052Z warnings.warn( 2025-12-04T11:20:45.3923780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3923885Z warnings.warn( 2025-12-04T11:20:45.3924404Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.3924533Z Traceback (most recent call last): 2025-12-04T11:20:45.3925045Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3925293Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3925751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3925927Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3926527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3926737Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3926885Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3926890Z 2025-12-04T11:20:45.3926998Z Expected 1 but got 2. 
2025-12-04T11:20:45.3927123Z Absolute difference: 1 2025-12-04T11:20:45.3927268Z Relative difference: 1.0 2025-12-04T11:20:45.3927273Z 2025-12-04T11:20:45.3927489Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3928395Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3928400Z 2025-12-04T11:20:45.3928670Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3928904Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3929022Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3929941Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3930182Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3930285Z graph_break [] 2025-12-04T11:20:45.3930503Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3931720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.3931839Z if out == self.unknown_value: 2025-12-04T11:20:45.3932582Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3932688Z warnings.warn( 2025-12-04T11:20:45.3933405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3933520Z warnings.warn( 2025-12-04T11:20:45.3933741Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3933872Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3934101Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3934989Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3935106Z graph_break [] 2025-12-04T11:20:45.3935328Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3936068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3936172Z warnings.warn( 2025-12-04T11:20:45.3936995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3937116Z warnings.warn( 2025-12-04T11:20:45.3937267Z =================================== FAILURES =================================== 2025-12-04T11:20:45.3937772Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.3937913Z 
Traceback (most recent call last): 2025-12-04T11:20:45.3938422Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3938737Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3939202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3939368Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3939919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3940161Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3940298Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3940318Z 2025-12-04T11:20:45.3940427Z Expected 1 but got 2. 2025-12-04T11:20:45.3940538Z Absolute difference: 1 2025-12-04T11:20:45.3940663Z Relative difference: 1.0 2025-12-04T11:20:45.3940668Z 2025-12-04T11:20:45.3940885Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3941793Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3941829Z 2025-12-04T11:20:45.3942113Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3942336Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3942471Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3943357Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3943585Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:20:45.3943700Z graph_break [] 2025-12-04T11:20:45.3943914Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3945138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.3945259Z if out == self.unknown_value: 2025-12-04T11:20:45.3945983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3946104Z warnings.warn( 2025-12-04T11:20:45.3946820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3946937Z warnings.warn( 2025-12-04T11:20:45.3947155Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3947272Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3947519Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3948409Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3948512Z graph_break [] 2025-12-04T11:20:45.3948740Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3949465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3949580Z warnings.warn( 2025-12-04T11:20:45.3950293Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3950394Z warnings.warn( 2025-12-04T11:20:45.3950700Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3950820Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3951050Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3951943Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3952076Z graph_break [] 2025-12-04T11:20:45.3952305Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3953031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3953135Z warnings.warn( 2025-12-04T11:20:45.3953869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3953974Z warnings.warn( 2025-12-04T11:20:45.3954859Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f706cf73cc88a5b8.xml - 2025-12-04T11:20:45.3955040Z =========================== short test summary info ============================ 2025-12-04T11:20:45.3955972Z FAILED [0.4386s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.3955993Z 2025-12-04T11:20:45.3956102Z Expected 1 but got 2. 2025-12-04T11:20:45.3956214Z Absolute difference: 1 2025-12-04T11:20:45.3956340Z Relative difference: 1.0 2025-12-04T11:20:45.3956345Z 2025-12-04T11:20:45.3956561Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3957460Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3957468Z 2025-12-04T11:20:45.3957750Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3957932Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.3958144Z ================== 1 failed, 13 deselected, 2 rerun in 20.58s ================== 2025-12-04T11:20:45.3958251Z Got exit code 1 2025-12-04T11:20:45.3959064Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3959490Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:20:45.3959934Z W1204 11:10:10.810000 90253 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.3960614Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c67f05de6c39b0d8.xml 2025-12-04T11:20:45.3960784Z ============================= test session starts ============================== 2025-12-04T11:20:45.3961134Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.3961261Z cachedir: .pytest_cache 2025-12-04T11:20:45.3961781Z hypothesis profile 'pytorch_ci' -> database=None, 
max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.3961909Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.3962031Z configfile: pytest.ini 2025-12-04T11:20:45.3962568Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.3962794Z collecting ... collected 58 items / 7 deselected / 51 selected 2025-12-04T11:20:45.3963001Z stepcurrent: skipping 7 already run items. 2025-12-04T11:20:45.3963121Z Running 7 items in this shard 2025-12-04T11:20:45.3963126Z 2025-12-04T11:20:45.3964005Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.8417s] [ 14%] 2025-12-04T11:20:45.3964891Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4582s] [ 14%] 2025-12-04T11:20:45.3965675Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.4535s] [ 14%] 2025-12-04T11:20:45.3965681Z 2025-12-04T11:20:45.3965825Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.3966332Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.3966500Z Traceback (most recent call last): 2025-12-04T11:20:45.3967014Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3967260Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3967724Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3967890Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3968440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3968645Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3968796Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3968802Z 2025-12-04T11:20:45.3968909Z Expected 1 but got 2. 2025-12-04T11:20:45.3969022Z Absolute difference: 1 2025-12-04T11:20:45.3969149Z Relative difference: 1.0 2025-12-04T11:20:45.3969154Z 2025-12-04T11:20:45.3969369Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3970272Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3970294Z 2025-12-04T11:20:45.3970562Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3970781Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3970912Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3972191Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3972424Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3972543Z graph_break [] 2025-12-04T11:20:45.3972763Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3973511Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 
does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3973617Z warnings.warn( 2025-12-04T11:20:45.3974338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3974453Z warnings.warn( 2025-12-04T11:20:45.3974956Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.3975098Z Traceback (most recent call last): 2025-12-04T11:20:45.3975741Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3975979Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3976526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3976692Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3977278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3977500Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3977634Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3977639Z 2025-12-04T11:20:45.3977759Z Expected 1 but got 2. 
2025-12-04T11:20:45.3977866Z Absolute difference: 1 2025-12-04T11:20:45.3977977Z Relative difference: 1.0 2025-12-04T11:20:45.3977982Z 2025-12-04T11:20:45.3978212Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3979113Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3979166Z 2025-12-04T11:20:45.3979448Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3979673Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3979790Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3980689Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3980916Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3981030Z graph_break [] 2025-12-04T11:20:45.3981254Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3981987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3982105Z warnings.warn( 2025-12-04T11:20:45.3982824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3982928Z warnings.warn( 2025-12-04T11:20:45.3983158Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3983278Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3983519Z aot_autograd [('total', 2), 
('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3984406Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3984511Z graph_break [] 2025-12-04T11:20:45.3984739Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3985464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3985580Z warnings.warn( 2025-12-04T11:20:45.3986296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3986397Z warnings.warn( 2025-12-04T11:20:45.3986557Z =================================== FAILURES =================================== 2025-12-04T11:20:45.3987062Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.3987186Z Traceback (most recent call last): 2025-12-04T11:20:45.3987782Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.3988019Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.3988493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.3988660Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.3989224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.3989446Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.3989584Z 
AssertionError: Scalars are not equal! 2025-12-04T11:20:45.3989589Z 2025-12-04T11:20:45.3989709Z Expected 1 but got 2. 2025-12-04T11:20:45.3989815Z Absolute difference: 1 2025-12-04T11:20:45.3989926Z Relative difference: 1.0 2025-12-04T11:20:45.3989931Z 2025-12-04T11:20:45.3990162Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.3991113Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.3991119Z 2025-12-04T11:20:45.3991393Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.3991629Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3991746Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3992647Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3992874Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3992973Z graph_break [] 2025-12-04T11:20:45.3993207Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3993940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3994056Z warnings.warn( 2025-12-04T11:20:45.3994773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3994878Z warnings.warn( 2025-12-04T11:20:45.3995107Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:20:45.3995224Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3995450Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3996359Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.3996461Z graph_break [] 2025-12-04T11:20:45.3996689Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.3997410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3997512Z warnings.warn( 2025-12-04T11:20:45.3998251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.3998353Z warnings.warn( 2025-12-04T11:20:45.3998581Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.3998700Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.3998929Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.3999926Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4000031Z graph_break [] 2025-12-04T11:20:45.4000249Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4000989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4001126Z warnings.warn( 
2025-12-04T11:20:45.4001862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4001965Z warnings.warn( 2025-12-04T11:20:45.4002807Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c67f05de6c39b0d8.xml - 2025-12-04T11:20:45.4003028Z =========================== short test summary info ============================ 2025-12-04T11:20:45.4003970Z FAILED [0.4535s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4003978Z 2025-12-04T11:20:45.4004101Z Expected 1 but got 2. 2025-12-04T11:20:45.4004213Z Absolute difference: 1 2025-12-04T11:20:45.4004327Z Relative difference: 1.0 2025-12-04T11:20:45.4004332Z 2025-12-04T11:20:45.4004568Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4005470Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4005476Z 2025-12-04T11:20:45.4005761Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4005947Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.4006148Z =================== 1 failed, 7 deselected, 2 rerun in 4.78s =================== 2025-12-04T11:20:45.4006269Z Got exit code 1 2025-12-04T11:20:45.4006381Z Retrying single test... 
2025-12-04T11:20:45.4006827Z W1204 11:10:31.085000 90429 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.4007511Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0e30a339afee7d22.xml 2025-12-04T11:20:45.4007679Z ============================= test session starts ============================== 2025-12-04T11:20:45.4008046Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.4008161Z cachedir: .pytest_cache 2025-12-04T11:20:45.4008687Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.4008833Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.4008944Z configfile: pytest.ini 2025-12-04T11:20:45.4009506Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.4009735Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.4010724Z stepcurrent: skipping 7 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4010856Z Running 1 items in this shard 2025-12-04T11:20:45.4010862Z 2025-12-04T11:20:45.4012197Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:10:36.935468562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4012207Z 2025-12-04T11:20:45.4012737Z [W1204 11:10:52.692230925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4012743Z 2025-12-04T11:20:45.4013294Z [W1204 11:10:52.692493183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4013299Z 2025-12-04T11:20:45.4013825Z [W1204 11:10:52.699763058 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4013830Z 2025-12-04T11:20:45.4014338Z [W1204 11:10:52.700480400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4014343Z 2025-12-04T11:20:45.4014870Z [W1204 11:10:52.700712984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4014905Z 2025-12-04T11:20:45.4015415Z [W1204 11:10:52.707564883 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4015420Z 2025-12-04T11:20:45.4015924Z [W1204 11:10:52.708327166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4015944Z 2025-12-04T11:20:45.4016525Z [W1204 11:10:52.708509715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4016531Z 2025-12-04T11:20:45.4017041Z [W1204 11:10:52.846185718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4017046Z 2025-12-04T11:20:45.4017572Z [W1204 11:10:52.847942650 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4017579Z 2025-12-04T11:20:45.4018088Z [W1204 11:10:52.848154970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4018093Z 2025-12-04T11:20:45.4018616Z [W1204 11:10:52.852234769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4018623Z 2025-12-04T11:20:45.4019131Z [W1204 11:10:52.852929893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4019136Z 2025-12-04T11:20:45.4019654Z [W1204 11:10:52.853127888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4019659Z 2025-12-04T11:20:45.4020167Z [W1204 11:10:52.859192631 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4020174Z 2025-12-04T11:20:45.4020682Z [W1204 11:10:52.859840535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4020701Z 2025-12-04T11:20:45.4021206Z [W1204 11:10:52.860059104 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4021214Z 2025-12-04T11:20:45.4021347Z ('RERUN', {'yellow': True}) [19.6302s] [100%] 2025-12-04T11:20:45.4022638Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:10:52.265044445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4022644Z 2025-12-04T11:20:45.4023217Z [W1204 11:10:52.265793263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4023225Z 2025-12-04T11:20:45.4023744Z [W1204 11:10:52.265989075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4023750Z 2025-12-04T11:20:45.4024257Z [W1204 11:10:52.270094116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4024291Z 2025-12-04T11:20:45.4024813Z [W1204 11:10:52.270733179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4024817Z 2025-12-04T11:20:45.4025323Z [W1204 11:10:52.270923080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4025328Z 2025-12-04T11:20:45.4025854Z [W1204 11:10:52.276989448 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4025890Z 2025-12-04T11:20:45.4026400Z [W1204 11:10:52.277618443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4026405Z 2025-12-04T11:20:45.4026913Z [W1204 11:10:52.277822356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4026935Z 2025-12-04T11:20:45.4027448Z [W1204 11:10:52.364469886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4027452Z 2025-12-04T11:20:45.4027959Z [W1204 11:10:52.365250162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4027964Z 2025-12-04T11:20:45.4028490Z [W1204 11:10:52.365458275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4028495Z 2025-12-04T11:20:45.4029010Z [W1204 11:10:52.369355873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4029016Z 2025-12-04T11:20:45.4029536Z [W1204 11:10:52.369992733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4029543Z 2025-12-04T11:20:45.4030051Z [W1204 11:10:52.370208579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4030056Z 2025-12-04T11:20:45.4030581Z [W1204 11:10:52.376138353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4030586Z 2025-12-04T11:20:45.4031097Z [W1204 11:10:52.376956007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4031105Z 2025-12-04T11:20:45.4031633Z [W1204 11:10:52.377152691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4031641Z 2025-12-04T11:20:45.4031775Z ('RERUN', {'yellow': True}) [0.4768s] [100%] 2025-12-04T11:20:45.4033049Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:10:53.709083154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4033058Z 2025-12-04T11:20:45.4033578Z [W1204 11:10:53.709813503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4033582Z 2025-12-04T11:20:45.4034092Z [W1204 11:10:53.710029561 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4034162Z 2025-12-04T11:20:45.4034682Z [W1204 11:10:53.713984779 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4034689Z 2025-12-04T11:20:45.4035194Z [W1204 11:10:53.714586977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4035245Z 2025-12-04T11:20:45.4035765Z [W1204 11:10:53.714773689 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4035769Z 2025-12-04T11:20:45.4036276Z [W1204 11:10:53.720886061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4036281Z 2025-12-04T11:20:45.4036801Z [W1204 11:10:53.721502405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4036806Z 2025-12-04T11:20:45.4037321Z [W1204 11:10:53.721692889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4037358Z 2025-12-04T11:20:45.4037866Z [W1204 11:10:53.808625099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4037882Z 2025-12-04T11:20:45.4038390Z [W1204 11:10:53.809409982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4038395Z 2025-12-04T11:20:45.4038898Z [W1204 11:10:53.809623646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4038903Z 2025-12-04T11:20:45.4039423Z [W1204 11:10:53.813634380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4039428Z 2025-12-04T11:20:45.4039940Z [W1204 11:10:53.814362333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4039946Z 2025-12-04T11:20:45.4040469Z [W1204 11:10:53.814564778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4040474Z 2025-12-04T11:20:45.4040981Z [W1204 11:10:53.820725633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4040988Z 2025-12-04T11:20:45.4041511Z [W1204 11:10:53.821614939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4041516Z 2025-12-04T11:20:45.4042027Z [W1204 11:10:53.821829828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4042032Z 2025-12-04T11:20:45.4042148Z FAILED [0.4435s] [100%] 2025-12-04T11:20:45.4042156Z 2025-12-04T11:20:45.4042303Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.4042814Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.4042955Z Traceback (most recent call last): 2025-12-04T11:20:45.4043465Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4043701Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4044182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4044348Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4044897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4045167Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4045305Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4045311Z 2025-12-04T11:20:45.4045431Z Expected 1 but got 2. 
2025-12-04T11:20:45.4045539Z Absolute difference: 1 2025-12-04T11:20:45.4045651Z Relative difference: 1.0 2025-12-04T11:20:45.4045669Z 2025-12-04T11:20:45.4045884Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4046819Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4046825Z 2025-12-04T11:20:45.4047108Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4047329Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4047446Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4048353Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4048612Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4048727Z graph_break [] 2025-12-04T11:20:45.4048943Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4050157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4050287Z if out == self.unknown_value: 2025-12-04T11:20:45.4051007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4051126Z warnings.warn( 2025-12-04T11:20:45.4051845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4051950Z warnings.warn( 2025-12-04T11:20:45.4052467Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.4052594Z Traceback (most recent call last): 2025-12-04T11:20:45.4053118Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4053349Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4053807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4053982Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4054521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4054730Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4054873Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4054878Z 2025-12-04T11:20:45.4054984Z Expected 1 but got 2. 
2025-12-04T11:20:45.4055104Z Absolute difference: 1 2025-12-04T11:20:45.4055216Z Relative difference: 1.0 2025-12-04T11:20:45.4055223Z 2025-12-04T11:20:45.4055442Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4056433Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4056440Z 2025-12-04T11:20:45.4056709Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4056944Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4057145Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4058032Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4058273Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4058482Z graph_break [] 2025-12-04T11:20:45.4058717Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4059917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4060034Z if out == self.unknown_value: 2025-12-04T11:20:45.4060777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4060912Z warnings.warn( 2025-12-04T11:20:45.4061648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4061752Z warnings.warn( 2025-12-04T11:20:45.4061972Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4062108Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4062339Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4063231Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4063346Z graph_break [] 2025-12-04T11:20:45.4063568Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4064308Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4064411Z warnings.warn( 2025-12-04T11:20:45.4065129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4065248Z warnings.warn( 2025-12-04T11:20:45.4065396Z =================================== FAILURES =================================== 2025-12-04T11:20:45.4065912Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.4066040Z 
Traceback (most recent call last): 2025-12-04T11:20:45.4066547Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4066793Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4067254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4067419Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4067972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4068183Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4068334Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4068339Z 2025-12-04T11:20:45.4068444Z Expected 1 but got 2. 2025-12-04T11:20:45.4068554Z Absolute difference: 1 2025-12-04T11:20:45.4068677Z Relative difference: 1.0 2025-12-04T11:20:45.4068682Z 2025-12-04T11:20:45.4068897Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4069864Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4069886Z 2025-12-04T11:20:45.4070159Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4070378Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4070541Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4071763Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4072009Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:20:45.4072111Z graph_break [] 2025-12-04T11:20:45.4072331Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4073556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.4073753Z if out == self.unknown_value: 2025-12-04T11:20:45.4074476Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4074603Z warnings.warn( 2025-12-04T11:20:45.4075323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4075442Z warnings.warn( 2025-12-04T11:20:45.4075665Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4075786Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4076031Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4076928Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4077050Z graph_break [] 2025-12-04T11:20:45.4077268Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4077997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4078115Z warnings.warn( 2025-12-04T11:20:45.4078833Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4078936Z warnings.warn( 2025-12-04T11:20:45.4079166Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4079288Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4079535Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4080425Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4080531Z graph_break [] 2025-12-04T11:20:45.4080764Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4081494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4081614Z warnings.warn( 2025-12-04T11:20:45.4082334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4082543Z warnings.warn( 2025-12-04T11:20:45.4083400Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0e30a339afee7d22.xml - 2025-12-04T11:20:45.4083580Z =========================== short test summary info ============================ 2025-12-04T11:20:45.4084537Z FAILED [0.4435s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.4084591Z 2025-12-04T11:20:45.4084700Z Expected 1 but got 2. 2025-12-04T11:20:45.4084812Z Absolute difference: 1 2025-12-04T11:20:45.4084943Z Relative difference: 1.0 2025-12-04T11:20:45.4084948Z 2025-12-04T11:20:45.4085170Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4086078Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4086130Z 2025-12-04T11:20:45.4086406Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4086589Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.4086802Z ================== 1 failed, 13 deselected, 2 rerun in 20.58s ================== 2025-12-04T11:20:45.4086906Z Got exit code 1 2025-12-04T11:20:45.4087014Z Retrying single test... 2025-12-04T11:20:45.4087473Z W1204 11:11:05.205000 90610 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.4088136Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4a14a2e6be65f97f.xml 2025-12-04T11:20:45.4088318Z ============================= test session starts ============================== 2025-12-04T11:20:45.4088673Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.4088786Z cachedir: .pytest_cache 2025-12-04T11:20:45.4089323Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.4089449Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.4089560Z configfile: pytest.ini 2025-12-04T11:20:45.4090115Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, 
subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.4090333Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.4091329Z stepcurrent: skipping 7 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4091447Z Running 1 items in this shard 2025-12-04T11:20:45.4091457Z 2025-12-04T11:20:45.4092747Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:11:10.047314338 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4092756Z 2025-12-04T11:20:45.4093275Z [W1204 11:11:25.259216840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4093283Z 2025-12-04T11:20:45.4093793Z [W1204 11:11:25.259483766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4093811Z 2025-12-04T11:20:45.4094320Z [W1204 11:11:25.266839325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4094325Z 2025-12-04T11:20:45.4094904Z [W1204 11:11:25.267536827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4094913Z 2025-12-04T11:20:45.4095434Z [W1204 11:11:25.267723570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4095440Z 2025-12-04T11:20:45.4095945Z [W1204 11:11:25.274666725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4095980Z 2025-12-04T11:20:45.4096580Z [W1204 11:11:25.275439269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4096586Z 2025-12-04T11:20:45.4097094Z [W1204 11:11:25.275619551 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4097099Z 2025-12-04T11:20:45.4097627Z [W1204 11:11:26.413524292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4097670Z 2025-12-04T11:20:45.4098182Z [W1204 11:11:26.415261131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4098187Z 2025-12-04T11:20:45.4098709Z [W1204 11:11:26.415475728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4098716Z 2025-12-04T11:20:45.4099231Z [W1204 11:11:26.419460348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4099236Z 2025-12-04T11:20:45.4099738Z [W1204 11:11:26.420137977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4099743Z 2025-12-04T11:20:45.4100272Z [W1204 11:11:26.420337083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4100280Z 2025-12-04T11:20:45.4100789Z [W1204 11:11:26.426438664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4100794Z 2025-12-04T11:20:45.4101317Z [W1204 11:11:26.427068125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4101324Z 2025-12-04T11:20:45.4101834Z [W1204 11:11:26.427257028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4101839Z 2025-12-04T11:20:45.4101990Z ('RERUN', {'yellow': True}) [19.0761s] [100%] 2025-12-04T11:20:45.4103271Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:11:26.831109857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4103279Z 2025-12-04T11:20:45.4103803Z [W1204 11:11:26.831833376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4103808Z 2025-12-04T11:20:45.4104319Z [W1204 11:11:26.832032507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4104326Z 2025-12-04T11:20:45.4104832Z [W1204 11:11:26.836069081 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4104849Z 2025-12-04T11:20:45.4105357Z [W1204 11:11:26.836726476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4105362Z 2025-12-04T11:20:45.4105929Z [W1204 11:11:26.836921751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4105935Z 2025-12-04T11:20:45.4106457Z [W1204 11:11:26.843139546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4106462Z 2025-12-04T11:20:45.4106969Z [W1204 11:11:26.843750780 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4107006Z 2025-12-04T11:20:45.4107525Z [W1204 11:11:26.843934467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4107531Z 2025-12-04T11:20:45.4108039Z [W1204 11:11:26.931833573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4108043Z 2025-12-04T11:20:45.4108564Z [W1204 11:11:26.932645973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4108573Z 2025-12-04T11:20:45.4109112Z [W1204 11:11:26.932860144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4109117Z 2025-12-04T11:20:45.4109639Z [W1204 11:11:26.936828804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4109646Z 2025-12-04T11:20:45.4110154Z [W1204 11:11:26.937487791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4110160Z 2025-12-04T11:20:45.4110666Z [W1204 11:11:26.937695296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4110684Z 2025-12-04T11:20:45.4111188Z [W1204 11:11:26.943771149 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4111193Z 2025-12-04T11:20:45.4111705Z [W1204 11:11:26.944603657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4111712Z 2025-12-04T11:20:45.4112233Z [W1204 11:11:26.944798040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4112238Z 2025-12-04T11:20:45.4112373Z ('RERUN', {'yellow': True}) [0.4778s] [100%] 2025-12-04T11:20:45.4113648Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:11:26.278517032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4113654Z 2025-12-04T11:20:45.4114163Z [W1204 11:11:26.279223789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4114173Z 2025-12-04T11:20:45.4114692Z [W1204 11:11:26.279419340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4114699Z 2025-12-04T11:20:45.4115208Z [W1204 11:11:26.283470640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4115214Z 2025-12-04T11:20:45.4115724Z [W1204 11:11:26.284100521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4115742Z 2025-12-04T11:20:45.4116254Z [W1204 11:11:26.284306997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4116259Z 2025-12-04T11:20:45.4116768Z [W1204 11:11:26.290495118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4116773Z 2025-12-04T11:20:45.4117356Z [W1204 11:11:26.291118891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4117364Z 2025-12-04T11:20:45.4117870Z [W1204 11:11:26.291302088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4117875Z 2025-12-04T11:20:45.4118427Z [W1204 11:11:26.379308521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4118432Z 2025-12-04T11:20:45.4118940Z [W1204 11:11:26.380104609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4118944Z 2025-12-04T11:20:45.4119463Z [W1204 11:11:26.380314013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4119468Z 2025-12-04T11:20:45.4119980Z [W1204 11:11:26.384254662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4120013Z 2025-12-04T11:20:45.4120539Z [W1204 11:11:26.384946142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4120544Z 2025-12-04T11:20:45.4121052Z [W1204 11:11:26.385149299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4121060Z 2025-12-04T11:20:45.4121574Z [W1204 11:11:27.391215832 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4121594Z 2025-12-04T11:20:45.4122101Z [W1204 11:11:27.392029307 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4122105Z 2025-12-04T11:20:45.4122617Z [W1204 11:11:27.392224871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4122625Z 2025-12-04T11:20:45.4122744Z FAILED [0.4463s] [100%] 2025-12-04T11:20:45.4122749Z 2025-12-04T11:20:45.4122897Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.4123415Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.4123542Z Traceback (most recent call last): 2025-12-04T11:20:45.4124052Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4124299Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4124766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4124944Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4125483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4125694Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4125841Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4125846Z 2025-12-04T11:20:45.4125954Z Expected 1 but got 2. 
2025-12-04T11:20:45.4126065Z Absolute difference: 1 2025-12-04T11:20:45.4126189Z Relative difference: 1.0 2025-12-04T11:20:45.4126194Z 2025-12-04T11:20:45.4126411Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4127325Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4127330Z 2025-12-04T11:20:45.4127600Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4127907Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4128044Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4128938Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4129209Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4129311Z graph_break [] 2025-12-04T11:20:45.4129528Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4130746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4130865Z if out == self.unknown_value: 2025-12-04T11:20:45.4131617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4131753Z warnings.warn( 2025-12-04T11:20:45.4132478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4132597Z warnings.warn( 2025-12-04T11:20:45.4133100Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.4133224Z Traceback (most recent call last): 2025-12-04T11:20:45.4133742Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4133976Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4134450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4134616Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4135156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4135374Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4135508Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4135516Z 2025-12-04T11:20:45.4135636Z Expected 1 but got 2. 
2025-12-04T11:20:45.4135744Z Absolute difference: 1 2025-12-04T11:20:45.4135855Z Relative difference: 1.0 2025-12-04T11:20:45.4135861Z 2025-12-04T11:20:45.4136088Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4137056Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4137062Z 2025-12-04T11:20:45.4137354Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4137576Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4137694Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4138598Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4138832Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4138936Z graph_break [] 2025-12-04T11:20:45.4139170Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4140453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4140592Z if out == self.unknown_value: 2025-12-04T11:20:45.4141325Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4141428Z warnings.warn( 2025-12-04T11:20:45.4142160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4142295Z warnings.warn( 2025-12-04T11:20:45.4142529Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4142646Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4142876Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4143785Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4143917Z graph_break [] 2025-12-04T11:20:45.4144132Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4144870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4144978Z warnings.warn( 2025-12-04T11:20:45.4145705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4145807Z warnings.warn( 2025-12-04T11:20:45.4145953Z =================================== FAILURES =================================== 2025-12-04T11:20:45.4146474Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.4146604Z 
Traceback (most recent call last): 2025-12-04T11:20:45.4147126Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4147363Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4147824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4148005Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4148542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4148748Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4148903Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4148908Z 2025-12-04T11:20:45.4149016Z Expected 1 but got 2. 2025-12-04T11:20:45.4149140Z Absolute difference: 1 2025-12-04T11:20:45.4149252Z Relative difference: 1.0 2025-12-04T11:20:45.4149257Z 2025-12-04T11:20:45.4149479Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4150402Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4150407Z 2025-12-04T11:20:45.4150677Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4150914Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4151034Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4151919Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4152164Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:20:45.4152325Z graph_break [] 2025-12-04T11:20:45.4152556Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4153760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.4153914Z if out == self.unknown_value: 2025-12-04T11:20:45.4154652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4154757Z warnings.warn( 2025-12-04T11:20:45.4155490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4155596Z warnings.warn( 2025-12-04T11:20:45.4155822Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4155990Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4156222Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4157112Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4157232Z graph_break [] 2025-12-04T11:20:45.4157451Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4158191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4158297Z warnings.warn( 2025-12-04T11:20:45.4159019Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4159135Z warnings.warn( 2025-12-04T11:20:45.4159355Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4159471Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4159713Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4160602Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4160722Z graph_break [] 2025-12-04T11:20:45.4160939Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4161667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4161782Z warnings.warn( 2025-12-04T11:20:45.4162498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4162614Z warnings.warn( 2025-12-04T11:20:45.4163454Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4a14a2e6be65f97f.xml - 2025-12-04T11:20:45.4163629Z =========================== short test summary info ============================ 2025-12-04T11:20:45.4164581Z FAILED [0.4463s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.4164588Z 2025-12-04T11:20:45.4164695Z Expected 1 but got 2. 2025-12-04T11:20:45.4164816Z Absolute difference: 1 2025-12-04T11:20:45.4164927Z Relative difference: 1.0 2025-12-04T11:20:45.4164932Z 2025-12-04T11:20:45.4165212Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4166134Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4166139Z 2025-12-04T11:20:45.4166408Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4166633Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.4166833Z ================== 1 failed, 13 deselected, 2 rerun in 20.03s ================== 2025-12-04T11:20:45.4166934Z Got exit code 1 2025-12-04T11:20:45.4167760Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4168179Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:20:45.4168636Z W1204 11:11:38.682000 90791 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.4169348Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a3db4b14f41cd2.xml 2025-12-04T11:20:45.4169517Z ============================= test session starts ============================== 2025-12-04T11:20:45.4169886Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.4169998Z cachedir: .pytest_cache 2025-12-04T11:20:45.4170518Z hypothesis profile 'pytorch_ci' -> database=None, 
max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.4170657Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.4170767Z configfile: pytest.ini 2025-12-04T11:20:45.4171665Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.4171891Z collecting ... collected 58 items / 8 deselected / 50 selected 2025-12-04T11:20:45.4172035Z stepcurrent: skipping 8 already run items. 2025-12-04T11:20:45.4172166Z Running 6 items in this shard 2025-12-04T11:20:45.4172171Z 2025-12-04T11:20:45.4173030Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.8788s] [ 16%] 2025-12-04T11:20:45.4173901Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4250s] [ 16%] 2025-12-04T11:20:45.4174669Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4182s] [ 16%] 2025-12-04T11:20:45.4174674Z 2025-12-04T11:20:45.4174823Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.4175348Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.4175474Z Traceback (most recent call last): 2025-12-04T11:20:45.4175999Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4176237Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4176767Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4176947Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4177484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4177839Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4177978Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4177986Z 2025-12-04T11:20:45.4178092Z Expected 1 but got 2. 2025-12-04T11:20:45.4178214Z Absolute difference: 1 2025-12-04T11:20:45.4178327Z Relative difference: 1.0 2025-12-04T11:20:45.4178332Z 2025-12-04T11:20:45.4178547Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4179513Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.4179520Z 2025-12-04T11:20:45.4179788Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4180023Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4180141Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4180675Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4180967Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4181069Z graph_break [] 2025-12-04T11:20:45.4181295Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4182028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 
2025-12-04T11:20:45.4182136Z warnings.warn( 2025-12-04T11:20:45.4182872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4182974Z warnings.warn( 2025-12-04T11:20:45.4183489Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.4183620Z Traceback (most recent call last): 2025-12-04T11:20:45.4184135Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4184383Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4184842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4185012Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4185560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4185769Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4185916Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4185921Z 2025-12-04T11:20:45.4186026Z Expected 1 but got 2. 
2025-12-04T11:20:45.4186134Z Absolute difference: 1 2025-12-04T11:20:45.4186259Z Relative difference: 1.0 2025-12-04T11:20:45.4186268Z 2025-12-04T11:20:45.4186485Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4187391Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.4187409Z 2025-12-04T11:20:45.4187679Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4187898Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4188030Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4188556Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4188785Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4188898Z graph_break [] 2025-12-04T11:20:45.4189178Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4189924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4190027Z warnings.warn( 2025-12-04T11:20:45.4190745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4190891Z warnings.warn( 2025-12-04T11:20:45.4191109Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4191225Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4191466Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4191994Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4192112Z graph_break [] 2025-12-04T11:20:45.4192327Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4193084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4193201Z warnings.warn( 2025-12-04T11:20:45.4193917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4194034Z warnings.warn( 2025-12-04T11:20:45.4194180Z =================================== FAILURES =================================== 2025-12-04T11:20:45.4194683Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.4194820Z Traceback (most recent call last): 2025-12-04T11:20:45.4195332Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4195566Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4196035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4196202Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4196749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4196955Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4197088Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4197093Z 2025-12-04T11:20:45.4197211Z Expected 1 but got 2. 
2025-12-04T11:20:45.4197318Z Absolute difference: 1 2025-12-04T11:20:45.4197428Z Relative difference: 1.0 2025-12-04T11:20:45.4197447Z 2025-12-04T11:20:45.4197664Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4198571Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.4198579Z 2025-12-04T11:20:45.4198859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4199078Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4199197Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4199737Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4199965Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4200079Z graph_break [] 2025-12-04T11:20:45.4200298Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4201092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4201211Z warnings.warn( 2025-12-04T11:20:45.4201931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4202076Z warnings.warn( 2025-12-04T11:20:45.4202295Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4202416Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4202655Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4203185Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4203287Z graph_break [] 2025-12-04T11:20:45.4203523Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4204242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4204387Z warnings.warn( 2025-12-04T11:20:45.4205105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4205208Z warnings.warn( 2025-12-04T11:20:45.4205435Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4205550Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4205776Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4206312Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4206411Z graph_break [] 2025-12-04T11:20:45.4206644Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4207366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4207467Z warnings.warn( 2025-12-04T11:20:45.4208196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4208299Z warnings.warn( 2025-12-04T11:20:45.4209147Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a3db4b14f41cd2.xml - 2025-12-04T11:20:45.4209323Z =========================== short test summary info ============================ 2025-12-04T11:20:45.4210259Z FAILED [0.4182s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4210267Z 2025-12-04T11:20:45.4210389Z Expected 1 but got 2. 2025-12-04T11:20:45.4210498Z Absolute difference: 1 2025-12-04T11:20:45.4210621Z Relative difference: 1.0 2025-12-04T11:20:45.4210626Z 2025-12-04T11:20:45.4210841Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4211748Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.4211754Z 2025-12-04T11:20:45.4212034Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4212215Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.4212426Z =================== 1 failed, 8 deselected, 2 rerun in 4.75s =================== 2025-12-04T11:20:45.4212593Z Got exit code 1 2025-12-04T11:20:45.4212704Z Retrying single test... 
2025-12-04T11:20:45.4213165Z W1204 11:11:59.267000 90960 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.4213833Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a02c7191ab69f431.xml 2025-12-04T11:20:45.4214048Z ============================= test session starts ============================== 2025-12-04T11:20:45.4214415Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.4214528Z cachedir: .pytest_cache 2025-12-04T11:20:45.4215066Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.4215195Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.4215307Z configfile: pytest.ini 2025-12-04T11:20:45.4215870Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.4216129Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.4217209Z stepcurrent: skipping 8 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.4217337Z Running 1 items in this shard 2025-12-04T11:20:45.4217342Z 2025-12-04T11:20:45.4218611Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:12:02.272761954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4218617Z 2025-12-04T11:20:45.4219155Z [W1204 11:12:19.518301637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4219162Z 2025-12-04T11:20:45.4219675Z [W1204 11:12:19.518566200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4219681Z 2025-12-04T11:20:45.4220207Z [W1204 11:12:19.525909295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4220214Z 2025-12-04T11:20:45.4220724Z [W1204 11:12:19.526648895 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4220729Z 2025-12-04T11:20:45.4221253Z [W1204 11:12:19.526840399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4221258Z 2025-12-04T11:20:45.4221771Z [W1204 11:12:19.533862900 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4221776Z 2025-12-04T11:20:45.4222302Z [W1204 11:12:19.534543891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4222306Z 2025-12-04T11:20:45.4222817Z [W1204 11:12:19.534734527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4222824Z 2025-12-04T11:20:45.4223330Z [W1204 11:12:21.542108370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4223350Z 2025-12-04T11:20:45.4223860Z [W1204 11:12:21.543931835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4223865Z 2025-12-04T11:20:45.4224372Z [W1204 11:12:21.544156187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4224452Z 2025-12-04T11:20:45.4224978Z [W1204 11:12:21.548325892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4224985Z 2025-12-04T11:20:45.4225491Z [W1204 11:12:21.549090184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4225527Z 2025-12-04T11:20:45.4226049Z [W1204 11:12:21.549297089 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4226054Z 2025-12-04T11:20:45.4226562Z [W1204 11:12:21.555640488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4226567Z 2025-12-04T11:20:45.4227086Z [W1204 11:12:21.556418740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4227091Z 2025-12-04T11:20:45.4227602Z [W1204 11:12:21.556639491 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4227639Z 2025-12-04T11:20:45.4227789Z ('RERUN', {'yellow': True}) [20.1203s] [100%] 2025-12-04T11:20:45.4229046Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:12:21.920799249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4229055Z 2025-12-04T11:20:45.4229568Z [W1204 11:12:21.921611810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4229587Z 2025-12-04T11:20:45.4230101Z [W1204 11:12:21.921825920 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4230106Z 2025-12-04T11:20:45.4230623Z [W1204 11:12:21.925823311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4230630Z 2025-12-04T11:20:45.4231159Z [W1204 11:12:21.926625021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4231163Z 2025-12-04T11:20:45.4231672Z [W1204 11:12:21.926818700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4231677Z 2025-12-04T11:20:45.4232200Z [W1204 11:12:21.932910440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4232205Z 2025-12-04T11:20:45.4232712Z [W1204 11:12:21.933538858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4232716Z 2025-12-04T11:20:45.4233235Z [W1204 11:12:21.933725485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4233241Z 2025-12-04T11:20:45.4233749Z [W1204 11:12:21.023781718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4233754Z 2025-12-04T11:20:45.4234272Z [W1204 11:12:21.024578135 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4234279Z 2025-12-04T11:20:45.4234785Z [W1204 11:12:21.024784370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4234790Z 2025-12-04T11:20:45.4235295Z [W1204 11:12:21.028760273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4235300Z 2025-12-04T11:20:45.4235892Z [W1204 11:12:21.029423515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4235900Z 2025-12-04T11:20:45.4236412Z [W1204 11:12:21.029621400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4236417Z 2025-12-04T11:20:45.4236939Z [W1204 11:12:21.035762795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4236981Z 2025-12-04T11:20:45.4237490Z [W1204 11:12:21.036668746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4237496Z 2025-12-04T11:20:45.4238016Z [W1204 11:12:21.036867140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4238020Z 2025-12-04T11:20:45.4238152Z ('RERUN', {'yellow': True}) [0.4392s] [100%] 2025-12-04T11:20:45.4239434Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:12:21.333555728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4239473Z 2025-12-04T11:20:45.4239985Z [W1204 11:12:21.334346692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4239993Z 2025-12-04T11:20:45.4240500Z [W1204 11:12:21.334550788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4240520Z 2025-12-04T11:20:45.4241031Z [W1204 11:12:21.338602848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4241036Z 2025-12-04T11:20:45.4241547Z [W1204 11:12:21.339440911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4241554Z 2025-12-04T11:20:45.4242073Z [W1204 11:12:21.339633496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4242079Z 2025-12-04T11:20:45.4242583Z [W1204 11:12:21.345780229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4242590Z 2025-12-04T11:20:45.4243110Z [W1204 11:12:21.346478713 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4243114Z 2025-12-04T11:20:45.4243622Z [W1204 11:12:21.346669243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4243626Z 2025-12-04T11:20:45.4244156Z [W1204 11:12:22.436304789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4244163Z 2025-12-04T11:20:45.4244673Z [W1204 11:12:22.437120886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4244677Z 2025-12-04T11:20:45.4245197Z [W1204 11:12:22.437329853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4245204Z 2025-12-04T11:20:45.4245714Z [W1204 11:12:22.441441114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4245719Z 2025-12-04T11:20:45.4246225Z [W1204 11:12:22.442126413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4246244Z 2025-12-04T11:20:45.4246900Z [W1204 11:12:22.442321726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4246905Z 2025-12-04T11:20:45.4247416Z [W1204 11:12:22.448543890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4247420Z 2025-12-04T11:20:45.4247945Z [W1204 11:12:22.449474833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4247981Z 2025-12-04T11:20:45.4248490Z [W1204 11:12:22.449679283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4248494Z 2025-12-04T11:20:45.4248612Z FAILED [0.4117s] [100%] 2025-12-04T11:20:45.4248617Z 2025-12-04T11:20:45.4248762Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.4249268Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.4249411Z Traceback (most recent call last): 2025-12-04T11:20:45.4249957Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4250205Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4250674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4250840Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4251388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4251595Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4251741Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4251746Z 2025-12-04T11:20:45.4251853Z Expected 1 but got 2. 
2025-12-04T11:20:45.4251962Z Absolute difference: 1 2025-12-04T11:20:45.4252088Z Relative difference: 1.0 2025-12-04T11:20:45.4252098Z 2025-12-04T11:20:45.4252313Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4253214Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.4253231Z 2025-12-04T11:20:45.4253504Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4253728Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4253860Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4254387Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4254617Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4254732Z graph_break [] 2025-12-04T11:20:45.4254955Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4256184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4256371Z if out == self.unknown_value: 2025-12-04T11:20:45.4257107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4257227Z warnings.warn( 2025-12-04T11:20:45.4257945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4258065Z warnings.warn( 2025-12-04T11:20:45.4258651Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.4258779Z Traceback (most recent call last): 2025-12-04T11:20:45.4259302Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4259532Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4259990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4260198Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4260731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4260954Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4261088Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4261093Z 2025-12-04T11:20:45.4261201Z Expected 1 but got 2. 
2025-12-04T11:20:45.4261326Z Absolute difference: 1 2025-12-04T11:20:45.4261443Z Relative difference: 1.0 2025-12-04T11:20:45.4261481Z 2025-12-04T11:20:45.4261699Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4262617Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.4262625Z 2025-12-04T11:20:45.4262893Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4263125Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4263245Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4263772Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4264013Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4264113Z graph_break [] 2025-12-04T11:20:45.4264344Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4265558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4265679Z if out == self.unknown_value: 2025-12-04T11:20:45.4266415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4266519Z warnings.warn( 2025-12-04T11:20:45.4267249Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4267352Z warnings.warn( 2025-12-04T11:20:45.4267573Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4267706Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4267936Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4268465Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4268578Z graph_break [] 2025-12-04T11:20:45.4268796Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4269532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4269633Z warnings.warn( 2025-12-04T11:20:45.4270349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4270462Z warnings.warn( 2025-12-04T11:20:45.4270680Z =================================== FAILURES =================================== 2025-12-04T11:20:45.4271537Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.4271672Z Traceback (most recent call last): 2025-12-04T11:20:45.4272186Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4272512Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4272971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4273133Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4273683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4273895Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4274047Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4274105Z 2025-12-04T11:20:45.4274215Z Expected 1 but got 2. 2025-12-04T11:20:45.4274325Z Absolute difference: 1 2025-12-04T11:20:45.4274450Z Relative difference: 1.0 2025-12-04T11:20:45.4274455Z 2025-12-04T11:20:45.4274672Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4275592Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.4275598Z 2025-12-04T11:20:45.4275869Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4276088Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4276221Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4276752Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4276985Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4277104Z graph_break [] 2025-12-04T11:20:45.4277319Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:20:45.4278535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.4278661Z if out == self.unknown_value: 2025-12-04T11:20:45.4279387Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4279504Z warnings.warn( 2025-12-04T11:20:45.4280226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4280344Z warnings.warn( 2025-12-04T11:20:45.4280562Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4280680Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4280922Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4281458Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4281558Z graph_break [] 2025-12-04T11:20:45.4281784Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4282506Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4282620Z warnings.warn( 2025-12-04T11:20:45.4283444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4283550Z warnings.warn( 2025-12-04T11:20:45.4283782Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4283899Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4284176Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4284704Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4284804Z graph_break [] 2025-12-04T11:20:45.4285043Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4285766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4285874Z warnings.warn( 2025-12-04T11:20:45.4286638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4286743Z warnings.warn( 2025-12-04T11:20:45.4287593Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a02c7191ab69f431.xml - 2025-12-04T11:20:45.4287772Z =========================== short test summary info ============================ 2025-12-04T11:20:45.4288705Z FAILED [0.4117s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4288725Z 2025-12-04T11:20:45.4288837Z Expected 1 but got 2. 
2025-12-04T11:20:45.4288948Z Absolute difference: 1 2025-12-04T11:20:45.4289082Z Relative difference: 1.0 2025-12-04T11:20:45.4289087Z 2025-12-04T11:20:45.4289307Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4290208Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.4290214Z 2025-12-04T11:20:45.4290504Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4290687Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.4290902Z ================== 1 failed, 13 deselected, 2 rerun in 21.00s ================== 2025-12-04T11:20:45.4291006Z Got exit code 1 2025-12-04T11:20:45.4291111Z Retrying single test... 2025-12-04T11:20:45.4291572Z W1204 11:12:33.782000 91134 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.4292237Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e37b8ebc7938792f.xml 2025-12-04T11:20:45.4292420Z ============================= test session starts ============================== 2025-12-04T11:20:45.4292772Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.4292886Z cachedir: .pytest_cache 2025-12-04T11:20:45.4293423Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.4293551Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.4293663Z configfile: pytest.ini 2025-12-04T11:20:45.4294225Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T11:20:45.4294446Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.4295501Z stepcurrent: skipping 8 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.4295627Z Running 1 items in this shard 2025-12-04T11:20:45.4295633Z 2025-12-04T11:20:45.4296969Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:12:37.776359949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4297027Z 2025-12-04T11:20:45.4297549Z [W1204 11:12:53.862439089 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4297554Z 2025-12-04T11:20:45.4298064Z [W1204 11:12:53.862700188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4298070Z 2025-12-04T11:20:45.4298601Z [W1204 11:12:53.870192683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4298636Z 2025-12-04T11:20:45.4299146Z [W1204 11:12:53.870916311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4299151Z 2025-12-04T11:20:45.4299679Z [W1204 11:12:53.871106470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4299684Z 2025-12-04T11:20:45.4300196Z [W1204 11:12:53.878050672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4300201Z 2025-12-04T11:20:45.4300726Z [W1204 11:12:53.878708784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4300731Z 2025-12-04T11:20:45.4301244Z [W1204 11:12:53.878893636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4301252Z 2025-12-04T11:20:45.4301778Z [W1204 11:12:55.885365278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4301784Z 2025-12-04T11:20:45.4302292Z [W1204 11:12:55.887132558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4302299Z 2025-12-04T11:20:45.4302815Z [W1204 11:12:55.887353773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4302834Z 2025-12-04T11:20:45.4303341Z [W1204 11:12:55.891470759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4303346Z 2025-12-04T11:20:45.4303860Z [W1204 11:12:55.892194656 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4303867Z 2025-12-04T11:20:45.4304396Z [W1204 11:12:55.892402346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4304401Z 2025-12-04T11:20:45.4304908Z [W1204 11:12:55.898681018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4304915Z 2025-12-04T11:20:45.4305438Z [W1204 11:12:55.899398288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4305443Z 2025-12-04T11:20:45.4305948Z [W1204 11:12:55.899600662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4305953Z 2025-12-04T11:20:45.4306104Z ('RERUN', {'yellow': True}) [19.9477s] [100%] 2025-12-04T11:20:45.4307451Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:12:55.264413976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4307460Z 2025-12-04T11:20:45.4307970Z [W1204 11:12:55.265230197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4308018Z 2025-12-04T11:20:45.4308529Z [W1204 11:12:55.265435750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4308534Z 2025-12-04T11:20:45.4309041Z [W1204 11:12:55.269464401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4309046Z 2025-12-04T11:20:45.4309568Z [W1204 11:12:55.270342425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4309608Z 2025-12-04T11:20:45.4310116Z [W1204 11:12:55.270544744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4310121Z 2025-12-04T11:20:45.4310642Z [W1204 11:12:55.276711460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4310649Z 2025-12-04T11:20:45.4311161Z [W1204 11:12:55.277351336 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4311166Z 2025-12-04T11:20:45.4311688Z [W1204 11:12:55.277539164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4311693Z 2025-12-04T11:20:45.4312206Z [W1204 11:12:55.366923654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4312213Z 2025-12-04T11:20:45.4312731Z [W1204 11:12:55.367697961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4312736Z 2025-12-04T11:20:45.4313240Z [W1204 11:12:55.367903647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4313248Z 2025-12-04T11:20:45.4313758Z [W1204 11:12:55.371897849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4313779Z 2025-12-04T11:20:45.4314285Z [W1204 11:12:55.372563885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4314289Z 2025-12-04T11:20:45.4314799Z [W1204 11:12:55.372758507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4314804Z 2025-12-04T11:20:45.4315327Z [W1204 11:12:55.378813179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4315332Z 2025-12-04T11:20:45.4315841Z [W1204 11:12:55.379635546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4315848Z 2025-12-04T11:20:45.4316369Z [W1204 11:12:55.379830822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4316374Z 2025-12-04T11:20:45.4316506Z ('RERUN', {'yellow': True}) [0.4399s] [100%] 2025-12-04T11:20:45.4317840Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:12:56.687201557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4317849Z 2025-12-04T11:20:45.4318360Z [W1204 11:12:56.688009277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4318364Z 2025-12-04T11:20:45.4318888Z [W1204 11:12:56.688220048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4318923Z 2025-12-04T11:20:45.4319431Z [W1204 11:12:56.692334916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4319435Z 2025-12-04T11:20:45.4319943Z [W1204 11:12:56.693197701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4319948Z 2025-12-04T11:20:45.4320475Z [W1204 11:12:56.693394377 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4320480Z 2025-12-04T11:20:45.4321020Z [W1204 11:12:56.699524104 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4321025Z 2025-12-04T11:20:45.4321544Z [W1204 11:12:56.700200727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4321552Z 2025-12-04T11:20:45.4322058Z [W1204 11:12:56.700404924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4322063Z 2025-12-04T11:20:45.4322581Z [W1204 11:12:56.789785195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4322587Z 2025-12-04T11:20:45.4323093Z [W1204 11:12:56.790620210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4323102Z 2025-12-04T11:20:45.4323623Z [W1204 11:12:56.790838473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4323630Z 2025-12-04T11:20:45.4324135Z [W1204 11:12:56.794914062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4324142Z 2025-12-04T11:20:45.4324649Z [W1204 11:12:56.795600285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4324665Z 2025-12-04T11:20:45.4325170Z [W1204 11:12:56.795799349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4325175Z 2025-12-04T11:20:45.4325681Z [W1204 11:12:56.802091790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4325686Z 2025-12-04T11:20:45.4326211Z [W1204 11:12:56.803047940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4326218Z 2025-12-04T11:20:45.4326724Z [W1204 11:12:56.803254091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4326728Z 2025-12-04T11:20:45.4326847Z FAILED [0.4217s] [100%] 2025-12-04T11:20:45.4326852Z 2025-12-04T11:20:45.4326995Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.4327511Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.4327637Z Traceback (most recent call last): 2025-12-04T11:20:45.4328152Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4328397Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4328932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4329102Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4329652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4329893Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4330041Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4330046Z 2025-12-04T11:20:45.4330154Z Expected 1 but got 2. 
2025-12-04T11:20:45.4330264Z Absolute difference: 1 2025-12-04T11:20:45.4330390Z Relative difference: 1.0 2025-12-04T11:20:45.4330395Z 2025-12-04T11:20:45.4330610Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4331524Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.4331575Z 2025-12-04T11:20:45.4331847Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4332068Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4332201Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4332731Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4332959Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4333073Z graph_break [] 2025-12-04T11:20:45.4333293Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4334517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4334637Z if out == self.unknown_value: 2025-12-04T11:20:45.4335362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4335478Z warnings.warn( 2025-12-04T11:20:45.4336200Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4336398Z warnings.warn( 2025-12-04T11:20:45.4336903Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.4337027Z Traceback (most recent call last): 2025-12-04T11:20:45.4337548Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4337784Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4338244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4338424Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4338958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4339181Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4339316Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4339321Z 2025-12-04T11:20:45.4339430Z Expected 1 but got 2. 
2025-12-04T11:20:45.4339555Z Absolute difference: 1 2025-12-04T11:20:45.4339667Z Relative difference: 1.0 2025-12-04T11:20:45.4339673Z 2025-12-04T11:20:45.4339901Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4340880Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.4340889Z 2025-12-04T11:20:45.4341160Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4341395Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4341510Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4342083Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4342311Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4342410Z graph_break [] 2025-12-04T11:20:45.4342640Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4343846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4344009Z if out == self.unknown_value: 2025-12-04T11:20:45.4344744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4344849Z warnings.warn( 2025-12-04T11:20:45.4345576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4345679Z warnings.warn( 2025-12-04T11:20:45.4345896Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4346025Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4346257Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4346801Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4346905Z graph_break [] 2025-12-04T11:20:45.4347122Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4347862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4347966Z warnings.warn( 2025-12-04T11:20:45.4348680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4348796Z warnings.warn( 2025-12-04T11:20:45.4348942Z =================================== FAILURES =================================== 2025-12-04T11:20:45.4349456Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:20:45.4349585Z Traceback (most recent call last): 2025-12-04T11:20:45.4350101Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4350347Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4350806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4350985Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4351527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4351734Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4351879Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4351884Z 2025-12-04T11:20:45.4351991Z Expected 1 but got 2. 2025-12-04T11:20:45.4352098Z Absolute difference: 1 2025-12-04T11:20:45.4352223Z Relative difference: 1.0 2025-12-04T11:20:45.4352289Z 2025-12-04T11:20:45.4352507Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4353426Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.4353432Z 2025-12-04T11:20:45.4353733Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4353951Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4359240Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4359839Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4360074Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4360195Z graph_break [] 2025-12-04T11:20:45.4360437Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:20:45.4361754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.4361876Z if out == self.unknown_value: 2025-12-04T11:20:45.4362606Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4362725Z warnings.warn( 2025-12-04T11:20:45.4363443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4363560Z warnings.warn( 2025-12-04T11:20:45.4363782Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4363905Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4364154Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4364684Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4364784Z graph_break [] 2025-12-04T11:20:45.4365015Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4365742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4365857Z warnings.warn( 2025-12-04T11:20:45.4366572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4366673Z warnings.warn( 2025-12-04T11:20:45.4366912Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4367031Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4367261Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4367805Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4367908Z graph_break [] 2025-12-04T11:20:45.4368139Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4368862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4368966Z warnings.warn( 2025-12-04T11:20:45.4369695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4369797Z warnings.warn( 2025-12-04T11:20:45.4370726Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e37b8ebc7938792f.xml - 2025-12-04T11:20:45.4370906Z =========================== short test summary info ============================ 2025-12-04T11:20:45.4372220Z FAILED [0.4217s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4372310Z 2025-12-04T11:20:45.4372436Z Expected 1 but got 2. 
2025-12-04T11:20:45.4372549Z Absolute difference: 1 2025-12-04T11:20:45.4372664Z Relative difference: 1.0 2025-12-04T11:20:45.4372689Z 2025-12-04T11:20:45.4372910Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4373826Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.4373888Z 2025-12-04T11:20:45.4374174Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4374358Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.4374558Z ================== 1 failed, 13 deselected, 2 rerun in 20.84s ================== 2025-12-04T11:20:45.4374677Z Got exit code 1 2025-12-04T11:20:45.4375496Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:20:45.4375925Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:20:45.4376443Z W1204 11:13:08.355000 91308 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.4377107Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee37665d187f9309.xml 2025-12-04T11:20:45.4377293Z ============================= test session starts ============================== 2025-12-04T11:20:45.4377649Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.4377779Z cachedir: .pytest_cache 2025-12-04T11:20:45.4378298Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 
2025-12-04T11:20:45.4378426Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.4378552Z configfile: pytest.ini 2025-12-04T11:20:45.4379093Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.4379328Z collecting ... collected 58 items / 9 deselected / 49 selected 2025-12-04T11:20:45.4379478Z stepcurrent: skipping 9 already run items. 2025-12-04T11:20:45.4379597Z Running 5 items in this shard 2025-12-04T11:20:45.4379605Z 2025-12-04T11:20:45.4380492Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.9433s] [ 20%] 2025-12-04T11:20:45.4381353Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5052s] [ 20%] 2025-12-04T11:20:45.4382139Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.5059s] [ 20%] 2025-12-04T11:20:45.4382145Z 2025-12-04T11:20:45.4382289Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.4382898Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4383045Z Traceback (most recent call last): 2025-12-04T11:20:45.4383564Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4383811Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4384277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4384483Z return 
super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4385041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4385252Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4385387Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4385407Z 2025-12-04T11:20:45.4385515Z Expected 1 but got 2. 2025-12-04T11:20:45.4385631Z Absolute difference: 1 2025-12-04T11:20:45.4385759Z Relative difference: 1.0 2025-12-04T11:20:45.4385797Z 2025-12-04T11:20:45.4386016Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4386921Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4386929Z 2025-12-04T11:20:45.4387209Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4387430Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4387560Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4388091Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4388318Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4388436Z graph_break [] 2025-12-04T11:20:45.4388656Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4389384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4389502Z warnings.warn( 2025-12-04T11:20:45.4390226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4390340Z warnings.warn( 2025-12-04T11:20:45.4390850Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4390975Z Traceback (most recent call last): 2025-12-04T11:20:45.4391496Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4391729Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4392202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4392369Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4392906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4393127Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4393263Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4393268Z 2025-12-04T11:20:45.4393375Z Expected 1 but got 2. 
2025-12-04T11:20:45.4393495Z Absolute difference: 1 2025-12-04T11:20:45.4393605Z Relative difference: 1.0 2025-12-04T11:20:45.4393611Z 2025-12-04T11:20:45.4393837Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4394804Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4394812Z 2025-12-04T11:20:45.4395082Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4395314Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4395485Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4396024Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4396251Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4396348Z graph_break [] 2025-12-04T11:20:45.4396582Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4397316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4397465Z warnings.warn( 2025-12-04T11:20:45.4398184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4398289Z warnings.warn( 2025-12-04T11:20:45.4398522Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4398641Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4398868Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4399408Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4399512Z graph_break [] 2025-12-04T11:20:45.4399740Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4400464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4400570Z warnings.warn( 2025-12-04T11:20:45.4401300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4401405Z warnings.warn( 2025-12-04T11:20:45.4401553Z =================================== FAILURES =================================== 2025-12-04T11:20:45.4402073Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4402197Z Traceback (most recent call last): 2025-12-04T11:20:45.4402719Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4402952Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4403416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4403596Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4404135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4404355Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4404493Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4404499Z 2025-12-04T11:20:45.4404609Z Expected 1 but got 2. 
2025-12-04T11:20:45.4404733Z Absolute difference: 1 2025-12-04T11:20:45.4404844Z Relative difference: 1.0 2025-12-04T11:20:45.4404850Z 2025-12-04T11:20:45.4405065Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4406054Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4406063Z 2025-12-04T11:20:45.4406336Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4406565Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4406685Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4407215Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4407486Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4407585Z graph_break [] 2025-12-04T11:20:45.4407815Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4408545Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4408649Z warnings.warn( 2025-12-04T11:20:45.4409382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4409515Z warnings.warn( 2025-12-04T11:20:45.4409732Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4409862Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4410091Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4410630Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4410731Z graph_break [] 2025-12-04T11:20:45.4410951Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4411692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4411794Z warnings.warn( 2025-12-04T11:20:45.4412525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4412627Z warnings.warn( 2025-12-04T11:20:45.4412844Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4412975Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4413202Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4413728Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4413841Z graph_break [] 2025-12-04T11:20:45.4414055Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4414794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4414900Z warnings.warn( 2025-12-04T11:20:45.4415618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4415732Z warnings.warn( 2025-12-04T11:20:45.4416645Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee37665d187f9309.xml - 2025-12-04T11:20:45.4416827Z =========================== short test summary info ============================ 2025-12-04T11:20:45.4417787Z FAILED [0.5059s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4417794Z 2025-12-04T11:20:45.4417901Z Expected 1 but got 2. 2025-12-04T11:20:45.4418099Z Absolute difference: 1 2025-12-04T11:20:45.4418215Z Relative difference: 1.0 2025-12-04T11:20:45.4418220Z 2025-12-04T11:20:45.4418436Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4419367Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4419405Z 2025-12-04T11:20:45.4419673Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4419869Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.4420069Z =================== 1 failed, 9 deselected, 2 rerun in 4.99s =================== 2025-12-04T11:20:45.4420170Z Got exit code 1 2025-12-04T11:20:45.4420295Z Retrying single test... 
2025-12-04T11:20:45.4420746Z W1204 11:13:29.214000 91485 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.4421450Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-511047743df1b08e.xml 2025-12-04T11:20:45.4421615Z ============================= test session starts ============================== 2025-12-04T11:20:45.4421967Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.4422093Z cachedir: .pytest_cache 2025-12-04T11:20:45.4422616Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.4422756Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.4422864Z configfile: pytest.ini 2025-12-04T11:20:45.4423409Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.4423650Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.4424638Z stepcurrent: skipping 9 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4424760Z Running 1 items in this shard 2025-12-04T11:20:45.4424765Z 2025-12-04T11:20:45.4426057Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:13:32.324728314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4426066Z 2025-12-04T11:20:45.4426582Z [W1204 11:13:48.177217314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4426588Z 2025-12-04T11:20:45.4427116Z [W1204 11:13:48.177487155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4427124Z 2025-12-04T11:20:45.4427634Z [W1204 11:13:48.184942108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4427639Z 2025-12-04T11:20:45.4428164Z [W1204 11:13:48.185660341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4428172Z 2025-12-04T11:20:45.4428682Z [W1204 11:13:48.185853065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4428687Z 2025-12-04T11:20:45.4429205Z [W1204 11:13:48.192923348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4429210Z 2025-12-04T11:20:45.4429784Z [W1204 11:13:48.193656377 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4429792Z 2025-12-04T11:20:45.4430316Z [W1204 11:13:48.193846145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4430321Z 2025-12-04T11:20:45.4430829Z [W1204 11:13:50.203421010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4430864Z 2025-12-04T11:20:45.4431370Z [W1204 11:13:50.205217182 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4431387Z 2025-12-04T11:20:45.4431893Z [W1204 11:13:50.205438097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4431898Z 2025-12-04T11:20:45.4432411Z [W1204 11:13:50.209618157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4432416Z 2025-12-04T11:20:45.4432968Z [W1204 11:13:50.210373750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4432973Z 2025-12-04T11:20:45.4433481Z [W1204 11:13:50.210586578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4433489Z 2025-12-04T11:20:45.4434007Z [W1204 11:13:50.216930399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4434011Z 2025-12-04T11:20:45.4434515Z [W1204 11:13:50.217656831 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4434520Z 2025-12-04T11:20:45.4435041Z [W1204 11:13:50.217861035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4435051Z 2025-12-04T11:20:45.4435186Z ('RERUN', {'yellow': True}) [19.8161s] [100%] 2025-12-04T11:20:45.4436472Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:13:51.671095345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4436492Z 2025-12-04T11:20:45.4437003Z [W1204 11:13:51.671917773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4437008Z 2025-12-04T11:20:45.4437513Z [W1204 11:13:51.672133362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4437531Z 2025-12-04T11:20:45.4438038Z [W1204 11:13:51.676236417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4438048Z 2025-12-04T11:20:45.4438559Z [W1204 11:13:51.677104652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4438566Z 2025-12-04T11:20:45.4439088Z [W1204 11:13:51.677303515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4439096Z 2025-12-04T11:20:45.4439603Z [W1204 11:13:51.683637243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4439607Z 2025-12-04T11:20:45.4440130Z [W1204 11:13:51.684351557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4440135Z 2025-12-04T11:20:45.4440644Z [W1204 11:13:51.684562181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4440649Z 2025-12-04T11:20:45.4441345Z [W1204 11:13:51.776615271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4441354Z 2025-12-04T11:20:45.4441864Z [W1204 11:13:51.777460977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4441869Z 2025-12-04T11:20:45.4442413Z [W1204 11:13:51.777679184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4442432Z 2025-12-04T11:20:45.4442938Z [W1204 11:13:51.781905628 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4442943Z 2025-12-04T11:20:45.4443450Z [W1204 11:13:51.782660157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4443454Z 2025-12-04T11:20:45.4443979Z [W1204 11:13:51.782871494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4444015Z 2025-12-04T11:20:45.4444525Z [W1204 11:13:51.789254631 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4444530Z 2025-12-04T11:20:45.4445049Z [W1204 11:13:51.790269828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4445056Z 2025-12-04T11:20:45.4445567Z [W1204 11:13:51.790487312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4445571Z 2025-12-04T11:20:45.4445717Z ('RERUN', {'yellow': True}) [0.5326s] [100%] 2025-12-04T11:20:45.4446994Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:13:51.181560488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4447002Z 2025-12-04T11:20:45.4447529Z [W1204 11:13:51.182369855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4447533Z 2025-12-04T11:20:45.4448046Z [W1204 11:13:51.182582838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4448054Z 2025-12-04T11:20:45.4448563Z [W1204 11:13:51.186718560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4448582Z 2025-12-04T11:20:45.4449091Z [W1204 11:13:51.187573010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4449096Z 2025-12-04T11:20:45.4449610Z [W1204 11:13:51.187772665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4449617Z 2025-12-04T11:20:45.4450138Z [W1204 11:13:51.194081282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4450143Z 2025-12-04T11:20:45.4450652Z [W1204 11:13:51.194772556 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4450659Z 2025-12-04T11:20:45.4451182Z [W1204 11:13:51.194966725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4451187Z 2025-12-04T11:20:45.4451696Z [W1204 11:13:51.285730282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4451701Z 2025-12-04T11:20:45.4452283Z [W1204 11:13:51.286540875 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4452291Z 2025-12-04T11:20:45.4452803Z [W1204 11:13:51.286764599 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4452808Z 2025-12-04T11:20:45.4453332Z [W1204 11:13:51.293119480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4453367Z 2025-12-04T11:20:45.4453875Z [W1204 11:13:51.294095312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4453880Z 2025-12-04T11:20:45.4454390Z [W1204 11:13:51.294312514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4454408Z 2025-12-04T11:20:45.4454922Z [W1204 11:13:51.301740277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4454962Z 2025-12-04T11:20:45.4455468Z [W1204 11:13:51.302660516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4455473Z 2025-12-04T11:20:45.4455996Z [W1204 11:13:51.302874272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4456004Z 2025-12-04T11:20:45.4456111Z FAILED [0.5119s] [100%] 2025-12-04T11:20:45.4456116Z 2025-12-04T11:20:45.4456272Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.4456863Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4456991Z Traceback (most recent call last): 2025-12-04T11:20:45.4457527Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4457759Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4458248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4458414Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4458955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4459180Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4459315Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4459321Z 2025-12-04T11:20:45.4459426Z Expected 1 but got 2. 
2025-12-04T11:20:45.4459548Z Absolute difference: 1 2025-12-04T11:20:45.4459659Z Relative difference: 1.0 2025-12-04T11:20:45.4459664Z 2025-12-04T11:20:45.4459892Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4460809Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4460817Z 2025-12-04T11:20:45.4461087Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4461322Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4461443Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4461984Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4462212Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4462311Z graph_break [] 2025-12-04T11:20:45.4462540Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4463827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4463962Z if out == self.unknown_value: 2025-12-04T11:20:45.4464685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4464824Z warnings.warn( 2025-12-04T11:20:45.4465551Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4465657Z warnings.warn( 2025-12-04T11:20:45.4466163Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4466300Z Traceback (most recent call last): 2025-12-04T11:20:45.4466813Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4467090Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4467548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4467711Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4468260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4468465Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4468614Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4468619Z 2025-12-04T11:20:45.4468726Z Expected 1 but got 2. 
2025-12-04T11:20:45.4468833Z Absolute difference: 1 2025-12-04T11:20:45.4468957Z Relative difference: 1.0 2025-12-04T11:20:45.4468963Z 2025-12-04T11:20:45.4469179Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4470091Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4470111Z 2025-12-04T11:20:45.4470382Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4470601Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4470731Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4471591Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4471821Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4471937Z graph_break [] 2025-12-04T11:20:45.4472158Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4473392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4473513Z if out == self.unknown_value: 2025-12-04T11:20:45.4474237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4474356Z warnings.warn( 2025-12-04T11:20:45.4475076Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4475192Z warnings.warn( 2025-12-04T11:20:45.4475411Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4475533Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4475916Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4476443Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4476547Z graph_break [] 2025-12-04T11:20:45.4476780Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4477498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4477656Z warnings.warn( 2025-12-04T11:20:45.4478371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4478473Z warnings.warn( 2025-12-04T11:20:45.4478635Z =================================== FAILURES =================================== 2025-12-04T11:20:45.4479148Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4479324Z Traceback (most recent call last): 2025-12-04T11:20:45.4479850Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4480081Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4480555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4480720Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4481256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4481477Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4481611Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4481617Z 2025-12-04T11:20:45.4481738Z Expected 1 but got 2. 2025-12-04T11:20:45.4481853Z Absolute difference: 1 2025-12-04T11:20:45.4481969Z Relative difference: 1.0 2025-12-04T11:20:45.4481975Z 2025-12-04T11:20:45.4482207Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4483115Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4483124Z 2025-12-04T11:20:45.4483405Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4483626Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4483744Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4484286Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4484520Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4484622Z graph_break [] 2025-12-04T11:20:45.4484854Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:20:45.4486053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.4486188Z if out == self.unknown_value: 2025-12-04T11:20:45.4486913Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4487015Z warnings.warn( 2025-12-04T11:20:45.4487749Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4487851Z warnings.warn( 2025-12-04T11:20:45.4488164Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4488284Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4488515Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4489054Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4489183Z graph_break [] 2025-12-04T11:20:45.4489399Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4490134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4490239Z warnings.warn( 2025-12-04T11:20:45.4490974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4491081Z warnings.warn( 2025-12-04T11:20:45.4491329Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4491461Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4491687Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4492213Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4492330Z graph_break [] 2025-12-04T11:20:45.4492548Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4493280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4493381Z warnings.warn( 2025-12-04T11:20:45.4494104Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4494221Z warnings.warn( 2025-12-04T11:20:45.4495055Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-511047743df1b08e.xml - 2025-12-04T11:20:45.4495244Z =========================== short test summary info ============================ 2025-12-04T11:20:45.4496186Z FAILED [0.5119s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4496192Z 2025-12-04T11:20:45.4496389Z Expected 1 but got 2. 
2025-12-04T11:20:45.4496503Z Absolute difference: 1 2025-12-04T11:20:45.4496616Z Relative difference: 1.0 2025-12-04T11:20:45.4496622Z 2025-12-04T11:20:45.4496858Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4497776Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4497784Z 2025-12-04T11:20:45.4498070Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4498253Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.4498455Z ================== 1 failed, 13 deselected, 2 rerun in 20.89s ================== 2025-12-04T11:20:45.4498576Z Got exit code 1 2025-12-04T11:20:45.4498685Z Retrying single test... 2025-12-04T11:20:45.4499130Z W1204 11:14:03.817000 91667 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.4499801Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4d9221d5ac70ff44.xml 2025-12-04T11:20:45.4500040Z ============================= test session starts ============================== 2025-12-04T11:20:45.4500410Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.4500524Z cachedir: .pytest_cache 2025-12-04T11:20:45.4501046Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.4501223Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.4501333Z configfile: pytest.ini 2025-12-04T11:20:45.4501892Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T11:20:45.4502116Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.4503112Z stepcurrent: skipping 9 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4503247Z Running 1 items in this shard 2025-12-04T11:20:45.4503288Z 2025-12-04T11:20:45.4504570Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:14:07.927166044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4504579Z 2025-12-04T11:20:45.4505111Z [W1204 11:14:22.362494888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4505117Z 2025-12-04T11:20:45.4505636Z [W1204 11:14:22.362758240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4505642Z 2025-12-04T11:20:45.4506171Z [W1204 11:14:22.370243206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4506176Z 2025-12-04T11:20:45.4506686Z [W1204 11:14:22.370974257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4506691Z 2025-12-04T11:20:45.4507198Z [W1204 11:14:22.371173540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4507223Z 2025-12-04T11:20:45.4507736Z [W1204 11:14:22.378261286 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4507741Z 2025-12-04T11:20:45.4508249Z [W1204 11:14:22.378952097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4508254Z 2025-12-04T11:20:45.4508774Z [W1204 11:14:22.379143732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4508783Z 2025-12-04T11:20:45.4509294Z [W1204 11:14:24.392987944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4509302Z 2025-12-04T11:20:45.4509823Z [W1204 11:14:24.394908312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4509830Z 2025-12-04T11:20:45.4510342Z [W1204 11:14:24.395140158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4510347Z 2025-12-04T11:20:45.4510870Z [W1204 11:14:24.399919835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4510875Z 2025-12-04T11:20:45.4511381Z [W1204 11:14:25.400867765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4511386Z 2025-12-04T11:20:45.4511978Z [W1204 11:14:25.401111230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4511986Z 2025-12-04T11:20:45.4512495Z [W1204 11:14:25.408117913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4512499Z 2025-12-04T11:20:45.4513033Z [W1204 11:14:25.409083366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4513050Z 2025-12-04T11:20:45.4513558Z [W1204 11:14:25.409309795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4513563Z 2025-12-04T11:20:45.4513698Z ('RERUN', {'yellow': True}) [19.4004s] [100%] 2025-12-04T11:20:45.4514987Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:14:25.870695527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4515029Z 2025-12-04T11:20:45.4515539Z [W1204 11:14:25.871554081 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4515544Z 2025-12-04T11:20:45.4516070Z [W1204 11:14:25.871771369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4516075Z 2025-12-04T11:20:45.4516580Z [W1204 11:14:25.875981991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4516585Z 2025-12-04T11:20:45.4517105Z [W1204 11:14:25.876924062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4517109Z 2025-12-04T11:20:45.4517620Z [W1204 11:14:25.877128531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4517627Z 2025-12-04T11:20:45.4518145Z [W1204 11:14:25.883484872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4518150Z 2025-12-04T11:20:45.4518657Z [W1204 11:14:25.884223704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4518664Z 2025-12-04T11:20:45.4519175Z [W1204 11:14:25.884423197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4519180Z 2025-12-04T11:20:45.4519697Z [W1204 11:14:25.978156104 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4519702Z 2025-12-04T11:20:45.4520212Z [W1204 11:14:25.979008592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4520219Z 2025-12-04T11:20:45.4520739Z [W1204 11:14:25.979232045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4520743Z 2025-12-04T11:20:45.4521254Z [W1204 11:14:25.983484653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4521261Z 2025-12-04T11:20:45.4521780Z [W1204 11:14:25.984229613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4521785Z 2025-12-04T11:20:45.4522294Z [W1204 11:14:25.984437175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4522299Z 2025-12-04T11:20:45.4522893Z [W1204 11:14:25.990863838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4522900Z 2025-12-04T11:20:45.4523409Z [W1204 11:14:25.991829601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4523413Z 2025-12-04T11:20:45.4523922Z [W1204 11:14:25.992034568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4523973Z 2025-12-04T11:20:45.4524107Z ('RERUN', {'yellow': True}) [0.5425s] [100%] 2025-12-04T11:20:45.4525382Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:14:25.389335682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4525388Z 2025-12-04T11:20:45.4525918Z [W1204 11:14:25.390173142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4525977Z 2025-12-04T11:20:45.4526485Z [W1204 11:14:25.390393733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4526490Z 2025-12-04T11:20:45.4527018Z [W1204 11:14:25.394554876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4527025Z 2025-12-04T11:20:45.4527536Z [W1204 11:14:25.395441919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4527541Z 2025-12-04T11:20:45.4528060Z [W1204 11:14:25.395642465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4528064Z 2025-12-04T11:20:45.4528579Z [W1204 11:14:26.402027556 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4528586Z 2025-12-04T11:20:45.4529106Z [W1204 11:14:26.402785796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4529111Z 2025-12-04T11:20:45.4529618Z [W1204 11:14:26.402984402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4529625Z 2025-12-04T11:20:45.4530131Z [W1204 11:14:26.498959614 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4530149Z 2025-12-04T11:20:45.4530653Z [W1204 11:14:26.500144064 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4530658Z 2025-12-04T11:20:45.4531167Z [W1204 11:14:26.500368497 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4531172Z 2025-12-04T11:20:45.4531694Z [W1204 11:14:26.505317630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4531699Z 2025-12-04T11:20:45.4532205Z [W1204 11:14:26.506204860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4532212Z 2025-12-04T11:20:45.4532729Z [W1204 11:14:26.506405324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4532734Z 2025-12-04T11:20:45.4533242Z [W1204 11:14:26.512714866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4533247Z 2025-12-04T11:20:45.4533768Z [W1204 11:14:26.513373224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4533837Z 2025-12-04T11:20:45.4534345Z [W1204 11:14:26.513567590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4534357Z 2025-12-04T11:20:45.4534461Z FAILED [0.5177s] [100%] 2025-12-04T11:20:45.4534478Z 2025-12-04T11:20:45.4534623Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.4535167Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4535306Z Traceback (most recent call last): 2025-12-04T11:20:45.4535820Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4536054Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4536618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4536791Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4537382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4537591Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4537727Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4537736Z 2025-12-04T11:20:45.4537857Z Expected 1 but got 2. 
2025-12-04T11:20:45.4537966Z Absolute difference: 1 2025-12-04T11:20:45.4538079Z Relative difference: 1.0 2025-12-04T11:20:45.4538084Z 2025-12-04T11:20:45.4538320Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4539231Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4539237Z 2025-12-04T11:20:45.4539526Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4539753Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4539871Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4540417Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4540650Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4540763Z graph_break [] 2025-12-04T11:20:45.4540981Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4542193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4542325Z if out == self.unknown_value: 2025-12-04T11:20:45.4543054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4543173Z warnings.warn( 2025-12-04T11:20:45.4543896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4543999Z warnings.warn( 2025-12-04T11:20:45.4544520Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4544643Z Traceback (most recent call last): 2025-12-04T11:20:45.4545148Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4545393Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4545918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4546096Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4546631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4546838Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4547018Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4547024Z 2025-12-04T11:20:45.4547130Z Expected 1 but got 2. 
2025-12-04T11:20:45.4547251Z Absolute difference: 1 2025-12-04T11:20:45.4547361Z Relative difference: 1.0 2025-12-04T11:20:45.4547367Z 2025-12-04T11:20:45.4547585Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4548504Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4548514Z 2025-12-04T11:20:45.4548783Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4549045Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4549162Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4549689Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4549934Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4550035Z graph_break [] 2025-12-04T11:20:45.4550251Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4551471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4551592Z if out == self.unknown_value: 2025-12-04T11:20:45.4552334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4552438Z warnings.warn( 2025-12-04T11:20:45.4553155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4553272Z warnings.warn( 2025-12-04T11:20:45.4553488Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4553617Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4553847Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4554374Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4554487Z graph_break [] 2025-12-04T11:20:45.4554706Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4555429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4555543Z warnings.warn( 2025-12-04T11:20:45.4556263Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4556381Z warnings.warn( 2025-12-04T11:20:45.4556531Z =================================== FAILURES =================================== 2025-12-04T11:20:45.4557039Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4557175Z Traceback (most recent call last): 2025-12-04T11:20:45.4557756Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4558010Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4558473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4558637Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4559185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4559430Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4559568Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4559574Z 2025-12-04T11:20:45.4559699Z Expected 1 but got 2. 2025-12-04T11:20:45.4559809Z Absolute difference: 1 2025-12-04T11:20:45.4559935Z Relative difference: 1.0 2025-12-04T11:20:45.4559941Z 2025-12-04T11:20:45.4560157Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4561073Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4561111Z 2025-12-04T11:20:45.4561395Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4561616Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4561749Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4562273Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4562502Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4562616Z graph_break [] 2025-12-04T11:20:45.4562832Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:20:45.4564045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.4564179Z if out == self.unknown_value: 2025-12-04T11:20:45.4564900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4565015Z warnings.warn( 2025-12-04T11:20:45.4565736Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4565839Z warnings.warn( 2025-12-04T11:20:45.4566071Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4566187Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4566426Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4566955Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4567056Z graph_break [] 2025-12-04T11:20:45.4567284Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4568012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4568116Z warnings.warn( 2025-12-04T11:20:45.4568847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4568946Z warnings.warn( 2025-12-04T11:20:45.4569173Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4569291Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4569587Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4570132Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4570232Z graph_break [] 2025-12-04T11:20:45.4570449Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4571694Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4571802Z warnings.warn( 2025-12-04T11:20:45.4572537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4572641Z warnings.warn( 2025-12-04T11:20:45.4573487Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4d9221d5ac70ff44.xml - 2025-12-04T11:20:45.4573765Z =========================== short test summary info ============================ 2025-12-04T11:20:45.4574713Z FAILED [0.5177s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4574722Z 2025-12-04T11:20:45.4574847Z Expected 1 but got 2. 
2025-12-04T11:20:45.4574958Z Absolute difference: 1 2025-12-04T11:20:45.4575075Z Relative difference: 1.0 2025-12-04T11:20:45.4575080Z 2025-12-04T11:20:45.4575314Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4576227Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4576233Z 2025-12-04T11:20:45.4576593Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4576782Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.4576981Z ================== 1 failed, 13 deselected, 2 rerun in 20.49s ================== 2025-12-04T11:20:45.4577099Z Got exit code 1 2025-12-04T11:20:45.4577922Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4578350Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:20:45.4578797Z W1204 11:14:37.939000 91849 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.4579455Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af9a500a606c950b.xml 2025-12-04T11:20:45.4579651Z ============================= test session starts ============================== 2025-12-04T11:20:45.4580009Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.4580136Z cachedir: .pytest_cache 2025-12-04T11:20:45.4580661Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 
2025-12-04T11:20:45.4580787Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.4580912Z configfile: pytest.ini 2025-12-04T11:20:45.4581454Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.4581672Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:20:45.4581833Z stepcurrent: skipping 10 already run items. 2025-12-04T11:20:45.4581951Z Running 4 items in this shard 2025-12-04T11:20:45.4581956Z 2025-12-04T11:20:45.4582930Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.9109s] [ 25%] 2025-12-04T11:20:45.4583785Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4683s] [ 25%] 2025-12-04T11:20:45.4584596Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.4681s] [ 25%] 2025-12-04T11:20:45.4584617Z 2025-12-04T11:20:45.4584762Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.4585264Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.4585407Z Traceback (most recent call last): 2025-12-04T11:20:45.4585922Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4586192Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4586663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4586827Z return 
super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4587376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4587581Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4587713Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4587718Z 2025-12-04T11:20:45.4587836Z Expected 1 but got 2. 2025-12-04T11:20:45.4587944Z Absolute difference: 1 2025-12-04T11:20:45.4588055Z Relative difference: 1.0 2025-12-04T11:20:45.4588074Z 2025-12-04T11:20:45.4588295Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4589191Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4589198Z 2025-12-04T11:20:45.4589475Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4589698Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4589834Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4590724Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4590954Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4591068Z graph_break [] 2025-12-04T11:20:45.4591290Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4592023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4592136Z warnings.warn( 2025-12-04T11:20:45.4592850Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4592969Z warnings.warn( 2025-12-04T11:20:45.4593467Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.4593591Z Traceback (most recent call last): 2025-12-04T11:20:45.4594116Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4594348Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4594894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4595064Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4595597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4595856Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4595990Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4595995Z 2025-12-04T11:20:45.4596102Z Expected 1 but got 2. 
2025-12-04T11:20:45.4596224Z Absolute difference: 1 2025-12-04T11:20:45.4596341Z Relative difference: 1.0 2025-12-04T11:20:45.4596346Z 2025-12-04T11:20:45.4596572Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4597475Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4597513Z 2025-12-04T11:20:45.4597784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4598014Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4598130Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4599030Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4599260Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4599360Z graph_break [] 2025-12-04T11:20:45.4599589Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4600325Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4600446Z warnings.warn( 2025-12-04T11:20:45.4601166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4601269Z warnings.warn( 2025-12-04T11:20:45.4601497Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4601618Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4601846Z aot_autograd [('total', 2), 
('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4602751Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4602850Z graph_break [] 2025-12-04T11:20:45.4603082Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4603803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4603908Z warnings.warn( 2025-12-04T11:20:45.4604642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4604746Z warnings.warn( 2025-12-04T11:20:45.4604894Z =================================== FAILURES =================================== 2025-12-04T11:20:45.4605409Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.4605535Z Traceback (most recent call last): 2025-12-04T11:20:45.4606054Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4606357Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4606824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4607006Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4607539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4607796Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4607930Z 
AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4607936Z 2025-12-04T11:20:45.4608041Z Expected 1 but got 2. 2025-12-04T11:20:45.4608165Z Absolute difference: 1 2025-12-04T11:20:45.4608279Z Relative difference: 1.0 2025-12-04T11:20:45.4608284Z 2025-12-04T11:20:45.4608501Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4609425Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4609463Z 2025-12-04T11:20:45.4609735Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4609967Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4610083Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4610972Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4611217Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4611318Z graph_break [] 2025-12-04T11:20:45.4611548Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4612284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4612389Z warnings.warn( 2025-12-04T11:20:45.4613124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4613225Z warnings.warn( 2025-12-04T11:20:45.4613457Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:20:45.4613574Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4613799Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4614697Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4614797Z graph_break [] 2025-12-04T11:20:45.4615020Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4615758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4615860Z warnings.warn( 2025-12-04T11:20:45.4616666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4616775Z warnings.warn( 2025-12-04T11:20:45.4616990Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4617120Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4617347Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4618358Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4618460Z graph_break [] 2025-12-04T11:20:45.4618675Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4619417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4619552Z warnings.warn( 
2025-12-04T11:20:45.4620269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4620384Z warnings.warn( 2025-12-04T11:20:45.4621219Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af9a500a606c950b.xml - 2025-12-04T11:20:45.4621408Z =========================== short test summary info ============================ 2025-12-04T11:20:45.4622345Z FAILED [0.4681s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4622463Z 2025-12-04T11:20:45.4622572Z Expected 1 but got 2. 2025-12-04T11:20:45.4622701Z Absolute difference: 1 2025-12-04T11:20:45.4622814Z Relative difference: 1.0 2025-12-04T11:20:45.4622822Z 2025-12-04T11:20:45.4623056Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4623958Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4623963Z 2025-12-04T11:20:45.4624230Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4624432Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.4624637Z ================== 1 failed, 10 deselected, 2 rerun in 4.88s =================== 2025-12-04T11:20:45.4624756Z Got exit code 1 2025-12-04T11:20:45.4624867Z Retrying single test... 
2025-12-04T11:20:45.4625317Z W1204 11:14:58.701000 92018 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.4625988Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e3ba96547605fc4e.xml 2025-12-04T11:20:45.4626158Z ============================= test session starts ============================== 2025-12-04T11:20:45.4626523Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.4626637Z cachedir: .pytest_cache 2025-12-04T11:20:45.4627160Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.4627299Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.4627413Z configfile: pytest.ini 2025-12-04T11:20:45.4627955Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.4628188Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.4629169Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4629304Z Running 1 items in this shard 2025-12-04T11:20:45.4629309Z 2025-12-04T11:20:45.4630585Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:15:04.612942651 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4630592Z 2025-12-04T11:20:45.4631199Z [W1204 11:15:20.656322297 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4631207Z 2025-12-04T11:20:45.4631720Z [W1204 11:15:20.656607339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4631726Z 2025-12-04T11:20:45.4632283Z [W1204 11:15:20.664107438 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4632302Z 2025-12-04T11:20:45.4632813Z [W1204 11:15:20.664894640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4632818Z 2025-12-04T11:20:45.4633328Z [W1204 11:15:20.665092521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4633333Z 2025-12-04T11:20:45.4633857Z [W1204 11:15:20.672167739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4633898Z 2025-12-04T11:20:45.4634408Z [W1204 11:15:20.673012766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4634413Z 2025-12-04T11:20:45.4634934Z [W1204 11:15:20.673200949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4634941Z 2025-12-04T11:20:45.4635447Z [W1204 11:15:20.813031816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4635452Z 2025-12-04T11:20:45.4635976Z [W1204 11:15:20.814898318 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4635981Z 2025-12-04T11:20:45.4636504Z [W1204 11:15:20.815116540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4636511Z 2025-12-04T11:20:45.4637032Z [W1204 11:15:20.819175902 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4637037Z 2025-12-04T11:20:45.4637545Z [W1204 11:15:20.819866706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4637554Z 2025-12-04T11:20:45.4638066Z [W1204 11:15:20.820112805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4638071Z 2025-12-04T11:20:45.4638601Z [W1204 11:15:20.826289842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4638606Z 2025-12-04T11:20:45.4639122Z [W1204 11:15:20.826978992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4639129Z 2025-12-04T11:20:45.4639654Z [W1204 11:15:20.827176986 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4639659Z 2025-12-04T11:20:45.4639794Z ('RERUN', {'yellow': True}) [19.9647s] [100%] 2025-12-04T11:20:45.4641072Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:15:20.249121143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4641078Z 2025-12-04T11:20:45.4641593Z [W1204 11:15:20.249967141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4641597Z 2025-12-04T11:20:45.4642185Z [W1204 11:15:20.250208854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4642193Z 2025-12-04T11:20:45.4642702Z [W1204 11:15:20.254575877 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4642707Z 2025-12-04T11:20:45.4643213Z [W1204 11:15:20.255344958 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4643265Z 2025-12-04T11:20:45.4643778Z [W1204 11:15:20.255554601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4643783Z 2025-12-04T11:20:45.4644288Z [W1204 11:15:20.262250594 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4644293Z 2025-12-04T11:20:45.4644822Z [W1204 11:15:20.263066920 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4644856Z 2025-12-04T11:20:45.4645365Z [W1204 11:15:20.263273581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4645370Z 2025-12-04T11:20:45.4645892Z [W1204 11:15:20.357260292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4645899Z 2025-12-04T11:20:45.4646410Z [W1204 11:15:20.358106008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4646415Z 2025-12-04T11:20:45.4646937Z [W1204 11:15:20.358327520 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4646942Z 2025-12-04T11:20:45.4647456Z [W1204 11:15:20.362574241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4647461Z 2025-12-04T11:20:45.4647989Z [W1204 11:15:20.363336869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4647994Z 2025-12-04T11:20:45.4648508Z [W1204 11:15:20.363547270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4648515Z 2025-12-04T11:20:45.4649025Z [W1204 11:15:20.369934140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4649047Z 2025-12-04T11:20:45.4649563Z [W1204 11:15:20.370967990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4649568Z 2025-12-04T11:20:45.4650075Z [W1204 11:15:20.371184552 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4650085Z 2025-12-04T11:20:45.4650235Z ('RERUN', {'yellow': True}) [0.5058s] [100%] 2025-12-04T11:20:45.4651502Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:15:21.731677620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4651510Z 2025-12-04T11:20:45.4652040Z [W1204 11:15:21.732491375 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4652044Z 2025-12-04T11:20:45.4652554Z [W1204 11:15:21.732716373 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4652559Z 2025-12-04T11:20:45.4653086Z [W1204 11:15:21.737022251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4653150Z 2025-12-04T11:20:45.4653663Z [W1204 11:15:21.737730042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4653670Z 2025-12-04T11:20:45.4654173Z [W1204 11:15:21.737932915 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4654225Z 2025-12-04T11:20:45.4654737Z [W1204 11:15:21.744431222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4654742Z 2025-12-04T11:20:45.4655252Z [W1204 11:15:21.745191043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4655257Z 2025-12-04T11:20:45.4655775Z [W1204 11:15:21.745393256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4655780Z 2025-12-04T11:20:45.4656364Z [W1204 11:15:21.838075303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4656407Z 2025-12-04T11:20:45.4656931Z [W1204 11:15:21.838871878 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4656936Z 2025-12-04T11:20:45.4657446Z [W1204 11:15:21.839082754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4657450Z 2025-12-04T11:20:45.4657970Z [W1204 11:15:21.843124574 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4657975Z 2025-12-04T11:20:45.4658479Z [W1204 11:15:21.843813584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4658484Z 2025-12-04T11:20:45.4659009Z [W1204 11:15:21.844017226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4659016Z 2025-12-04T11:20:45.4659524Z [W1204 11:15:21.850178145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4659529Z 2025-12-04T11:20:45.4660037Z [W1204 11:15:21.851063212 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4660058Z 2025-12-04T11:20:45.4660565Z [W1204 11:15:21.851263279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4660570Z 2025-12-04T11:20:45.4660672Z FAILED [0.4765s] [100%] 2025-12-04T11:20:45.4660678Z 2025-12-04T11:20:45.4660840Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.4661346Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.4661486Z Traceback (most recent call last): 2025-12-04T11:20:45.4661999Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4662231Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4662711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4662878Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4663414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4663634Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4663769Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4663774Z 2025-12-04T11:20:45.4663892Z Expected 1 but got 2. 
2025-12-04T11:20:45.4664081Z Absolute difference: 1 2025-12-04T11:20:45.4664201Z Relative difference: 1.0 2025-12-04T11:20:45.4664206Z 2025-12-04T11:20:45.4664437Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4665338Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4665374Z 2025-12-04T11:20:45.4665656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4665880Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4665998Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4666903Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4667142Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4667278Z graph_break [] 2025-12-04T11:20:45.4667511Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4668721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4668856Z if out == self.unknown_value: 2025-12-04T11:20:45.4669580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4669684Z warnings.warn( 2025-12-04T11:20:45.4670422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4670528Z warnings.warn( 2025-12-04T11:20:45.4671303Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.4671503Z Traceback (most recent call last): 2025-12-04T11:20:45.4672023Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4672272Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4672733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4672911Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4673447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4673654Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4673806Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4673815Z 2025-12-04T11:20:45.4673921Z Expected 1 but got 2. 
2025-12-04T11:20:45.4674032Z Absolute difference: 1 2025-12-04T11:20:45.4674159Z Relative difference: 1.0 2025-12-04T11:20:45.4674164Z 2025-12-04T11:20:45.4674381Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4675294Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4675302Z 2025-12-04T11:20:45.4675574Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4675796Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4675930Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4676955Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4677204Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4677306Z graph_break [] 2025-12-04T11:20:45.4677526Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4678805Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4678926Z if out == self.unknown_value: 2025-12-04T11:20:45.4679664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4679770Z warnings.warn( 2025-12-04T11:20:45.4680494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4680674Z warnings.warn( 2025-12-04T11:20:45.4680893Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4681009Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4681253Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4682145Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4682261Z graph_break [] 2025-12-04T11:20:45.4682478Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4683205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4683326Z warnings.warn( 2025-12-04T11:20:45.4684045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4684164Z warnings.warn( 2025-12-04T11:20:45.4684314Z =================================== FAILURES =================================== 2025-12-04T11:20:45.4684817Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.4684956Z 
Traceback (most recent call last): 2025-12-04T11:20:45.4685465Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4685697Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4686172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4686335Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4686886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4687092Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4687225Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4687233Z 2025-12-04T11:20:45.4687352Z Expected 1 but got 2. 2025-12-04T11:20:45.4687463Z Absolute difference: 1 2025-12-04T11:20:45.4687574Z Relative difference: 1.0 2025-12-04T11:20:45.4687593Z 2025-12-04T11:20:45.4687813Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4688719Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4688724Z 2025-12-04T11:20:45.4689075Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4689301Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4689433Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4690321Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4690582Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:20:45.4690699Z graph_break [] 2025-12-04T11:20:45.4690916Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4692131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.4692264Z if out == self.unknown_value: 2025-12-04T11:20:45.4693023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4693138Z warnings.warn( 2025-12-04T11:20:45.4693861Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4693967Z warnings.warn( 2025-12-04T11:20:45.4694197Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4694315Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4694558Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4695447Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4695548Z graph_break [] 2025-12-04T11:20:45.4695774Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4696572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4696692Z warnings.warn( 2025-12-04T11:20:45.4697411Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4697512Z warnings.warn( 2025-12-04T11:20:45.4697745Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4697861Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4698088Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4698999Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4699103Z graph_break [] 2025-12-04T11:20:45.4699332Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4700054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4700161Z warnings.warn( 2025-12-04T11:20:45.4700895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4700997Z warnings.warn( 2025-12-04T11:20:45.4701923Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e3ba96547605fc4e.xml - 2025-12-04T11:20:45.4702103Z =========================== short test summary info ============================ 2025-12-04T11:20:45.4703044Z FAILED [0.4765s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.4703088Z 2025-12-04T11:20:45.4703214Z Expected 1 but got 2. 2025-12-04T11:20:45.4703326Z Absolute difference: 1 2025-12-04T11:20:45.4703438Z Relative difference: 1.0 2025-12-04T11:20:45.4703457Z 2025-12-04T11:20:45.4703678Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4704578Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4704584Z 2025-12-04T11:20:45.4704872Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4705108Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.4705320Z ================== 1 failed, 13 deselected, 2 rerun in 20.98s ================== 2025-12-04T11:20:45.4705421Z Got exit code 1 2025-12-04T11:20:45.4705530Z Retrying single test... 2025-12-04T11:20:45.4705986Z W1204 11:15:33.271000 92193 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.4706648Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ce470e45644e1cc6.xml 2025-12-04T11:20:45.4706815Z ============================= test session starts ============================== 2025-12-04T11:20:45.4707181Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.4707294Z cachedir: .pytest_cache 2025-12-04T11:20:45.4707830Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.4707959Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.4708070Z configfile: pytest.ini 2025-12-04T11:20:45.4708625Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, 
subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.4708847Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.4709832Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4709961Z Running 1 items in this shard 2025-12-04T11:20:45.4709966Z 2025-12-04T11:20:45.4711239Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:15:38.196016043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4711249Z 2025-12-04T11:20:45.4711784Z [W1204 11:15:54.368207465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4711789Z 2025-12-04T11:20:45.4712303Z [W1204 11:15:54.368466367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4712310Z 2025-12-04T11:20:45.4712835Z [W1204 11:15:54.375807917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4712841Z 2025-12-04T11:20:45.4713353Z [W1204 11:15:54.376551806 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4713358Z 2025-12-04T11:20:45.4713955Z [W1204 11:15:54.376747358 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4713962Z 2025-12-04T11:20:45.4714472Z [W1204 11:15:54.383710601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4714477Z 2025-12-04T11:20:45.4714999Z [W1204 11:15:54.384504814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4715036Z 2025-12-04T11:20:45.4715544Z [W1204 11:15:54.384708070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4715549Z 2025-12-04T11:20:45.4716053Z [W1204 11:15:55.521059950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4716071Z 2025-12-04T11:20:45.4716588Z [W1204 11:15:55.522823241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4716626Z 2025-12-04T11:20:45.4717134Z [W1204 11:15:55.523036471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4717139Z 2025-12-04T11:20:45.4717663Z [W1204 11:15:55.527037880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4717671Z 2025-12-04T11:20:45.4718179Z [W1204 11:15:55.527706923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4718184Z 2025-12-04T11:20:45.4718707Z [W1204 11:15:55.527907937 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4718711Z 2025-12-04T11:20:45.4719226Z [W1204 11:15:55.534007333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4719233Z 2025-12-04T11:20:45.4719757Z [W1204 11:15:55.534702336 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4719762Z 2025-12-04T11:20:45.4720268Z [W1204 11:15:55.534900175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4720275Z 2025-12-04T11:20:45.4720416Z ('RERUN', {'yellow': True}) [20.0760s] [100%] 2025-12-04T11:20:45.4721697Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:15:55.937847188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4721703Z 2025-12-04T11:20:45.4722222Z [W1204 11:15:55.938631694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4722229Z 2025-12-04T11:20:45.4722757Z [W1204 11:15:55.938835962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4722761Z 2025-12-04T11:20:45.4723274Z [W1204 11:15:55.942949636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4723282Z 2025-12-04T11:20:45.4723807Z [W1204 11:15:55.943604907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4723812Z 2025-12-04T11:20:45.4724321Z [W1204 11:15:55.943799420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4724326Z 2025-12-04T11:20:45.4724913Z [W1204 11:15:55.949947866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4724921Z 2025-12-04T11:20:45.4725428Z [W1204 11:15:55.950689143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4725433Z 2025-12-04T11:20:45.4725954Z [W1204 11:15:55.950886649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4725990Z 2025-12-04T11:20:45.4726495Z [W1204 11:15:55.041243801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4726500Z 2025-12-04T11:20:45.4727010Z [W1204 11:15:55.042043151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4727028Z 2025-12-04T11:20:45.4727535Z [W1204 11:15:55.042257820 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4727544Z 2025-12-04T11:20:45.4728088Z [W1204 11:15:55.046260191 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4728093Z 2025-12-04T11:20:45.4728612Z [W1204 11:15:55.046930396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4728619Z 2025-12-04T11:20:45.4729131Z [W1204 11:15:55.047133715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4729137Z 2025-12-04T11:20:45.4729656Z [W1204 11:15:55.053277850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4729661Z 2025-12-04T11:20:45.4730168Z [W1204 11:15:55.054146214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4730173Z 2025-12-04T11:20:45.4730698Z [W1204 11:15:55.054344140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4730706Z 2025-12-04T11:20:45.4730838Z ('RERUN', {'yellow': True}) [0.4799s] [100%] 2025-12-04T11:20:45.4732111Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:15:55.393978117 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4732131Z 2025-12-04T11:20:45.4732640Z [W1204 11:15:55.394745204 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4732645Z 2025-12-04T11:20:45.4733153Z [W1204 11:15:55.394950183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4733163Z 2025-12-04T11:20:45.4733678Z [W1204 11:15:55.399053287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4733688Z 2025-12-04T11:20:45.4734200Z [W1204 11:15:55.399713580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4734207Z 2025-12-04T11:20:45.4734724Z [W1204 11:15:55.399909314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4734729Z 2025-12-04T11:20:45.4735236Z [W1204 11:15:56.406172827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4735241Z 2025-12-04T11:20:45.4735761Z [W1204 11:15:56.406843800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4735766Z 2025-12-04T11:20:45.4736415Z [W1204 11:15:56.407037737 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4736425Z 2025-12-04T11:20:45.4736953Z [W1204 11:15:56.498214108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4736959Z 2025-12-04T11:20:45.4737496Z [W1204 11:15:56.499030838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4737500Z 2025-12-04T11:20:45.4738007Z [W1204 11:15:56.499248261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4738027Z 2025-12-04T11:20:45.4738534Z [W1204 11:15:56.503367065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4738539Z 2025-12-04T11:20:45.4739049Z [W1204 11:15:56.504077510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4739085Z 2025-12-04T11:20:45.4739606Z [W1204 11:15:56.504282020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4739611Z 2025-12-04T11:20:45.4740119Z [W1204 11:15:56.510458116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4740126Z 2025-12-04T11:20:45.4740787Z [W1204 11:15:56.511346507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4740792Z 2025-12-04T11:20:45.4741302Z [W1204 11:15:56.511549959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4741307Z 2025-12-04T11:20:45.4741427Z FAILED [0.4558s] [100%] 2025-12-04T11:20:45.4741437Z 2025-12-04T11:20:45.4741580Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.4742084Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.4742226Z Traceback (most recent call last): 2025-12-04T11:20:45.4742738Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4742988Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4743454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4743620Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4744170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4744383Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4744520Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4744540Z 2025-12-04T11:20:45.4744648Z Expected 1 but got 2. 
2025-12-04T11:20:45.4744757Z Absolute difference: 1 2025-12-04T11:20:45.4744884Z Relative difference: 1.0 2025-12-04T11:20:45.4744889Z 2025-12-04T11:20:45.4745109Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4746010Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4746016Z 2025-12-04T11:20:45.4746301Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4746521Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4746651Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4747612Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4747845Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4747957Z graph_break [] 2025-12-04T11:20:45.4748173Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4749436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4749554Z if out == self.unknown_value: 2025-12-04T11:20:45.4750279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4750398Z warnings.warn( 2025-12-04T11:20:45.4751123Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4751275Z warnings.warn( 2025-12-04T11:20:45.4751778Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.4751903Z Traceback (most recent call last): 2025-12-04T11:20:45.4752429Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4752662Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4753120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4753299Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4753836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4754059Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4754193Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4754198Z 2025-12-04T11:20:45.4754306Z Expected 1 but got 2. 
2025-12-04T11:20:45.4754430Z Absolute difference: 1 2025-12-04T11:20:45.4754542Z Relative difference: 1.0 2025-12-04T11:20:45.4754550Z 2025-12-04T11:20:45.4754767Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4755677Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4755683Z 2025-12-04T11:20:45.4755953Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4756186Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4756307Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4757194Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4757436Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4757537Z graph_break [] 2025-12-04T11:20:45.4757766Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4758975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4759092Z if out == self.unknown_value: 2025-12-04T11:20:45.4759914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4760020Z warnings.warn( 2025-12-04T11:20:45.4760754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4760859Z warnings.warn( 2025-12-04T11:20:45.4761077Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4761241Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4761467Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4762374Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4762473Z graph_break [] 2025-12-04T11:20:45.4762694Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4763431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4763575Z warnings.warn( 2025-12-04T11:20:45.4764292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4764410Z warnings.warn( 2025-12-04T11:20:45.4764556Z =================================== FAILURES =================================== 2025-12-04T11:20:45.4765068Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.4765193Z 
Traceback (most recent call last): 2025-12-04T11:20:45.4765701Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4765950Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4766409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4766586Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4767123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4767334Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4767480Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4767484Z 2025-12-04T11:20:45.4767591Z Expected 1 but got 2. 2025-12-04T11:20:45.4767698Z Absolute difference: 1 2025-12-04T11:20:45.4767820Z Relative difference: 1.0 2025-12-04T11:20:45.4767825Z 2025-12-04T11:20:45.4768039Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4768962Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4768969Z 2025-12-04T11:20:45.4769240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4769458Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4769587Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4770474Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4770713Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:20:45.4770814Z graph_break [] 2025-12-04T11:20:45.4771263Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4772764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.4772889Z if out == self.unknown_value: 2025-12-04T11:20:45.4773628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4773778Z warnings.warn( 2025-12-04T11:20:45.4774494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4774611Z warnings.warn( 2025-12-04T11:20:45.4774831Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4774947Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4775191Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4776084Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4776249Z graph_break [] 2025-12-04T11:20:45.4776537Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4777271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4777391Z warnings.warn( 2025-12-04T11:20:45.4778111Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4778228Z warnings.warn( 2025-12-04T11:20:45.4778447Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4778569Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4778814Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4779698Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.4779800Z graph_break [] 2025-12-04T11:20:45.4780031Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4780753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4780866Z warnings.warn( 2025-12-04T11:20:45.4781585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4781692Z warnings.warn( 2025-12-04T11:20:45.4782545Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ce470e45644e1cc6.xml - 2025-12-04T11:20:45.4782721Z =========================== short test summary info ============================ 2025-12-04T11:20:45.4783684Z FAILED [0.4558s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.4783693Z 2025-12-04T11:20:45.4783800Z Expected 1 but got 2. 2025-12-04T11:20:45.4783911Z Absolute difference: 1 2025-12-04T11:20:45.4784036Z Relative difference: 1.0 2025-12-04T11:20:45.4784040Z 2025-12-04T11:20:45.4784259Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4785779Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4785788Z 2025-12-04T11:20:45.4786065Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4786259Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.4786478Z ================== 1 failed, 13 deselected, 2 rerun in 21.05s ================== 2025-12-04T11:20:45.4786618Z Got exit code 1 2025-12-04T11:20:45.4787453Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.4787867Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:20:45.4788319Z W1204 11:16:08.082000 92367 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.4789006Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cece0bb00c5477e6.xml 2025-12-04T11:20:45.4789207Z ============================= test session starts ============================== 2025-12-04T11:20:45.4789571Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.4789688Z cachedir: .pytest_cache 2025-12-04T11:20:45.4790211Z hypothesis profile 'pytorch_ci' -> database=None, 
max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.4790352Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.4790463Z configfile: pytest.ini 2025-12-04T11:20:45.4791006Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.4791240Z collecting ... collected 58 items / 11 deselected / 47 selected 2025-12-04T11:20:45.4791391Z stepcurrent: skipping 11 already run items. 2025-12-04T11:20:45.4791528Z Running 3 items in this shard 2025-12-04T11:20:45.4791535Z 2025-12-04T11:20:45.4792407Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.0197s] [ 33%] 2025-12-04T11:20:45.4793271Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5183s] [ 33%] 2025-12-04T11:20:45.4794071Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.5161s] [ 33%] 2025-12-04T11:20:45.4794076Z 2025-12-04T11:20:45.4794221Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.4794754Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4794884Z Traceback (most recent call last): 2025-12-04T11:20:45.4795402Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4795652Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4796118Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4796298Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4796835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4797042Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4797193Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4797199Z 2025-12-04T11:20:45.4797308Z Expected 1 but got 2. 2025-12-04T11:20:45.4797499Z Absolute difference: 1 2025-12-04T11:20:45.4797630Z Relative difference: 1.0 2025-12-04T11:20:45.4797637Z 2025-12-04T11:20:45.4797853Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4798773Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4798813Z 2025-12-04T11:20:45.4799083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4799302Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4799432Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4799964Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4800212Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4800316Z graph_break [] 2025-12-04T11:20:45.4800570Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4801316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 
2025-12-04T11:20:45.4801426Z warnings.warn( 2025-12-04T11:20:45.4802160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4802268Z warnings.warn( 2025-12-04T11:20:45.4802775Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4802912Z Traceback (most recent call last): 2025-12-04T11:20:45.4803425Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4803661Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4804136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4804302Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4804846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4805056Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4805191Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4805197Z 2025-12-04T11:20:45.4805317Z Expected 1 but got 2. 
2025-12-04T11:20:45.4805425Z Absolute difference: 1 2025-12-04T11:20:45.4805536Z Relative difference: 1.0 2025-12-04T11:20:45.4805554Z 2025-12-04T11:20:45.4805771Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4806683Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4806691Z 2025-12-04T11:20:45.4806978Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4807198Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4807320Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4807864Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4808097Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4808212Z graph_break [] 2025-12-04T11:20:45.4808428Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4809224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4809343Z warnings.warn( 2025-12-04T11:20:45.4810067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4810187Z warnings.warn( 2025-12-04T11:20:45.4810402Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4810634Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4810875Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4811403Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4811503Z graph_break [] 2025-12-04T11:20:45.4811737Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4812466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4812611Z warnings.warn( 2025-12-04T11:20:45.4813328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4813432Z warnings.warn( 2025-12-04T11:20:45.4813594Z =================================== FAILURES =================================== 2025-12-04T11:20:45.4814106Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4814245Z Traceback (most recent call last): 2025-12-04T11:20:45.4814753Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4814985Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4815463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4815630Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4816164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4816466Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4816610Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4816616Z 2025-12-04T11:20:45.4816739Z Expected 1 but got 2. 
2025-12-04T11:20:45.4816851Z Absolute difference: 1 2025-12-04T11:20:45.4816965Z Relative difference: 1.0 2025-12-04T11:20:45.4816970Z 2025-12-04T11:20:45.4817205Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4818124Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4818131Z 2025-12-04T11:20:45.4818417Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4818642Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4818759Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4819304Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4819534Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4819635Z graph_break [] 2025-12-04T11:20:45.4819864Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4820595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4820712Z warnings.warn( 2025-12-04T11:20:45.4821511Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4821618Z warnings.warn( 2025-12-04T11:20:45.4821850Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4821966Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4822230Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4822775Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4822878Z graph_break [] 2025-12-04T11:20:45.4823109Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4823833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4823936Z warnings.warn( 2025-12-04T11:20:45.4824707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4824807Z warnings.warn( 2025-12-04T11:20:45.4825035Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4825152Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4825379Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4825916Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4826015Z graph_break [] 2025-12-04T11:20:45.4826228Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4826964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4827068Z warnings.warn( 2025-12-04T11:20:45.4827794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4827894Z warnings.warn( 2025-12-04T11:20:45.4828738Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cece0bb00c5477e6.xml - 2025-12-04T11:20:45.4828925Z =========================== short test summary info ============================ 2025-12-04T11:20:45.4829871Z FAILED [0.5161s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4829877Z 2025-12-04T11:20:45.4829995Z Expected 1 but got 2. 2025-12-04T11:20:45.4830109Z Absolute difference: 1 2025-12-04T11:20:45.4830223Z Relative difference: 1.0 2025-12-04T11:20:45.4830228Z 2025-12-04T11:20:45.4830458Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4831369Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4831376Z 2025-12-04T11:20:45.4831656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4831837Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.4832036Z ================== 1 failed, 11 deselected, 2 rerun in 5.09s =================== 2025-12-04T11:20:45.4832150Z Got exit code 1 2025-12-04T11:20:45.4832258Z Retrying single test... 
2025-12-04T11:20:45.4832772Z W1204 11:16:28.792000 92544 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.4833447Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4e672e5e3ae6046c.xml 2025-12-04T11:20:45.4833615Z ============================= test session starts ============================== 2025-12-04T11:20:45.4833977Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.4834122Z cachedir: .pytest_cache 2025-12-04T11:20:45.4834638Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.4834778Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.4834887Z configfile: pytest.ini 2025-12-04T11:20:45.4835438Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.4835663Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.4836687Z stepcurrent: skipping 11 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4836818Z Running 1 items in this shard 2025-12-04T11:20:45.4836824Z 2025-12-04T11:20:45.4838109Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:16:32.915856585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4838115Z 2025-12-04T11:20:45.4838647Z [W1204 11:16:48.167318131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4838652Z 2025-12-04T11:20:45.4839170Z [W1204 11:16:48.167579688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4839177Z 2025-12-04T11:20:45.4839699Z [W1204 11:16:48.175158765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4839705Z 2025-12-04T11:20:45.4840215Z [W1204 11:16:48.175921119 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4840222Z 2025-12-04T11:20:45.4840728Z [W1204 11:16:48.176117512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4840747Z 2025-12-04T11:20:45.4841253Z [W1204 11:16:48.183213596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4841258Z 2025-12-04T11:20:45.4841769Z [W1204 11:16:48.183899664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4841776Z 2025-12-04T11:20:45.4842298Z [W1204 11:16:48.184087193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4842303Z 2025-12-04T11:20:45.4842810Z [W1204 11:16:50.190786562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4842818Z 2025-12-04T11:20:45.4843338Z [W1204 11:16:50.192556017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4843344Z 2025-12-04T11:20:45.4843853Z [W1204 11:16:50.192776658 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4843857Z 2025-12-04T11:20:45.4844465Z [W1204 11:16:50.196826901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4844471Z 2025-12-04T11:20:45.4844982Z [W1204 11:16:50.197502434 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4844988Z 2025-12-04T11:20:45.4845508Z [W1204 11:16:50.197705888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4845542Z 2025-12-04T11:20:45.4846051Z [W1204 11:16:50.203877411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4846056Z 2025-12-04T11:20:45.4846562Z [W1204 11:16:50.204554953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4846580Z 2025-12-04T11:20:45.4847087Z [W1204 11:16:50.204760846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4847096Z 2025-12-04T11:20:45.4847234Z ('RERUN', {'yellow': True}) [20.2529s] [100%] 2025-12-04T11:20:45.4848554Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:16:51.657032587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4848562Z 2025-12-04T11:20:45.4849073Z [W1204 11:16:51.657874289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4849078Z 2025-12-04T11:20:45.4849601Z [W1204 11:16:51.658088740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4849606Z 2025-12-04T11:20:45.4850112Z [W1204 11:16:51.662327076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4850121Z 2025-12-04T11:20:45.4850642Z [W1204 11:16:51.663247874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4850647Z 2025-12-04T11:20:45.4851155Z [W1204 11:16:51.663455990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4851162Z 2025-12-04T11:20:45.4851678Z [W1204 11:16:51.669747936 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4851683Z 2025-12-04T11:20:45.4852189Z [W1204 11:16:51.670484810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4852195Z 2025-12-04T11:20:45.4852702Z [W1204 11:16:51.670689017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4852724Z 2025-12-04T11:20:45.4853232Z [W1204 11:16:51.763533023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4853239Z 2025-12-04T11:20:45.4853744Z [W1204 11:16:51.764363973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4853752Z 2025-12-04T11:20:45.4854280Z [W1204 11:16:51.764592492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4854286Z 2025-12-04T11:20:45.4854794Z [W1204 11:16:51.768705393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4854799Z 2025-12-04T11:20:45.4855320Z [W1204 11:16:51.769389916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4855324Z 2025-12-04T11:20:45.4855891Z [W1204 11:16:51.769587122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4855899Z 2025-12-04T11:20:45.4856514Z [W1204 11:16:51.775848620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4856520Z 2025-12-04T11:20:45.4857070Z [W1204 11:16:51.776789704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4857075Z 2025-12-04T11:20:45.4857584Z [W1204 11:16:51.776990675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4857603Z 2025-12-04T11:20:45.4857739Z ('RERUN', {'yellow': True}) [0.5341s] [100%] 2025-12-04T11:20:45.4859022Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:16:51.166470451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4859058Z 2025-12-04T11:20:45.4859588Z [W1204 11:16:51.167279693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4859592Z 2025-12-04T11:20:45.4860108Z [W1204 11:16:51.167491178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4860113Z 2025-12-04T11:20:45.4860636Z [W1204 11:16:51.171624226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4860640Z 2025-12-04T11:20:45.4861151Z [W1204 11:16:51.172466790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4861156Z 2025-12-04T11:20:45.4861688Z [W1204 11:16:51.172674957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4861696Z 2025-12-04T11:20:45.4862204Z [W1204 11:16:51.178838112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4862209Z 2025-12-04T11:20:45.4862726Z [W1204 11:16:51.179496581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4862734Z 2025-12-04T11:20:45.4863241Z [W1204 11:16:51.179685508 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4863246Z 2025-12-04T11:20:45.4863758Z [W1204 11:16:51.271568151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4863779Z 2025-12-04T11:20:45.4864291Z [W1204 11:16:51.273564027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4864298Z 2025-12-04T11:20:45.4864802Z [W1204 11:16:51.273791397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4864807Z 2025-12-04T11:20:45.4865331Z [W1204 11:16:51.279240334 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4865338Z 2025-12-04T11:20:45.4865850Z [W1204 11:16:51.280365366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4865855Z 2025-12-04T11:20:45.4866382Z [W1204 11:16:51.280609663 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4866387Z 2025-12-04T11:20:45.4866957Z [W1204 11:16:51.287511313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4866964Z 2025-12-04T11:20:45.4867485Z [W1204 11:16:51.288800517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4867490Z 2025-12-04T11:20:45.4868000Z [W1204 11:16:51.289001309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4868037Z
2025-12-04T11:20:45.4868143Z FAILED [0.5095s] [100%]
2025-12-04T11:20:45.4868163Z
2025-12-04T11:20:45.4868310Z ==================================== RERUNS ====================================
2025-12-04T11:20:45.4868830Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:20:45.4868968Z Traceback (most recent call last):
2025-12-04T11:20:45.4869487Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:20:45.4869718Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:20:45.4870233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:20:45.4870396Z return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:20:45.4871140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:20:45.4871453Z raise error_metas.pop()[0].to_error( # type: ignore[index]
2025-12-04T11:20:45.4871614Z AssertionError: Scalars are not equal!
2025-12-04T11:20:45.4871619Z
2025-12-04T11:20:45.4871744Z Expected 1 but got 2.
2025-12-04T11:20:45.4871857Z Absolute difference: 1
2025-12-04T11:20:45.4871968Z Relative difference: 1.0
2025-12-04T11:20:45.4871973Z
2025-12-04T11:20:45.4872207Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4873123Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4873131Z
2025-12-04T11:20:45.4873416Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4873640Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4873760Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4874305Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4874534Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4874648Z graph_break []
2025-12-04T11:20:45.4874864Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4876077Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4876210Z if out == self.unknown_value: 2025-12-04T11:20:45.4876933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4877052Z warnings.warn( 2025-12-04T11:20:45.4877773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4877878Z warnings.warn( 2025-12-04T11:20:45.4878406Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4878530Z Traceback (most recent call last): 2025-12-04T11:20:45.4879178Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4879426Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4879886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4880071Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4880651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4880859Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4881007Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4881012Z 2025-12-04T11:20:45.4881121Z Expected 1 but got 2. 
2025-12-04T11:20:45.4881242Z Absolute difference: 1
2025-12-04T11:20:45.4881354Z Relative difference: 1.0
2025-12-04T11:20:45.4881359Z
2025-12-04T11:20:45.4881580Z To execute this test, run the following from the base repo dir:
2025-12-04T11:20:45.4882510Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:20:45.4882585Z
2025-12-04T11:20:45.4882858Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:20:45.4883096Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:20:45.4883216Z stats [('calls_captured', 6)]
2025-12-04T11:20:45.4883744Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:20:45.4883989Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:20:45.4884091Z graph_break []
2025-12-04T11:20:45.4884308Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:20:45.4885534Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:20:45.4885654Z if out == self.unknown_value: 2025-12-04T11:20:45.4886391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4886497Z warnings.warn( 2025-12-04T11:20:45.4887218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4887334Z warnings.warn( 2025-12-04T11:20:45.4887553Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4887682Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4887915Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4888446Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4888559Z graph_break [] 2025-12-04T11:20:45.4888774Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4889499Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4889617Z warnings.warn( 2025-12-04T11:20:45.4890336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4890450Z warnings.warn( 2025-12-04T11:20:45.4890597Z =================================== FAILURES =================================== 2025-12-04T11:20:45.4891178Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4891322Z Traceback (most recent call last): 2025-12-04T11:20:45.4891830Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4892080Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4892574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4892739Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4893291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4893497Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4893631Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4893636Z 2025-12-04T11:20:45.4893763Z Expected 1 but got 2. 2025-12-04T11:20:45.4893874Z Absolute difference: 1 2025-12-04T11:20:45.4894030Z Relative difference: 1.0 2025-12-04T11:20:45.4894035Z 2025-12-04T11:20:45.4894251Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4895156Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4895165Z 2025-12-04T11:20:45.4895449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4895668Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4895798Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4896396Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4896634Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4896751Z graph_break [] 2025-12-04T11:20:45.4896966Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:20:45.4898175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.4898310Z if out == self.unknown_value: 2025-12-04T11:20:45.4899033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4899151Z warnings.warn( 2025-12-04T11:20:45.4899867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4899974Z warnings.warn( 2025-12-04T11:20:45.4900208Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4900328Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4900570Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4901100Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4901202Z graph_break [] 2025-12-04T11:20:45.4901433Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4902152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4902257Z warnings.warn( 2025-12-04T11:20:45.4903053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4903156Z warnings.warn( 2025-12-04T11:20:45.4903389Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4903505Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4903733Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4904271Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4904404Z graph_break [] 2025-12-04T11:20:45.4904619Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4905362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4905464Z warnings.warn( 2025-12-04T11:20:45.4906199Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4906334Z warnings.warn( 2025-12-04T11:20:45.4907167Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4e672e5e3ae6046c.xml - 2025-12-04T11:20:45.4907351Z =========================== short test summary info ============================ 2025-12-04T11:20:45.4908304Z FAILED [0.5095s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4908311Z 2025-12-04T11:20:45.4908431Z Expected 1 but got 2. 
2025-12-04T11:20:45.4908540Z Absolute difference: 1 2025-12-04T11:20:45.4908650Z Relative difference: 1.0 2025-12-04T11:20:45.4908656Z 2025-12-04T11:20:45.4908885Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4909801Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4909810Z 2025-12-04T11:20:45.4910093Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4910274Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.4910476Z ================== 1 failed, 13 deselected, 2 rerun in 21.33s ================== 2025-12-04T11:20:45.4910592Z Got exit code 1 2025-12-04T11:20:45.4910701Z Retrying single test... 2025-12-04T11:20:45.4911159Z W1204 11:17:03.738000 92726 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.4911816Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-65775801d71c7290.xml 2025-12-04T11:20:45.4911987Z ============================= test session starts ============================== 2025-12-04T11:20:45.4912359Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.4912473Z cachedir: .pytest_cache 2025-12-04T11:20:45.4912994Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.4913139Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.4913250Z configfile: pytest.ini 2025-12-04T11:20:45.4913809Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T11:20:45.4914028Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.4915091Z stepcurrent: skipping 11 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4915226Z Running 1 items in this shard 2025-12-04T11:20:45.4915231Z 2025-12-04T11:20:45.4916521Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:17:07.833268456 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4916559Z 2025-12-04T11:20:45.4917093Z [W1204 11:17:22.152782250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4917099Z 2025-12-04T11:20:45.4917612Z [W1204 11:17:22.153052159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4917618Z 2025-12-04T11:20:45.4918142Z [W1204 11:17:22.160552917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4918178Z 2025-12-04T11:20:45.4918690Z [W1204 11:17:22.161351484 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4918696Z 2025-12-04T11:20:45.4919219Z [W1204 11:17:22.161551623 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4919226Z 2025-12-04T11:20:45.4919733Z [W1204 11:17:22.168665291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4919739Z 2025-12-04T11:20:45.4920246Z [W1204 11:17:22.169395548 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4920264Z 2025-12-04T11:20:45.4920777Z [W1204 11:17:22.169589756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4920782Z 2025-12-04T11:20:45.4921291Z [W1204 11:17:24.182102377 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4921296Z 2025-12-04T11:20:45.4921821Z [W1204 11:17:24.183860691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4921829Z 2025-12-04T11:20:45.4922337Z [W1204 11:17:24.184081254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4922342Z 2025-12-04T11:20:45.4922865Z [W1204 11:17:24.188096646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4922869Z 2025-12-04T11:20:45.4923375Z [W1204 11:17:24.188791973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4923385Z 2025-12-04T11:20:45.4923905Z [W1204 11:17:24.188994605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4923913Z 2025-12-04T11:20:45.4924420Z [W1204 11:17:24.195201623 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4924428Z 2025-12-04T11:20:45.4924959Z [W1204 11:17:24.195891719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4924963Z 2025-12-04T11:20:45.4925470Z [W1204 11:17:24.196087581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4925476Z 2025-12-04T11:20:45.4925611Z ('RERUN', {'yellow': True}) [19.2846s] [100%] 2025-12-04T11:20:45.4926975Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:17:25.656719457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4926984Z 2025-12-04T11:20:45.4927496Z [W1204 11:17:25.657625909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4927549Z 2025-12-04T11:20:45.4928070Z [W1204 11:17:25.657842586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4928075Z 2025-12-04T11:20:45.4928584Z [W1204 11:17:25.661947320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4928589Z 2025-12-04T11:20:45.4929112Z [W1204 11:17:25.662844559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4929117Z 2025-12-04T11:20:45.4929630Z [W1204 11:17:25.663045970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4929666Z 2025-12-04T11:20:45.4930189Z [W1204 11:17:25.669201584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4930194Z 2025-12-04T11:20:45.4930706Z [W1204 11:17:25.669893848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4930712Z 2025-12-04T11:20:45.4931237Z [W1204 11:17:25.670122148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4931241Z 2025-12-04T11:20:45.4931750Z [W1204 11:17:25.762623881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4931755Z 2025-12-04T11:20:45.4932271Z [W1204 11:17:25.763462481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4932278Z 2025-12-04T11:20:45.4932803Z [W1204 11:17:25.763678765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4932808Z 2025-12-04T11:20:45.4933321Z [W1204 11:17:25.767772694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4933328Z 2025-12-04T11:20:45.4933856Z [W1204 11:17:25.768498541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4933861Z 2025-12-04T11:20:45.4934373Z [W1204 11:17:25.768717848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4934378Z 2025-12-04T11:20:45.4934908Z [W1204 11:17:25.774964078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4934914Z 2025-12-04T11:20:45.4935424Z [W1204 11:17:25.775899218 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4935429Z 2025-12-04T11:20:45.4935955Z [W1204 11:17:25.776102383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4935962Z 2025-12-04T11:20:45.4936098Z ('RERUN', {'yellow': True}) [0.5392s] [100%] 2025-12-04T11:20:45.4937464Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:17:25.161046626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4937485Z 2025-12-04T11:20:45.4938078Z [W1204 11:17:25.161779941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4938086Z 2025-12-04T11:20:45.4938603Z [W1204 11:17:25.161978888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4938608Z 2025-12-04T11:20:45.4939134Z [W1204 11:17:25.165866020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4939169Z 2025-12-04T11:20:45.4939683Z [W1204 11:17:25.166630533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4939688Z 2025-12-04T11:20:45.4940211Z [W1204 11:17:25.166819565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4940216Z 2025-12-04T11:20:45.4940732Z [W1204 11:17:25.172831807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4940770Z 2025-12-04T11:20:45.4941295Z [W1204 11:17:25.173463488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4941300Z 2025-12-04T11:20:45.4941812Z [W1204 11:17:25.173651250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4941819Z 2025-12-04T11:20:45.4942344Z [W1204 11:17:25.262561730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4942349Z 2025-12-04T11:20:45.4942860Z [W1204 11:17:25.263330167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4942865Z 2025-12-04T11:20:45.4943380Z [W1204 11:17:25.263535548 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4943401Z 2025-12-04T11:20:45.4943916Z [W1204 11:17:25.269107917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4943921Z 2025-12-04T11:20:45.4944437Z [W1204 11:17:25.270021935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4944444Z 2025-12-04T11:20:45.4944972Z [W1204 11:17:25.270228730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4944977Z 2025-12-04T11:20:45.4945487Z [W1204 11:17:25.277704694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4945492Z 2025-12-04T11:20:45.4946024Z [W1204 11:17:25.278451894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.4946029Z 2025-12-04T11:20:45.4946542Z [W1204 11:17:25.278648270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.4946548Z 2025-12-04T11:20:45.4946669Z FAILED [0.5035s] [100%] 2025-12-04T11:20:45.4946674Z 2025-12-04T11:20:45.4946822Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.4947341Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4947478Z Traceback (most recent call last): 2025-12-04T11:20:45.4947993Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4948235Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4948784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4948951Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4949509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4949716Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4949849Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4949887Z 2025-12-04T11:20:45.4950006Z Expected 1 but got 2. 
2025-12-04T11:20:45.4950115Z Absolute difference: 1 2025-12-04T11:20:45.4950241Z Relative difference: 1.0 2025-12-04T11:20:45.4950246Z 2025-12-04T11:20:45.4950461Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4951374Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4951379Z 2025-12-04T11:20:45.4951669Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4951931Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4952065Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4952595Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4952826Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4952943Z graph_break [] 2025-12-04T11:20:45.4953161Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4954370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4954514Z if out == self.unknown_value: 2025-12-04T11:20:45.4955244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4955367Z warnings.warn( 2025-12-04T11:20:45.4956090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4956199Z warnings.warn( 2025-12-04T11:20:45.4956728Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4956853Z Traceback (most recent call last): 2025-12-04T11:20:45.4957375Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4957607Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4958071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4958253Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4958788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4959006Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4959141Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4959146Z 2025-12-04T11:20:45.4959253Z Expected 1 but got 2. 
2025-12-04T11:20:45.4959374Z Absolute difference: 1 2025-12-04T11:20:45.4959485Z Relative difference: 1.0 2025-12-04T11:20:45.4959490Z 2025-12-04T11:20:45.4959706Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4960724Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4960731Z 2025-12-04T11:20:45.4961004Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4961236Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4961354Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4961887Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4962166Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4962267Z graph_break [] 2025-12-04T11:20:45.4962498Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4963722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.4963841Z if out == self.unknown_value: 2025-12-04T11:20:45.4964617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4964722Z warnings.warn( 2025-12-04T11:20:45.4965456Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4965564Z warnings.warn( 2025-12-04T11:20:45.4965783Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4965914Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4966144Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4966675Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4966794Z graph_break [] 2025-12-04T11:20:45.4967012Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4967758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4967859Z warnings.warn( 2025-12-04T11:20:45.4968579Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4968693Z warnings.warn( 2025-12-04T11:20:45.4968840Z =================================== FAILURES =================================== 2025-12-04T11:20:45.4969350Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.4969493Z Traceback (most recent call last): 2025-12-04T11:20:45.4970009Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.4970256Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.4970715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.4970880Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.4971780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.4971991Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.4972141Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4972146Z 2025-12-04T11:20:45.4972254Z Expected 1 but got 2. 2025-12-04T11:20:45.4972364Z Absolute difference: 1 2025-12-04T11:20:45.4972490Z Relative difference: 1.0 2025-12-04T11:20:45.4972495Z 2025-12-04T11:20:45.4972711Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4973803Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4973828Z 2025-12-04T11:20:45.4974100Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4974366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4974500Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4975029Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4975259Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4975375Z graph_break [] 2025-12-04T11:20:45.4975593Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:20:45.4976880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.4977061Z if out == self.unknown_value: 2025-12-04T11:20:45.4977795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4977915Z warnings.warn( 2025-12-04T11:20:45.4978638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4978756Z warnings.warn( 2025-12-04T11:20:45.4978974Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4979090Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4979336Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4979867Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4979966Z graph_break [] 2025-12-04T11:20:45.4980197Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4980920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4981037Z warnings.warn( 2025-12-04T11:20:45.4981756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4981857Z warnings.warn( 2025-12-04T11:20:45.4982086Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.4982202Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.4982434Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.4982977Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.4983077Z graph_break [] 2025-12-04T11:20:45.4983305Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.4984029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4984130Z warnings.warn( 2025-12-04T11:20:45.4984862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.4984965Z warnings.warn( 2025-12-04T11:20:45.4985869Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-65775801d71c7290.xml - 2025-12-04T11:20:45.4986047Z =========================== short test summary info ============================ 2025-12-04T11:20:45.4987004Z FAILED [0.5035s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.4987043Z 2025-12-04T11:20:45.4987167Z Expected 1 but got 2. 
2025-12-04T11:20:45.4987281Z Absolute difference: 1 2025-12-04T11:20:45.4987393Z Relative difference: 1.0 2025-12-04T11:20:45.4987411Z 2025-12-04T11:20:45.4987631Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.4988543Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4988553Z 2025-12-04T11:20:45.4988832Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.4989050Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.4989263Z ================== 1 failed, 13 deselected, 2 rerun in 20.36s ================== 2025-12-04T11:20:45.4989364Z Got exit code 1 2025-12-04T11:20:45.4990198Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.4990624Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:20:45.4991070Z W1204 11:17:37.556000 92908 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.4991732Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed754aaaf490f98.xml 2025-12-04T11:20:45.4991913Z ============================= test session starts ============================== 2025-12-04T11:20:45.4997953Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.4998124Z cachedir: .pytest_cache 2025-12-04T11:20:45.4998689Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 
2025-12-04T11:20:45.4998828Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.4998937Z configfile: pytest.ini 2025-12-04T11:20:45.4999497Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.4999717Z collecting ... collected 58 items / 12 deselected / 46 selected 2025-12-04T11:20:45.4999881Z stepcurrent: skipping 12 already run items. 2025-12-04T11:20:45.5000001Z Running 2 items in this shard 2025-12-04T11:20:45.5000015Z 2025-12-04T11:20:45.5000910Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.3530s] [ 50%] 2025-12-04T11:20:45.5001796Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.9081s] [ 50%] 2025-12-04T11:20:45.5002587Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.9217s] [ 50%] 2025-12-04T11:20:45.5002594Z 2025-12-04T11:20:45.5002755Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.5003271Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.5003617Z Traceback (most recent call last): 2025-12-04T11:20:45.5004148Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5004384Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5004863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5005073Z 
return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5005609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5005832Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5005967Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5005973Z 2025-12-04T11:20:45.5006097Z Expected 1 but got 2. 2025-12-04T11:20:45.5006206Z Absolute difference: 1 2025-12-04T11:20:45.5006321Z Relative difference: 1.0 2025-12-04T11:20:45.5006332Z 2025-12-04T11:20:45.5006562Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5007526Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.5007533Z 2025-12-04T11:20:45.5007820Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5008046Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5008164Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5008711Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5008941Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5009041Z graph_break [] 2025-12-04T11:20:45.5009278Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5010022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5010139Z warnings.warn( 2025-12-04T11:20:45.5010859Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5010965Z warnings.warn( 2025-12-04T11:20:45.5011495Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.5011619Z Traceback (most recent call last): 2025-12-04T11:20:45.5012143Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5012374Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5012838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5013019Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5013553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5013763Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5013909Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5013914Z 2025-12-04T11:20:45.5014021Z Expected 1 but got 2. 
2025-12-04T11:20:45.5014143Z Absolute difference: 1 2025-12-04T11:20:45.5014252Z Relative difference: 1.0 2025-12-04T11:20:45.5014257Z 2025-12-04T11:20:45.5014474Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5015478Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.5015487Z 2025-12-04T11:20:45.5015755Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5015989Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5016106Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5016742Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5017027Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5017128Z graph_break [] 2025-12-04T11:20:45.5017347Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5018093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5018201Z warnings.warn( 2025-12-04T11:20:45.5018936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5019070Z warnings.warn( 2025-12-04T11:20:45.5019289Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5019425Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5019654Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5020182Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5020298Z graph_break [] 2025-12-04T11:20:45.5020513Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5021248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5021352Z warnings.warn( 2025-12-04T11:20:45.5022066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5022179Z warnings.warn( 2025-12-04T11:20:45.5022326Z =================================== FAILURES =================================== 2025-12-04T11:20:45.5022858Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.5022982Z Traceback (most recent call last): 2025-12-04T11:20:45.5023492Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5023733Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5024195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5024360Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5024910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5025116Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5025261Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5025269Z 2025-12-04T11:20:45.5025375Z Expected 1 but got 2. 
2025-12-04T11:20:45.5025484Z Absolute difference: 1 2025-12-04T11:20:45.5025611Z Relative difference: 1.0 2025-12-04T11:20:45.5025617Z 2025-12-04T11:20:45.5025832Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5026762Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.5026768Z 2025-12-04T11:20:45.5027096Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5027323Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5027452Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5027981Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5028253Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5028352Z graph_break [] 2025-12-04T11:20:45.5028567Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5029308Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5029416Z warnings.warn( 2025-12-04T11:20:45.5030140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5030288Z warnings.warn( 2025-12-04T11:20:45.5030504Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5030634Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5030863Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5031397Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5031509Z graph_break [] 2025-12-04T11:20:45.5031726Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5032448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5032564Z warnings.warn( 2025-12-04T11:20:45.5033285Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5033401Z warnings.warn( 2025-12-04T11:20:45.5033620Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5033735Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5033975Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5034502Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5034600Z graph_break [] 2025-12-04T11:20:45.5034827Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5035552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5035671Z warnings.warn( 2025-12-04T11:20:45.5036391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5036490Z warnings.warn( 2025-12-04T11:20:45.5037345Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed754aaaf490f98.xml - 2025-12-04T11:20:45.5037521Z =========================== short test summary info ============================ 2025-12-04T11:20:45.5038477Z FAILED [0.9217s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5038483Z 2025-12-04T11:20:45.5038591Z Expected 1 but got 2. 2025-12-04T11:20:45.5038701Z Absolute difference: 1 2025-12-04T11:20:45.5038886Z Relative difference: 1.0 2025-12-04T11:20:45.5038892Z 2025-12-04T11:20:45.5039114Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5040041Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.5040095Z 2025-12-04T11:20:45.5040366Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5040550Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.5040767Z ================== 1 failed, 12 deselected, 2 rerun in 6.22s =================== 2025-12-04T11:20:45.5040870Z Got exit code 1 2025-12-04T11:20:45.5040992Z Retrying single test... 
2025-12-04T11:20:45.5041438Z W1204 11:17:58.451000 93085 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.5042105Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-36a3a8a6a9d0a436.xml 2025-12-04T11:20:45.5042320Z ============================= test session starts ============================== 2025-12-04T11:20:45.5042674Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.5042788Z cachedir: .pytest_cache 2025-12-04T11:20:45.5043321Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.5043448Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.5043572Z configfile: pytest.ini 2025-12-04T11:20:45.5044114Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.5044334Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.5045352Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.5045471Z Running 1 items in this shard 2025-12-04T11:20:45.5045476Z 2025-12-04T11:20:45.5046795Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:18:02.909364710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5046804Z 2025-12-04T11:20:45.5047322Z [W1204 11:18:18.165604660 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5047327Z 2025-12-04T11:20:45.5047851Z [W1204 11:18:18.165880384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5047861Z 2025-12-04T11:20:45.5048370Z [W1204 11:18:18.173373199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5048378Z 2025-12-04T11:20:45.5048885Z [W1204 11:18:18.174099024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5048906Z 2025-12-04T11:20:45.5049413Z [W1204 11:18:18.174291668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5049418Z 2025-12-04T11:20:45.5049927Z [W1204 11:18:18.181446283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5049932Z 2025-12-04T11:20:45.5050455Z [W1204 11:18:18.182214576 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5050460Z 2025-12-04T11:20:45.5051035Z [W1204 11:18:18.182410824 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5051044Z 2025-12-04T11:20:45.5051562Z [W1204 11:18:20.185362378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5051567Z 2025-12-04T11:20:45.5052101Z [W1204 11:18:20.187051096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5052106Z 2025-12-04T11:20:45.5052624Z [W1204 11:18:20.187268919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5052629Z 2025-12-04T11:20:45.5053136Z [W1204 11:18:20.191256448 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5053140Z 2025-12-04T11:20:45.5053664Z [W1204 11:18:20.191945160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5053701Z 2025-12-04T11:20:45.5054206Z [W1204 11:18:20.192143429 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5054211Z 2025-12-04T11:20:45.5054719Z [W1204 11:18:20.198289814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5054739Z 2025-12-04T11:20:45.5055244Z [W1204 11:18:20.198946772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5055249Z 2025-12-04T11:20:45.5055757Z [W1204 11:18:20.199143134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5055763Z 2025-12-04T11:20:45.5055913Z ('RERUN', {'yellow': True}) [20.5887s] [100%] 2025-12-04T11:20:45.5057272Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:18:21.038661417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5057282Z 2025-12-04T11:20:45.5057811Z [W1204 11:18:21.039464655 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5057816Z 2025-12-04T11:20:45.5058325Z [W1204 11:18:21.039673592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5058330Z 2025-12-04T11:20:45.5058855Z [W1204 11:18:21.043809712 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5058860Z 2025-12-04T11:20:45.5059372Z [W1204 11:18:21.044702754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5059380Z 2025-12-04T11:20:45.5059901Z [W1204 11:18:21.044905447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5059906Z 2025-12-04T11:20:45.5060415Z [W1204 11:18:21.051125928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5060422Z 2025-12-04T11:20:45.5060932Z [W1204 11:18:21.051817298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5060937Z 2025-12-04T11:20:45.5061458Z [W1204 11:18:21.052012962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5061462Z 2025-12-04T11:20:45.5062038Z [W1204 11:18:21.142290976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5062046Z 2025-12-04T11:20:45.5062568Z [W1204 11:18:21.143065618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5062573Z 2025-12-04T11:20:45.5063083Z [W1204 11:18:21.143271750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5063121Z 2025-12-04T11:20:45.5063645Z [W1204 11:18:21.147245585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5063650Z 2025-12-04T11:20:45.5064161Z [W1204 11:18:21.147908558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5064167Z 2025-12-04T11:20:45.5064689Z [W1204 11:18:21.148105840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5064733Z 2025-12-04T11:20:45.5065240Z [W1204 11:18:21.154247234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5065245Z 2025-12-04T11:20:45.5065754Z [W1204 11:18:21.155099158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5065776Z 2025-12-04T11:20:45.5066286Z [W1204 11:18:21.155298718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5066290Z 2025-12-04T11:20:45.5066423Z ('RERUN', {'yellow': True}) [0.9173s] [100%] 2025-12-04T11:20:45.5067726Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:18:22.932878588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5067734Z 2025-12-04T11:20:45.5068244Z [W1204 11:18:22.933683468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5068248Z 2025-12-04T11:20:45.5068768Z [W1204 11:18:22.933888083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5068775Z 2025-12-04T11:20:45.5069281Z [W1204 11:18:22.937899275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5069286Z 2025-12-04T11:20:45.5069803Z [W1204 11:18:22.938552881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5069808Z 2025-12-04T11:20:45.5070321Z [W1204 11:18:22.938749165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5070328Z 2025-12-04T11:20:45.5070849Z [W1204 11:18:22.944951671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5070854Z 2025-12-04T11:20:45.5071766Z [W1204 11:18:22.945606932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5071776Z 2025-12-04T11:20:45.5072285Z [W1204 11:18:22.945800661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5072303Z 2025-12-04T11:20:45.5072813Z [W1204 11:18:22.036116308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5072818Z 2025-12-04T11:20:45.5073483Z [W1204 11:18:22.036923075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5073488Z 2025-12-04T11:20:45.5074014Z [W1204 11:18:22.037134899 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5074019Z 2025-12-04T11:20:45.5074525Z [W1204 11:18:22.041178483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5074572Z 2025-12-04T11:20:45.5075100Z [W1204 11:18:22.041856946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5075105Z 2025-12-04T11:20:45.5075615Z [W1204 11:18:22.042058583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5075619Z 2025-12-04T11:20:45.5076142Z [W1204 11:18:22.048169858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5076151Z 2025-12-04T11:20:45.5076660Z [W1204 11:18:22.049033772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5076719Z 2025-12-04T11:20:45.5077239Z [W1204 11:18:22.049233391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5077246Z 2025-12-04T11:20:45.5077351Z FAILED [0.8916s] [100%] 2025-12-04T11:20:45.5077356Z 2025-12-04T11:20:45.5077502Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.5078037Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.5078163Z Traceback (most recent call last): 2025-12-04T11:20:45.5078679Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5078931Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5079397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5079575Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5080111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5080323Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5080473Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5080478Z 2025-12-04T11:20:45.5080586Z Expected 1 but got 2. 
2025-12-04T11:20:45.5080710Z Absolute difference: 1 2025-12-04T11:20:45.5080822Z Relative difference: 1.0 2025-12-04T11:20:45.5080827Z 2025-12-04T11:20:45.5081042Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5081974Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.5081983Z 2025-12-04T11:20:45.5082255Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5082495Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5082614Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5083144Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5083387Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5083488Z graph_break [] 2025-12-04T11:20:45.5083707Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5085016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.5085138Z if out == self.unknown_value: 2025-12-04T11:20:45.5085880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5086015Z warnings.warn( 2025-12-04T11:20:45.5086733Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5086850Z warnings.warn( 2025-12-04T11:20:45.5087368Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.5087505Z Traceback (most recent call last): 2025-12-04T11:20:45.5088016Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5088253Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5088763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5088928Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5089465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5089689Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5089821Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5089826Z 2025-12-04T11:20:45.5089944Z Expected 1 but got 2. 
2025-12-04T11:20:45.5090052Z Absolute difference: 1 2025-12-04T11:20:45.5090161Z Relative difference: 1.0 2025-12-04T11:20:45.5090166Z 2025-12-04T11:20:45.5090392Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5091313Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.5091321Z 2025-12-04T11:20:45.5091601Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5091821Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5091941Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5092482Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5092710Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5092824Z graph_break [] 2025-12-04T11:20:45.5093042Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5094256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.5094391Z if out == self.unknown_value: 2025-12-04T11:20:45.5095115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5095222Z warnings.warn( 2025-12-04T11:20:45.5095954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5096056Z warnings.warn( 2025-12-04T11:20:45.5096360Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5096480Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5096710Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5097328Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5097432Z graph_break [] 2025-12-04T11:20:45.5097648Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5098385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5098518Z warnings.warn( 2025-12-04T11:20:45.5099244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5099347Z warnings.warn( 2025-12-04T11:20:45.5099494Z =================================== FAILURES =================================== 2025-12-04T11:20:45.5100029Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.5100155Z Traceback (most recent call last): 2025-12-04T11:20:45.5100710Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5100942Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5101405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5101588Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5102123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5102344Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5102482Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5102488Z 2025-12-04T11:20:45.5102593Z Expected 1 but got 2. 2025-12-04T11:20:45.5102717Z Absolute difference: 1 2025-12-04T11:20:45.5102836Z Relative difference: 1.0 2025-12-04T11:20:45.5102844Z 2025-12-04T11:20:45.5103060Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5103991Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.5104000Z 2025-12-04T11:20:45.5104270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5104502Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5104620Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5105149Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5105394Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5105501Z graph_break [] 2025-12-04T11:20:45.5105732Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:20:45.5106944Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.5107065Z if out == self.unknown_value: 2025-12-04T11:20:45.5107801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5107907Z warnings.warn( 2025-12-04T11:20:45.5108635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5108738Z warnings.warn( 2025-12-04T11:20:45.5109015Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5109149Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5109377Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5109905Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5110046Z graph_break [] 2025-12-04T11:20:45.5110262Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5110996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5111098Z warnings.warn( 2025-12-04T11:20:45.5111817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5111929Z warnings.warn( 2025-12-04T11:20:45.5112150Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5112299Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5112542Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5113069Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5113185Z graph_break [] 2025-12-04T11:20:45.5113401Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5114124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5114241Z warnings.warn( 2025-12-04T11:20:45.5114959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5115062Z warnings.warn( 2025-12-04T11:20:45.5115917Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-36a3a8a6a9d0a436.xml - 2025-12-04T11:20:45.5116092Z =========================== short test summary info ============================ 2025-12-04T11:20:45.5117060Z FAILED [0.8916s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5117066Z 2025-12-04T11:20:45.5117174Z Expected 1 but got 2. 
2025-12-04T11:20:45.5117296Z Absolute difference: 1 2025-12-04T11:20:45.5117408Z Relative difference: 1.0 2025-12-04T11:20:45.5117413Z 2025-12-04T11:20:45.5117631Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5118566Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.5118574Z 2025-12-04T11:20:45.5118841Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5119035Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.5119237Z ================== 1 failed, 13 deselected, 2 rerun in 22.43s ================== 2025-12-04T11:20:45.5119336Z Got exit code 1 2025-12-04T11:20:45.5119458Z Retrying single test... 2025-12-04T11:20:45.5119902Z W1204 11:18:34.482000 93267 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.5120563Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f55b1076fbed9be9.xml 2025-12-04T11:20:45.5120806Z ============================= test session starts ============================== 2025-12-04T11:20:45.5121157Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.5121286Z cachedir: .pytest_cache 2025-12-04T11:20:45.5121801Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.5121929Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.5122087Z configfile: pytest.ini 2025-12-04T11:20:45.5122629Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T11:20:45.5122848Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.5123857Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.5123977Z Running 1 items in this shard 2025-12-04T11:20:45.5124013Z 2025-12-04T11:20:45.5125314Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:18:38.922872674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5125324Z 2025-12-04T11:20:45.5125844Z [W1204 11:18:55.597035519 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5125850Z 2025-12-04T11:20:45.5126371Z [W1204 11:18:55.597300996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5126377Z 2025-12-04T11:20:45.5126884Z [W1204 11:18:55.604629255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5126889Z 2025-12-04T11:20:45.5127413Z [W1204 11:18:55.605319901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5127421Z 2025-12-04T11:20:45.5127924Z [W1204 11:18:55.605509309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5127929Z 2025-12-04T11:20:45.5128439Z [W1204 11:18:55.612389954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5128455Z 2025-12-04T11:20:45.5128959Z [W1204 11:18:55.613048860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5128964Z 2025-12-04T11:20:45.5129472Z [W1204 11:18:55.613234657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5129477Z 2025-12-04T11:20:45.5130005Z [W1204 11:18:57.616100451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5130012Z 2025-12-04T11:20:45.5130520Z [W1204 11:18:57.617812130 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5130525Z 2025-12-04T11:20:45.5131046Z [W1204 11:18:57.618019855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5131053Z 2025-12-04T11:20:45.5131557Z [W1204 11:18:57.622013724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5131562Z 2025-12-04T11:20:45.5132082Z [W1204 11:18:57.622677822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5132087Z 2025-12-04T11:20:45.5132673Z [W1204 11:18:57.622877379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5132680Z 2025-12-04T11:20:45.5133198Z [W1204 11:18:57.628933356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5133203Z 2025-12-04T11:20:45.5133709Z [W1204 11:18:57.629573525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5133744Z 2025-12-04T11:20:45.5134253Z [W1204 11:18:57.629767077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5134270Z 2025-12-04T11:20:45.5134403Z ('RERUN', {'yellow': True}) [20.9861s] [100%] 2025-12-04T11:20:45.5135690Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:18:58.466471619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5135727Z 2025-12-04T11:20:45.5136252Z [W1204 11:18:58.467254105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5136257Z 2025-12-04T11:20:45.5136836Z [W1204 11:18:58.467455498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5136850Z 2025-12-04T11:20:45.5137375Z [W1204 11:18:58.471497101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5137380Z 2025-12-04T11:20:45.5137884Z [W1204 11:18:58.472340780 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5137889Z 2025-12-04T11:20:45.5138408Z [W1204 11:18:58.472554044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5138415Z 2025-12-04T11:20:45.5138935Z [W1204 11:18:58.478622130 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5138940Z 2025-12-04T11:20:45.5139467Z [W1204 11:18:58.479272667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5139475Z 2025-12-04T11:20:45.5139988Z [W1204 11:18:58.479462688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5139993Z 2025-12-04T11:20:45.5140517Z [W1204 11:18:58.569732496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5140522Z 2025-12-04T11:20:45.5141035Z [W1204 11:18:58.570572735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5141042Z 2025-12-04T11:20:45.5141568Z [W1204 11:18:58.570797768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5141572Z 2025-12-04T11:20:45.5142082Z [W1204 11:18:58.574782774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5142090Z 2025-12-04T11:20:45.5142612Z [W1204 11:18:58.575467540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5142617Z 2025-12-04T11:20:45.5143128Z [W1204 11:18:58.575673876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5143134Z 2025-12-04T11:20:45.5143710Z [W1204 11:18:58.581796544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5143732Z 2025-12-04T11:20:45.5144241Z [W1204 11:18:58.582676720 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5144246Z 2025-12-04T11:20:45.5144754Z [W1204 11:18:58.582877120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5144791Z 2025-12-04T11:20:45.5144936Z ('RERUN', {'yellow': True}) [0.9144s] [100%] 2025-12-04T11:20:45.5146239Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:18:58.361247648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5146244Z 2025-12-04T11:20:45.5146775Z [W1204 11:18:58.362031139 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5146808Z 2025-12-04T11:20:45.5147317Z [W1204 11:18:58.362238581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5147322Z 2025-12-04T11:20:45.5147846Z [W1204 11:18:58.366209287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5147853Z 2025-12-04T11:20:45.5148366Z [W1204 11:18:58.366867750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5148371Z 2025-12-04T11:20:45.5148882Z [W1204 11:18:58.367060670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5148900Z 2025-12-04T11:20:45.5149411Z [W1204 11:18:58.373175522 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5149416Z 2025-12-04T11:20:45.5149929Z [W1204 11:18:58.373831842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5149934Z 2025-12-04T11:20:45.5150456Z [W1204 11:18:58.374022365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5150463Z 2025-12-04T11:20:45.5150975Z [W1204 11:18:59.463241692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5150981Z 2025-12-04T11:20:45.5151502Z [W1204 11:18:59.464028036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5151507Z 2025-12-04T11:20:45.5152018Z [W1204 11:18:59.464246357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5152027Z 2025-12-04T11:20:45.5152549Z [W1204 11:18:59.468192380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5152556Z 2025-12-04T11:20:45.5153067Z [W1204 11:18:59.468860486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5153073Z 2025-12-04T11:20:45.5153597Z [W1204 11:18:59.469059107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5153602Z 2025-12-04T11:20:45.5154109Z [W1204 11:18:59.475103885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5154114Z 2025-12-04T11:20:45.5154627Z [W1204 11:18:59.475920375 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5154643Z 2025-12-04T11:20:45.5155214Z [W1204 11:18:59.476115587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5155222Z 2025-12-04T11:20:45.5155331Z FAILED [0.8904s] [100%] 2025-12-04T11:20:45.5155336Z 2025-12-04T11:20:45.5155495Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.5156045Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.5156183Z Traceback (most recent call last): 2025-12-04T11:20:45.5156696Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5156926Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5157404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5157573Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5158141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5158360Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5158492Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5158497Z 2025-12-04T11:20:45.5158618Z Expected 1 but got 2. 
2025-12-04T11:20:45.5158725Z Absolute difference: 1 2025-12-04T11:20:45.5158835Z Relative difference: 1.0 2025-12-04T11:20:45.5158841Z 2025-12-04T11:20:45.5159066Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5159986Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.5159992Z 2025-12-04T11:20:45.5160278Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5160500Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5160618Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5161158Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5161386Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5161488Z graph_break [] 2025-12-04T11:20:45.5161724Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5162932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.5163063Z if out == self.unknown_value: 2025-12-04T11:20:45.5163797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5163903Z warnings.warn( 2025-12-04T11:20:45.5164644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5164749Z warnings.warn( 2025-12-04T11:20:45.5165279Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.5165410Z Traceback (most recent call last): 2025-12-04T11:20:45.5165915Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5166161Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5166686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5166866Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5167403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5167610Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5167756Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5167792Z 2025-12-04T11:20:45.5167901Z Expected 1 but got 2. 
2025-12-04T11:20:45.5168009Z Absolute difference: 1 2025-12-04T11:20:45.5168137Z Relative difference: 1.0 2025-12-04T11:20:45.5168142Z 2025-12-04T11:20:45.5168358Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5169281Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.5169287Z 2025-12-04T11:20:45.5169561Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5169830Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5169960Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5170483Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5170723Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5170825Z graph_break [] 2025-12-04T11:20:45.5171313Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5172596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.5172721Z if out == self.unknown_value: 2025-12-04T11:20:45.5173463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5173570Z warnings.warn( 2025-12-04T11:20:45.5174289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5174409Z warnings.warn( 2025-12-04T11:20:45.5174629Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5174745Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5174988Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5175515Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5175631Z graph_break [] 2025-12-04T11:20:45.5175854Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5176678Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5176799Z warnings.warn( 2025-12-04T11:20:45.5177518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5177624Z warnings.warn( 2025-12-04T11:20:45.5177789Z =================================== FAILURES =================================== 2025-12-04T11:20:45.5178307Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:20:45.5178448Z Traceback (most recent call last): 2025-12-04T11:20:45.5179102Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5179343Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5179822Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5179989Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5180541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5180798Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5180932Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5180938Z 2025-12-04T11:20:45.5181063Z Expected 1 but got 2. 2025-12-04T11:20:45.5181171Z Absolute difference: 1 2025-12-04T11:20:45.5181282Z Relative difference: 1.0 2025-12-04T11:20:45.5181287Z 2025-12-04T11:20:45.5181514Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5182437Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.5182492Z 2025-12-04T11:20:45.5182777Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5182995Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5183115Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5183656Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5183884Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5183996Z graph_break [] 2025-12-04T11:20:45.5184216Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:20:45.5185428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.5185563Z if out == self.unknown_value: 2025-12-04T11:20:45.5186290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5186411Z warnings.warn( 2025-12-04T11:20:45.5187128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5187230Z warnings.warn( 2025-12-04T11:20:45.5187461Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5187580Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5187811Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5188361Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5188462Z graph_break [] 2025-12-04T11:20:45.5188692Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5189420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5189527Z warnings.warn( 2025-12-04T11:20:45.5190255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5190356Z warnings.warn( 2025-12-04T11:20:45.5190571Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5190698Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5191070Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5191613Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:20:45.5191713Z graph_break [] 2025-12-04T11:20:45.5191930Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5192704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5192805Z warnings.warn( 2025-12-04T11:20:45.5193533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5193637Z warnings.warn( 2025-12-04T11:20:45.5194486Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f55b1076fbed9be9.xml - 2025-12-04T11:20:45.5194714Z =========================== short test summary info ============================ 2025-12-04T11:20:45.5195677Z FAILED [0.8904s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5195686Z 2025-12-04T11:20:45.5195808Z Expected 1 but got 2. 
2025-12-04T11:20:45.5195916Z Absolute difference: 1 2025-12-04T11:20:45.5196028Z Relative difference: 1.0 2025-12-04T11:20:45.5196033Z 2025-12-04T11:20:45.5196263Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5197187Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.5197193Z 2025-12-04T11:20:45.5197481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5197668Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.5197869Z ================== 1 failed, 13 deselected, 2 rerun in 22.82s ================== 2025-12-04T11:20:45.5197985Z Got exit code 1 2025-12-04T11:20:45.5198817Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:20:45.5199230Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:20:45.5199695Z W1204 11:19:10.799000 93449 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.5200352Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6062b5e411b734f8.xml 2025-12-04T11:20:45.5200538Z ============================= test session starts ============================== 2025-12-04T11:20:45.5200892Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.5201003Z cachedir: .pytest_cache 2025-12-04T11:20:45.5201537Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 
2025-12-04T11:20:45.5201666Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.5201787Z configfile: pytest.ini 2025-12-04T11:20:45.5202329Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.5202547Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.5202703Z stepcurrent: skipping 13 already run items. 2025-12-04T11:20:45.5202821Z Running 1 items in this shard 2025-12-04T11:20:45.5202826Z 2025-12-04T11:20:45.5204142Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 W1204 11:19:16.460000 93449 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.5204283Z ('RERUN', {'yellow': True}) [4.0054s] [100%] 2025-12-04T11:20:45.5205146Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5344s] [100%] 2025-12-04T11:20:45.5205976Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5294s] [100%] 2025-12-04T11:20:45.5205981Z 2025-12-04T11:20:45.5206124Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.5206650Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.5206805Z Traceback (most recent call last): 2025-12-04T11:20:45.5207316Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5207561Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 
2025-12-04T11:20:45.5208027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5208213Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5208750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5208961Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5209108Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5209113Z 2025-12-04T11:20:45.5209221Z Expected 1 but got 0. 2025-12-04T11:20:45.5209335Z Absolute difference: 1 2025-12-04T11:20:45.5209462Z Relative difference: 1.0 2025-12-04T11:20:45.5209470Z 2025-12-04T11:20:45.5209686Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5210609Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.5210617Z 2025-12-04T11:20:45.5210884Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5211109Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5211242Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5211949Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5212197Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5212301Z graph_break [] 2025-12-04T11:20:45.5212430Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5212661Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5213400Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5213509Z warnings.warn( 2025-12-04T11:20:45.5214246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5214352Z warnings.warn( 2025-12-04T11:20:45.5214878Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.5215004Z Traceback (most recent call last): 2025-12-04T11:20:45.5215587Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5215840Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5216370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5216555Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5217151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5217361Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5217510Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5217516Z 2025-12-04T11:20:45.5217625Z Expected 1 but got 0. 
2025-12-04T11:20:45.5217733Z Absolute difference: 1 2025-12-04T11:20:45.5217860Z Relative difference: 1.0 2025-12-04T11:20:45.5217866Z 2025-12-04T11:20:45.5218083Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5219016Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.5219055Z 2025-12-04T11:20:45.5219327Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5219550Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5219684Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5220376Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5220615Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5220715Z graph_break [] 2025-12-04T11:20:45.5220842Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5221078Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5221811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5221914Z warnings.warn( 2025-12-04T11:20:45.5222645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5222750Z warnings.warn( 2025-12-04T11:20:45.5222981Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5223097Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5223325Z aot_autograd [('total', 
2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5224034Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5224137Z graph_break [] 2025-12-04T11:20:45.5224261Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5224494Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5225212Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5225329Z warnings.warn( 2025-12-04T11:20:45.5226052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5226156Z warnings.warn( 2025-12-04T11:20:45.5226320Z =================================== FAILURES =================================== 2025-12-04T11:20:45.5226831Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.5226969Z Traceback (most recent call last): 2025-12-04T11:20:45.5227552Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5227788Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5228265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5228463Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5229001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5229225Z raise error_metas.pop()[0].to_error( # type: ignore[index] 
2025-12-04T11:20:45.5229358Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5229363Z 2025-12-04T11:20:45.5229485Z Expected 1 but got 0. 2025-12-04T11:20:45.5229594Z Absolute difference: 1 2025-12-04T11:20:45.5229705Z Relative difference: 1.0 2025-12-04T11:20:45.5229709Z 2025-12-04T11:20:45.5229944Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5230878Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.5230884Z 2025-12-04T11:20:45.5231164Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5231385Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5231502Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5232211Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5232439Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5232551Z graph_break [] 2025-12-04T11:20:45.5232678Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5232895Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5233640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5233742Z warnings.warn( 2025-12-04T11:20:45.5234462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5234581Z warnings.warn( 2025-12-04T11:20:45.5234797Z ----------------------------- Captured stdout 
call ----------------------------- 2025-12-04T11:20:45.5234926Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5235154Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5235855Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5235972Z graph_break [] 2025-12-04T11:20:45.5236096Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5236310Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5237048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5237153Z warnings.warn( 2025-12-04T11:20:45.5237881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5237983Z warnings.warn( 2025-12-04T11:20:45.5238202Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5238333Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5238620Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5239314Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5239429Z graph_break [] 2025-12-04T11:20:45.5239554Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5239782Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5240537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 
2025-12-04T11:20:45.5240638Z warnings.warn( 2025-12-04T11:20:45.5241365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5241466Z warnings.warn( 2025-12-04T11:20:45.5242317Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6062b5e411b734f8.xml - 2025-12-04T11:20:45.5242525Z =========================== short test summary info ============================ 2025-12-04T11:20:45.5243468Z FAILED [0.5294s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5243476Z 2025-12-04T11:20:45.5243597Z Expected 1 but got 0. 2025-12-04T11:20:45.5243706Z Absolute difference: 1 2025-12-04T11:20:45.5243829Z Relative difference: 1.0 2025-12-04T11:20:45.5243834Z 2025-12-04T11:20:45.5244049Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5244959Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.5244970Z 2025-12-04T11:20:45.5245248Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5245431Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.5245642Z ================== 1 failed, 13 deselected, 2 rerun in 5.10s =================== 2025-12-04T11:20:45.5245742Z Got exit code 1 2025-12-04T11:20:45.5245852Z Retrying single test... 
2025-12-04T11:20:45.5246309Z W1204 11:19:31.510000 93626 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.5246964Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21fa07c752a411ad.xml 2025-12-04T11:20:45.5247129Z ============================= test session starts ============================== 2025-12-04T11:20:45.5247490Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.5247606Z cachedir: .pytest_cache 2025-12-04T11:20:45.5248138Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.5248262Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.5248371Z configfile: pytest.ini 2025-12-04T11:20:45.5248923Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.5249145Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.5250142Z stepcurrent: skipping 13 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.5250269Z Running 1 items in this shard 2025-12-04T11:20:45.5250275Z 2025-12-04T11:20:45.5251620Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:19:37.474114044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5251629Z 2025-12-04T11:20:45.5252160Z [W1204 11:19:52.322354498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5252197Z 2025-12-04T11:20:45.5252712Z [W1204 11:19:52.322613110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5252718Z 2025-12-04T11:20:45.5253243Z [W1204 11:19:52.330983752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5253248Z 2025-12-04T11:20:45.5253753Z [W1204 11:19:52.331924649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5253762Z 2025-12-04T11:20:45.5254282Z [W1204 11:19:52.332122589 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5254321Z 2025-12-04T11:20:45.5254832Z [W1204 11:19:52.339912127 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5254839Z 2025-12-04T11:20:45.5255352Z [W1204 11:19:52.340729993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5255356Z 2025-12-04T11:20:45.5255869Z [W1204 11:19:52.340934308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5255873Z 2025-12-04T11:20:45.5256406Z W1204 11:19:53.064000 93626 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.5256933Z [W1204 11:19:53.539850131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5256940Z 2025-12-04T11:20:45.5257450Z [W1204 11:19:53.541683967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5257455Z 2025-12-04T11:20:45.5257976Z [W1204 11:19:53.541912926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5257984Z 2025-12-04T11:20:45.5258490Z [W1204 11:19:53.546785248 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5258495Z 2025-12-04T11:20:45.5259014Z [W1204 11:19:53.547522672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5259019Z 2025-12-04T11:20:45.5259530Z [W1204 11:19:53.547731238 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5259537Z 2025-12-04T11:20:45.5260057Z [W1204 11:19:53.554767633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5260062Z 2025-12-04T11:20:45.5260568Z [W1204 11:19:53.555599210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5260575Z 2025-12-04T11:20:45.5261085Z [W1204 11:19:53.555812084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5261103Z 2025-12-04T11:20:45.5261237Z ('RERUN', {'yellow': True}) [19.8935s] [100%] 2025-12-04T11:20:45.5262602Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:19:53.021321403 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5262612Z 2025-12-04T11:20:45.5263135Z [W1204 11:19:53.022113670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5263139Z 2025-12-04T11:20:45.5263656Z [W1204 11:19:53.022326042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5263690Z 2025-12-04T11:20:45.5264216Z [W1204 11:19:53.027291959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5264221Z 2025-12-04T11:20:45.5264727Z [W1204 11:19:53.027971346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5264732Z 2025-12-04T11:20:45.5265257Z [W1204 11:19:53.028165777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5265293Z 2025-12-04T11:20:45.5265803Z [W1204 11:19:53.035049819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5265808Z 2025-12-04T11:20:45.5266331Z [W1204 11:19:53.035714476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5266340Z 2025-12-04T11:20:45.5266849Z [W1204 11:19:53.035906472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5266854Z 2025-12-04T11:20:45.5267361Z [W1204 11:19:53.144804588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5267381Z 2025-12-04T11:20:45.5267895Z [W1204 11:19:53.145593970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5267902Z 2025-12-04T11:20:45.5268408Z [W1204 11:19:53.145804810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5268414Z 2025-12-04T11:20:45.5268936Z [W1204 11:19:53.150576939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5268943Z 2025-12-04T11:20:45.5269451Z [W1204 11:19:53.151252477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5269456Z 2025-12-04T11:20:45.5269977Z [W1204 11:19:53.151454687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5269982Z 2025-12-04T11:20:45.5270491Z [W1204 11:19:53.158214770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5270496Z 2025-12-04T11:20:45.5271254Z [W1204 11:19:53.158895195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5271261Z 2025-12-04T11:20:45.5271857Z [W1204 11:19:53.159095257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5271866Z 2025-12-04T11:20:45.5272000Z ('RERUN', {'yellow': True}) [0.5623s] [100%] 2025-12-04T11:20:45.5273294Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:19:54.561837973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5273300Z 2025-12-04T11:20:45.5273947Z [W1204 11:19:54.562656934 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5273953Z 2025-12-04T11:20:45.5274483Z [W1204 11:19:54.562873174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5274487Z 2025-12-04T11:20:45.5274999Z [W1204 11:19:54.567928937 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5275045Z 2025-12-04T11:20:45.5275564Z [W1204 11:19:54.568658042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5275569Z 2025-12-04T11:20:45.5276075Z [W1204 11:19:54.568862515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5276080Z 2025-12-04T11:20:45.5276599Z [W1204 11:19:54.575772272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5276608Z 2025-12-04T11:20:45.5277117Z [W1204 11:19:54.576518516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5277167Z 2025-12-04T11:20:45.5277688Z [W1204 11:19:54.576735583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5277696Z 2025-12-04T11:20:45.5278204Z [W1204 11:19:54.692060738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5278209Z 2025-12-04T11:20:45.5278716Z [W1204 11:19:54.692876096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5278737Z 2025-12-04T11:20:45.5279256Z [W1204 11:19:54.693093523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5279261Z 2025-12-04T11:20:45.5279780Z [W1204 11:19:54.697875315 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5279787Z 2025-12-04T11:20:45.5280307Z [W1204 11:19:54.698564970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5280311Z 2025-12-04T11:20:45.5280828Z [W1204 11:19:54.698766571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5280832Z 2025-12-04T11:20:45.5281356Z [W1204 11:19:54.705772301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5281361Z 2025-12-04T11:20:45.5281869Z [W1204 11:19:54.706468384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5281874Z 2025-12-04T11:20:45.5282402Z [W1204 11:19:54.706667911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5282409Z 2025-12-04T11:20:45.5282515Z FAILED [0.5452s] [100%] 2025-12-04T11:20:45.5282519Z 2025-12-04T11:20:45.5282664Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.5283191Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.5283320Z Traceback (most recent call last): 2025-12-04T11:20:45.5283850Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5284085Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5284552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5284798Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5285337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5285563Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5285698Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5285704Z 2025-12-04T11:20:45.5285812Z Expected 1 but got 0. 
2025-12-04T11:20:45.5285973Z Absolute difference: 1 2025-12-04T11:20:45.5286088Z Relative difference: 1.0 2025-12-04T11:20:45.5286093Z 2025-12-04T11:20:45.5286310Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5287240Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.5287245Z 2025-12-04T11:20:45.5287515Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5287759Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5287912Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5288608Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5288850Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5288956Z graph_break [] 2025-12-04T11:20:45.5289097Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5289317Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5290526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.5290665Z if out == self.unknown_value: 2025-12-04T11:20:45.5291393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5291517Z warnings.warn( 2025-12-04T11:20:45.5292237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5292344Z warnings.warn( 2025-12-04T11:20:45.5292865Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.5292993Z Traceback (most recent call last): 2025-12-04T11:20:45.5293500Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5293747Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5294205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5294386Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5294919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5295125Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5295273Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5295278Z 2025-12-04T11:20:45.5295387Z Expected 1 but got 0. 
2025-12-04T11:20:45.5295495Z Absolute difference: 1 2025-12-04T11:20:45.5295617Z Relative difference: 1.0 2025-12-04T11:20:45.5295622Z 2025-12-04T11:20:45.5295835Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5296911Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.5296919Z 2025-12-04T11:20:45.5297192Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5297416Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5297548Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5298244Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5298520Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5298620Z graph_break [] 2025-12-04T11:20:45.5298747Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5298984Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5300193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.5300360Z if out == self.unknown_value: 2025-12-04T11:20:45.5301086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5301191Z warnings.warn( 2025-12-04T11:20:45.5301924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5302030Z warnings.warn( 2025-12-04T11:20:45.5302249Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5302379Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5302608Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5303318Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5303422Z graph_break [] 2025-12-04T11:20:45.5303546Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5303775Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5304500Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5304618Z warnings.warn( 2025-12-04T11:20:45.5305338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5305441Z warnings.warn( 2025-12-04T11:20:45.5305604Z =================================== FAILURES =================================== 2025-12-04T11:20:45.5306118Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 
2025-12-04T11:20:45.5306244Z Traceback (most recent call last): 2025-12-04T11:20:45.5306763Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5306993Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5307462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5307628Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5308161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5308381Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5308513Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5308518Z 2025-12-04T11:20:45.5308637Z Expected 1 but got 0. 2025-12-04T11:20:45.5308827Z Absolute difference: 1 2025-12-04T11:20:45.5308941Z Relative difference: 1.0 2025-12-04T11:20:45.5308949Z 2025-12-04T11:20:45.5309182Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5310086Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.5310123Z 2025-12-04T11:20:45.5310393Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5310627Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5310747Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5311462Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5311697Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 
2025-12-04T11:20:45.5311799Z graph_break [] 2025-12-04T11:20:45.5311974Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5312194Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5313421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.5313542Z if out == self.unknown_value: 2025-12-04T11:20:45.5314268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5314385Z warnings.warn( 2025-12-04T11:20:45.5315104Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5315211Z warnings.warn( 2025-12-04T11:20:45.5315444Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5315563Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5315803Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5316500Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5316601Z graph_break [] 2025-12-04T11:20:45.5316739Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5316954Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5317695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5317798Z warnings.warn( 
2025-12-04T11:20:45.5318520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5318637Z warnings.warn( 2025-12-04T11:20:45.5318853Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5318969Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5319209Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5319902Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5320015Z graph_break [] 2025-12-04T11:20:45.5320139Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5320354Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5321154Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5321258Z warnings.warn( 2025-12-04T11:20:45.5321972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5322086Z warnings.warn( 2025-12-04T11:20:45.5322924Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21fa07c752a411ad.xml - 2025-12-04T11:20:45.5323144Z =========================== short test summary info ============================ 2025-12-04T11:20:45.5324082Z FAILED [0.5452s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.5324088Z 2025-12-04T11:20:45.5324209Z Expected 1 but got 0. 2025-12-04T11:20:45.5324325Z Absolute difference: 1 2025-12-04T11:20:45.5324469Z Relative difference: 1.0 2025-12-04T11:20:45.5324474Z 2025-12-04T11:20:45.5324702Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5325608Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.5325616Z 2025-12-04T11:20:45.5325883Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5326075Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.5326276Z ================== 1 failed, 13 deselected, 2 rerun in 21.03s ================== 2025-12-04T11:20:45.5326390Z Got exit code 1 2025-12-04T11:20:45.5326498Z Retrying single test... 2025-12-04T11:20:45.5326950Z W1204 11:20:06.141000 93808 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.5327622Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c09ee7a97c51183.xml 2025-12-04T11:20:45.5327791Z ============================= test session starts ============================== 2025-12-04T11:20:45.5328152Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.5328265Z cachedir: .pytest_cache 2025-12-04T11:20:45.5328786Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.5328924Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.5329033Z configfile: pytest.ini 2025-12-04T11:20:45.5329575Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, 
subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.5329811Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T11:20:45.5330807Z stepcurrent: skipping 13 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.5330938Z Running 1 items in this shard 2025-12-04T11:20:45.5330944Z 2025-12-04T11:20:45.5332236Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:20:11.096681931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5332245Z 2025-12-04T11:20:45.5332777Z [W1204 11:20:28.186452159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5332782Z 2025-12-04T11:20:45.5333364Z [W1204 11:20:28.186716034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5333373Z 2025-12-04T11:20:45.5333883Z [W1204 11:20:28.194944748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5333901Z 2025-12-04T11:20:45.5334414Z [W1204 11:20:28.195851251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5334449Z 2025-12-04T11:20:45.5334958Z [W1204 11:20:28.196052020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5334963Z 2025-12-04T11:20:45.5335484Z [W1204 11:20:28.203651391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5335489Z 2025-12-04T11:20:45.5336002Z [W1204 11:20:28.204319762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5336007Z 2025-12-04T11:20:45.5336641Z [W1204 11:20:28.204510086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5336647Z 2025-12-04T11:20:45.5337108Z W1204 11:20:28.926000 93808 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.5337635Z [W1204 11:20:29.400484846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5337640Z 2025-12-04T11:20:45.5338149Z [W1204 11:20:29.402257914 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5338154Z 2025-12-04T11:20:45.5338663Z [W1204 11:20:29.402479468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5338682Z 2025-12-04T11:20:45.5339193Z [W1204 11:20:29.407235324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5339201Z 2025-12-04T11:20:45.5339708Z [W1204 11:20:29.407969242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5339712Z 2025-12-04T11:20:45.5340235Z [W1204 11:20:29.408182677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5340240Z 2025-12-04T11:20:45.5340748Z [W1204 11:20:29.415007990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5340753Z 2025-12-04T11:20:45.5341277Z [W1204 11:20:29.415715316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5341283Z 2025-12-04T11:20:45.5341793Z [W1204 11:20:29.415920865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5341800Z 2025-12-04T11:20:45.5341949Z ('RERUN', {'yellow': True}) [21.1250s] [100%] 2025-12-04T11:20:45.5343226Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:20:29.876668578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5343236Z 2025-12-04T11:20:45.5343758Z [W1204 11:20:29.877453862 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5343764Z 2025-12-04T11:20:45.5344271Z [W1204 11:20:29.877661954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5344276Z 2025-12-04T11:20:45.5344849Z [W1204 11:20:29.882693558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5344874Z 2025-12-04T11:20:45.5345387Z [W1204 11:20:29.883378160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5345392Z 2025-12-04T11:20:45.5345903Z [W1204 11:20:29.883584684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5345938Z 2025-12-04T11:20:45.5346462Z [W1204 11:20:29.890428222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5346466Z 2025-12-04T11:20:45.5346973Z [W1204 11:20:29.891119052 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5346978Z 2025-12-04T11:20:45.5347498Z [W1204 11:20:29.891316433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5347550Z 2025-12-04T11:20:45.5348061Z [W1204 11:20:29.002453388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5348066Z 2025-12-04T11:20:45.5348584Z [W1204 11:20:29.003289758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5348591Z 2025-12-04T11:20:45.5349096Z [W1204 11:20:29.003516496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5349101Z 2025-12-04T11:20:45.5349622Z [W1204 11:20:29.008294928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5349627Z 2025-12-04T11:20:45.5350138Z [W1204 11:20:29.009042062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5350145Z 2025-12-04T11:20:45.5350655Z [W1204 11:20:29.009251483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5350659Z 2025-12-04T11:20:45.5351182Z [W1204 11:20:29.016184178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5351189Z 2025-12-04T11:20:45.5351697Z [W1204 11:20:29.016918274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5351703Z 2025-12-04T11:20:45.5352229Z [W1204 11:20:29.017122621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5352234Z 2025-12-04T11:20:45.5352369Z ('RERUN', {'yellow': True}) [0.5616s] [100%] 2025-12-04T11:20:45.5353660Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:20:30.414269816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5353668Z 2025-12-04T11:20:45.5354181Z [W1204 11:20:30.415084846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5354188Z 2025-12-04T11:20:45.5354711Z [W1204 11:20:30.415308496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5354715Z 2025-12-04T11:20:45.5355224Z [W1204 11:20:30.420329618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5355229Z 2025-12-04T11:20:45.5355801Z [W1204 11:20:30.421069777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5355824Z 2025-12-04T11:20:45.5356333Z [W1204 11:20:30.421283110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5356337Z 2025-12-04T11:20:45.5356849Z [W1204 11:20:30.428131946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5356885Z 2025-12-04T11:20:45.5357411Z [W1204 11:20:30.428869829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5357416Z 2025-12-04T11:20:45.5357930Z [W1204 11:20:30.429078765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5357934Z 2025-12-04T11:20:45.5358461Z [W1204 11:20:30.544890644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5358466Z 2025-12-04T11:20:45.5359006Z [W1204 11:20:30.545722248 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5359011Z 2025-12-04T11:20:45.5359529Z [W1204 11:20:30.545945278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5359536Z 2025-12-04T11:20:45.5360046Z [W1204 11:20:30.550844335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5360051Z 2025-12-04T11:20:45.5360573Z [W1204 11:20:30.551600596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5360578Z 2025-12-04T11:20:45.5361089Z [W1204 11:20:30.551808446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:20:45.5361098Z 2025-12-04T11:20:45.5361610Z [W1204 11:20:30.558984282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5361635Z 2025-12-04T11:20:45.5362147Z [W1204 11:20:30.559735594 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5362154Z 2025-12-04T11:20:45.5362662Z [W1204 11:20:30.559941065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:20:45.5362668Z 2025-12-04T11:20:45.5362789Z FAILED [0.5422s] [100%] 2025-12-04T11:20:45.5362794Z 2025-12-04T11:20:45.5362941Z ==================================== RERUNS ==================================== 2025-12-04T11:20:45.5363468Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.5363597Z Traceback (most recent call last): 2025-12-04T11:20:45.5364110Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5364359Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5364827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5364996Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5365543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5365751Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5365900Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5365905Z 2025-12-04T11:20:45.5366012Z Expected 1 but got 0. 
2025-12-04T11:20:45.5366124Z Absolute difference: 1 2025-12-04T11:20:45.5366247Z Relative difference: 1.0 2025-12-04T11:20:45.5366252Z 2025-12-04T11:20:45.5366539Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5367464Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.5367469Z 2025-12-04T11:20:45.5367738Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5367992Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5368121Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5368819Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5369060Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5369160Z graph_break [] 2025-12-04T11:20:45.5369289Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5369519Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5370843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.5371251Z if out == self.unknown_value: 2025-12-04T11:20:45.5372055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5372161Z warnings.warn( 2025-12-04T11:20:45.5372889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5372993Z warnings.warn( 2025-12-04T11:20:45.5373510Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:20:45.5373657Z Traceback (most recent call last): 2025-12-04T11:20:45.5374162Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5374411Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5374876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5375041Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5375592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5375800Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5375932Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5375938Z 2025-12-04T11:20:45.5376058Z Expected 1 but got 0. 
2025-12-04T11:20:45.5376173Z Absolute difference: 1 2025-12-04T11:20:45.5376364Z Relative difference: 1.0 2025-12-04T11:20:45.5376370Z 2025-12-04T11:20:45.5376591Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5377498Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.5377507Z 2025-12-04T11:20:45.5377792Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5378014Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5378149Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5378848Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5379229Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5379347Z graph_break [] 2025-12-04T11:20:45.5379472Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5379691Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5380913Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:20:45.5381077Z if out == self.unknown_value: 2025-12-04T11:20:45.5381820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5381923Z warnings.warn( 2025-12-04T11:20:45.5382639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5382804Z warnings.warn( 2025-12-04T11:20:45.5383025Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5383157Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5383388Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5384084Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5384202Z graph_break [] 2025-12-04T11:20:45.5384326Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5384547Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5385287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5385390Z warnings.warn( 2025-12-04T11:20:45.5386122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5386226Z warnings.warn( 2025-12-04T11:20:45.5386376Z =================================== FAILURES =================================== 2025-12-04T11:20:45.5386897Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 
2025-12-04T11:20:45.5387024Z Traceback (most recent call last): 2025-12-04T11:20:45.5387548Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:20:45.5387782Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:20:45.5388244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:20:45.5388425Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:20:45.5388960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:20:45.5389170Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:20:45.5389317Z AssertionError: Scalars are not equal! 2025-12-04T11:20:45.5389322Z 2025-12-04T11:20:45.5389429Z Expected 1 but got 0. 2025-12-04T11:20:45.5389553Z Absolute difference: 1 2025-12-04T11:20:45.5389665Z Relative difference: 1.0 2025-12-04T11:20:45.5389670Z 2025-12-04T11:20:45.5389885Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5390810Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.5390816Z 2025-12-04T11:20:45.5391084Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5391384Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5391506Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5392202Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5392441Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 
2025-12-04T11:20:45.5392642Z graph_break [] 2025-12-04T11:20:45.5392765Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5392996Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5394198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:20:45.5394335Z if out == self.unknown_value: 2025-12-04T11:20:45.5395060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5395195Z warnings.warn( 2025-12-04T11:20:45.5395930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5396035Z warnings.warn( 2025-12-04T11:20:45.5396270Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5396387Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5396615Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5397324Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5397426Z graph_break [] 2025-12-04T11:20:45.5397554Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5397785Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5398505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5398620Z warnings.warn( 
2025-12-04T11:20:45.5399339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5399440Z warnings.warn( 2025-12-04T11:20:45.5399667Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:20:45.5399784Z stats [('calls_captured', 6)] 2025-12-04T11:20:45.5400011Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:20:45.5400719Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:20:45.5400820Z graph_break [] 2025-12-04T11:20:45.5400957Z aten_mm_info [('aten.mm_256_72_1024', 2)] 2025-12-04T11:20:45.5401172Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:20:45.5401892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5402007Z warnings.warn( 2025-12-04T11:20:45.5402720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:20:45.5402834Z warnings.warn( 2025-12-04T11:20:45.5403673Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c09ee7a97c51183.xml - 2025-12-04T11:20:45.5403906Z =========================== short test summary info ============================ 2025-12-04T11:20:45.5404861Z FAILED [0.5422s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:20:45.5404868Z 2025-12-04T11:20:45.5405004Z Expected 1 but got 0. 2025-12-04T11:20:45.5405127Z Absolute difference: 1 2025-12-04T11:20:45.5405243Z Relative difference: 1.0 2025-12-04T11:20:45.5405248Z 2025-12-04T11:20:45.5405466Z To execute this test, run the following from the base repo dir: 2025-12-04T11:20:45.5406389Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.5406394Z 2025-12-04T11:20:45.5406667Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:20:45.5406864Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:20:45.5407097Z ================== 1 failed, 13 deselected, 2 rerun in 22.26s ================== 2025-12-04T11:20:45.5407199Z Got exit code 1 2025-12-04T11:20:45.5408037Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:20:45.5408451Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:20:45.5408914Z W1204 11:20:42.127000 93990 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:20:45.5409570Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c713b1dc3f3923ec.xml 2025-12-04T11:20:45.5409743Z ============================= test session starts ============================== 2025-12-04T11:20:45.5410108Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:20:45.5410220Z cachedir: .pytest_cache 2025-12-04T11:20:45.5410741Z hypothesis profile 'pytorch_ci' -> database=None, 
max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:20:45.5410884Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:20:45.5410994Z configfile: pytest.ini 2025-12-04T11:20:45.5411546Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:20:45.5411766Z collecting ... collected 58 items / 14 deselected / 44 selected 2025-12-04T11:20:45.5411910Z stepcurrent: skipping 14 already run items. 2025-12-04T11:20:45.5412037Z Running 0 items in this shard 2025-12-04T11:20:45.5412042Z 2025-12-04T11:20:45.5412886Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c713b1dc3f3923ec.xml - 2025-12-04T11:20:45.5413068Z ============================ 14 deselected in 0.02s ============================ 2025-12-04T11:20:45.5424371Z The following tests failed consistently: ['test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16', 
'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16'] 2025-12-04T11:20:45.5424461Z 2025-12-04T11:20:45.5425100Z FINISHED PRINTING LOG FILE of inductor/test_cuda_select_algorithm 3/5 (test/test-reports/inductor.test_cuda_select_algorithm_3.5_e3565bc7025c1889_.log) 2025-12-04T11:20:45.5425119Z 2025-12-04T11:20:45.5425518Z Finished inductor/test_cuda_select_algorithm 3/5 ... 
[2025-12-04 11:20:45.178343][7673.561228993], took 21.29min 2025-12-04T11:20:45.5426470Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-74cab4bdcde89184.xml 2025-12-04T11:20:45.5427392Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-77e37a2f8b75b3d9.xml 2025-12-04T11:20:45.5428283Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3ba19b390afd5854.xml 2025-12-04T11:20:45.5429182Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4ad317a243ecdd30.xml 2025-12-04T11:20:45.5430083Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f482798b2b39d897.xml 2025-12-04T11:20:45.5430997Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cbe2514f89eef609.xml 2025-12-04T11:20:45.5431886Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3707d31910126ebf.xml 2025-12-04T11:20:45.5432781Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dedaec5daecec784.xml 2025-12-04T11:20:45.5433682Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2f4f0e9c4ac682e4.xml 2025-12-04T11:20:45.5434626Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-580d25229e34cb07.xml 2025-12-04T11:20:45.5445728Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9d15e1ab064c4537.xml 2025-12-04T11:20:45.5736362Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6d909bcc6975bf8.xml 2025-12-04T11:20:45.6049956Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0a612698d44183a1.xml 2025-12-04T11:20:45.6308071Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-90f2ceb88314c75a.xml 2025-12-04T11:20:45.6564277Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9644b19a5203c0ee.xml 2025-12-04T11:20:45.6867653Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a6f999e52eb1904.xml 2025-12-04T11:20:45.7150757Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-547414903ca204e9.xml 2025-12-04T11:20:45.7441766Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-41f0f199b083e6d2.xml 2025-12-04T11:20:45.7753867Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-438f9d52209526cc.xml 2025-12-04T11:20:45.8065430Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98df9406c6e0faf3.xml 2025-12-04T11:20:45.8387162Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f706cf73cc88a5b8.xml 2025-12-04T11:20:45.8677489Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c67f05de6c39b0d8.xml 2025-12-04T11:20:45.8947568Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0e30a339afee7d22.xml 2025-12-04T11:20:45.9237064Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4a14a2e6be65f97f.xml 2025-12-04T11:20:45.9752942Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a3db4b14f41cd2.xml 2025-12-04T11:20:46.0061331Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a02c7191ab69f431.xml 2025-12-04T11:20:46.0348172Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e37b8ebc7938792f.xml 2025-12-04T11:20:46.0639208Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee37665d187f9309.xml 2025-12-04T11:20:46.0933923Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-511047743df1b08e.xml 2025-12-04T11:20:46.1250098Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4d9221d5ac70ff44.xml 2025-12-04T11:20:46.1529189Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af9a500a606c950b.xml 2025-12-04T11:20:46.1835364Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e3ba96547605fc4e.xml 2025-12-04T11:20:46.2124390Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ce470e45644e1cc6.xml 2025-12-04T11:20:46.2416478Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cece0bb00c5477e6.xml 2025-12-04T11:20:46.2722837Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4e672e5e3ae6046c.xml 2025-12-04T11:20:46.3021077Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-65775801d71c7290.xml 2025-12-04T11:20:46.3319095Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed754aaaf490f98.xml 2025-12-04T11:20:46.3591063Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-36a3a8a6a9d0a436.xml 2025-12-04T11:20:46.3912580Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f55b1076fbed9be9.xml 2025-12-04T11:20:46.4191952Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6062b5e411b734f8.xml 2025-12-04T11:20:46.4457339Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21fa07c752a411ad.xml 2025-12-04T11:20:46.4736840Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c09ee7a97c51183.xml 2025-12-04T11:20:46.5005309Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c713b1dc3f3923ec.xml 2025-12-04T11:20:46.7587574Z Uploading logs for 57119749248 to S3 2025-12-04T11:20:46.8460948Z Uploading artifacts took 0.32 seconds 2025-12-04T11:20:46.8461383Z inductor/test_cuda_select_algorithm 3/5 failed! 2025-12-04T11:20:46.8466097Z Running inductor/test_compile_subprocess 3/3 ... 
[2025-12-04 11:20:46.846430][7675.229324441] 2025-12-04T11:20:46.8466742Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:20:46.8471701Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compile_subprocess.py', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:20:46.846893] 2025-12-04T11:29:55.3087917Z 2025-12-04T11:29:55.3091520Z PRINTING LOG FILE of inductor/test_compile_subprocess 3/3 (test/test-reports/inductor.test_compile_subprocess_3.3_92ce494afd455b37_.log) 2025-12-04T11:29:55.3093795Z W1204 11:20:56.511000 94107 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:29:55.3095442Z Test results will be stored in test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-84a2c5e5cdda7bdd.xml 2025-12-04T11:29:55.3096708Z ============================= test session starts ============================== 2025-12-04T11:29:55.3097995Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:29:55.3098610Z cachedir: .pytest_cache 2025-12-04T11:29:55.3099325Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:29:55.3100555Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:29:55.3101096Z configfile: pytest.ini 2025-12-04T11:29:55.3102323Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:29:55.3103719Z collecting ... 
collected 879 items 2025-12-04T11:29:55.3104548Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T11:29:55.3234029Z Running 288 items in this shard: test/inductor/test_compile_subprocess.py::GPUTests::test__dyn_quant_matmul_4bit_bf16_input_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test__unsafe_masked_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test__unsafe_masked_index_put_accumulate_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool2d_low_prec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool_errors_with_long_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex7_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_addmv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_dtype_device_layout_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_override_registration_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_support_out_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin_with_duplicates_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin_with_nan_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_to_float_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d6_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d_backward2_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool3d_backward2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_baddbmm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_batch_norm_2d_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bernoulli2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bitwise2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bitwise_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bmm1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_broadcast_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_default_kwargs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int32_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_buffer_batch_norm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_buffer_copied_in_graph_with_different_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_buffer_use_after_remove_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_neg_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_pos_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_zero_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_int_ndigits_zero_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_of_loops_and_extern_kernel_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_single_empty_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_upcasting_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cauchy_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_chunk_recompiles_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_computed_buffer_inlining_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_consecutive_split_cumprod_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_const_int32_to_float_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_1d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_nd_inplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv_functional_bn_fuse_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_convolution4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_copy_non_blocking_is_pinned_use_cat_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_copy_with_scalar_src_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_cpu_scalar_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_gpu_tensor_cpp_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_gpu_tensor_dynamic_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cumsum_inf_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cumsum_no_mask_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_op_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_op_fixed_layout_sequential_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_scan_op_multi_input_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_data_type_propogation_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dense_mask_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dropout_trivial_1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float32_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_elu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_erfc_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_erfinv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_exp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_expanded_reduction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_expm1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fallback_mutable_op_with_return_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fill1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_flip_cat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_flip_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_float_index_expression_type_promotion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_floordiv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fmod_zero_dim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_forced_buffer_realize_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fractional_max_pool2d2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_full_like_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fuse_large_params_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fusing_write_into_disjoint_read_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gather1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gather_scatter_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_generate_rand_fp8_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_glu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_both_scalars_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_constant_tensor2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_mutation_real_name_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_no_inputs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_pad_dynamic_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_refcount_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_grid_sampler_2d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_abs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_floordiv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_remainder_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_failed_reinplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_fallback1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_select_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_indirect_load_broadcast_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inductor_layout_optimization_input_mutations_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inner_fn_str_and_stride_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inplace_add_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_input_mutation2_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_input_mutation3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_input_mutation4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_insignificant_strides_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_isinf2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_issue102546_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_kernel_names_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_l1_loss_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_grid_use_block_ptr_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_layer_norm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_leaky_relu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lgamma_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_like_rands2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linear_mixed_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_regional_compile_repeated_blocks_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_triton_kernel_wrapper_functional_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_log2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_log_fp64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_logcumsumexp_zero_dim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_low_memory_max_pool_dilation_1_dim_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_low_memory_max_pool_dilation_2_dim_3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mark_dynamic_with_hint_override_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_masked_fill_promotion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_matmul_layer_norm_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_max_min_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mean_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_min_max_reduction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_misaligned_address_issue1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mixed_mm2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mixed_mm3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mm_mixed_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mul_index_expr_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multi_gpu_device_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multi_threading_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_any_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_sum_low_prec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_var_lowp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mutable_custom_op_fixed_layout2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nan_sort_stable_False_descending_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_new_empty_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nll_loss_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nll_loss_forward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_one_hot_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_pattern_matcher_unbacked_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_j0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_y0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_y1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_chebyshev_polynomial_t_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_chebyshev_polynomial_w_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_entr_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_erfcx_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_expm1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_gammaincc_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_gammaln_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_laguerre_polynomial_l_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_log_ndtr_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_logit_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_multigammaln_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_ndtri_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_psi_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_scaled_modified_bessel_k1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pow3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pow_symfloat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_prod_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_progressive, test/inductor/test_compile_subprocess.py::GPUTests::test_rand_like_deterministic_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randint_distribution_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_randn_generator_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randn_like_empty_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remainder_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_scatter_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_decomposition_has_clamp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_require_stride_expanded_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_resize_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_roi_align_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_roll_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_round_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_rsqrt_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_rsqrt_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scaled_dot_product_attention_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter_add3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter_reduce3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_unaligned_mask_freezing_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_shape_padding_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_signbit_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_silu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sin_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_mutation1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sort_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sort_stable_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sort_transpose_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_special_polygamma_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_cumprod_low_prec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_cumsum_low_prec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_reduction_with_int64_size_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_with_unbacked_symints_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sqrt_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_squeeze1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_squeeze2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_squeeze_varargs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_stack_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_std_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_strided_inputs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum4_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_sum5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum_keepdims_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tanh_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tmp_not_defined_issue1_use_block_ptr_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tmp_not_defined_issue3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_to_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_triton_argmin_argmax_transpose_logical_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_uint4x2_mixed_mm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unbacked_floordiv_simplify_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unbacked_floordiv_simplify_errors_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unroll_small_reduction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_nearest2d_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_var_correction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_var_mean_div_by_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_var_mean_tile_reduction_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_var_mean_tile_reduction_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_vertical_fusion1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_view_as_complex_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_views2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_weight_norm_bwd_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_weight_norm_conv2d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_where_broadcast_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_where_with_logical_op_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_xblock_divides_xnumel_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_zero_dim_reductions_cuda 2025-12-04T11:29:55.3347862Z 2025-12-04T11:29:55.3348787Z inductor/test_compile_subprocess.py::GPUTests::test__dyn_quant_matmul_4bit_bf16_input_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0047s] (No _dyn_quant_matmul_4bit implementation on CUDA) [ 0%] 2025-12-04T11:29:55.3350562Z inductor/test_compile_subprocess.py::GPUTests::test__unsafe_masked_index_cuda <- test/inductor/test_torchinductor.py PASSED [18.4052s] [ 0%] 2025-12-04T11:29:55.3351971Z inductor/test_compile_subprocess.py::GPUTests::test__unsafe_masked_index_put_accumulate_cuda <- test/inductor/test_torchinductor.py PASSED [0.9283s] [ 1%] 2025-12-04T11:29:55.3353394Z inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool2d_low_prec_cuda <- test/inductor/test_torchinductor.py PASSED [0.6602s] [ 1%] 2025-12-04T11:29:55.3354861Z inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool_errors_with_long_cuda <- test/inductor/test_torchinductor.py PASSED [0.6870s] [ 1%] 2025-12-04T11:29:55.3356196Z inductor/test_compile_subprocess.py::GPUTests::test_add_complex4_cuda <- test/inductor/test_torchinductor.py PASSED [1.5006s] [ 2%] 2025-12-04T11:29:55.3357432Z inductor/test_compile_subprocess.py::GPUTests::test_add_complex7_cuda <- test/inductor/test_torchinductor.py PASSED [0.6564s] [ 2%] 2025-12-04T11:29:55.3358756Z inductor/test_compile_subprocess.py::GPUTests::test_add_complex8_cuda <- test/inductor/test_torchinductor.py PASSED [0.6202s] [ 2%] 2025-12-04T11:29:55.3359991Z inductor/test_compile_subprocess.py::GPUTests::test_add_complex_cuda <- 
test/inductor/test_torchinductor.py PASSED [0.6110s] [ 3%] 2025-12-04T11:29:55.3361698Z inductor/test_compile_subprocess.py::GPUTests::test_addmv_cuda <- test/inductor/test_torchinductor.py W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.3363317Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.3364818Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.3366216Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.3367581Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.3369146Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.3370666Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.3372207Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.3373580Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.3375103Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return 
_GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.3376890Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.3378305Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.3379691Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.3381199Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.3382655Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.3384116Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.3385577Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.3387102Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.3388592Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.3390098Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, 
options) 2025-12-04T11:29:55.3391630Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.3393198Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.3394859Z W1204 11:21:22.664000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.3396023Z PASSED [1.0175s] [ 3%] 2025-12-04T11:29:55.3396943Z inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_dtype_device_layout_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (Requires sm80) [ 3%] 2025-12-04T11:29:55.3398536Z inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_override_registration_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (Requires sm80) [ 4%] 2025-12-04T11:29:55.3400075Z inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_support_out_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (Requires sm80) [ 4%] 2025-12-04T11:29:55.3401915Z inductor/test_compile_subprocess.py::GPUTests::test_arange2_cuda <- test/inductor/test_torchinductor.py W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.3403543Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.3405030Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.3406439Z W1204 
11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.3407860Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.3409413Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.3410909Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.3412285Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.3413660Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.3415170Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.3416769Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.3418220Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.3419606Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.3421061Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = 
_GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.3422532Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.3423999Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.3425481Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.3426974Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.3428446Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.3429961Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.3431489Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.3433071Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.3434661Z W1204 11:21:23.683000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.3435728Z PASSED [0.5444s] [ 4%] 2025-12-04T11:29:55.3436952Z 
inductor/test_compile_subprocess.py::GPUTests::test_arange5_cuda <- test/inductor/test_torchinductor.py W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.3438653Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.3440156Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.3441557Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.3445944Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.3447556Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.3449069Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.3450394Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.3451849Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.3453379Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.3454941Z W1204 11:21:24.219000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.3456346Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.3457840Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.3459310Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.3460766Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.3462228Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.3463674Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.3465165Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.3466656Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.3468152Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.3469670Z W1204 11:21:24.219000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.3471469Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.3473174Z W1204 11:21:24.219000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.3474655Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.3475758Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] Traceback (most recent call last): 2025-12-04T11:29:55.3477372Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.3478767Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] ).serialize() 2025-12-04T11:29:55.3480123Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.3481749Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.3483252Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.3484563Z W1204 11:21:24.744000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] pickler.dump(obj) 2025-12-04T11:29:55.3485944Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.3487464Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.3488983Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.3490389Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] cls(obj, pickler.options), 2025-12-04T11:29:55.3491764Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.3493227Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.3494685Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.3496149Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.3497678Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.3499183Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] self.target = 
_OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.3500681Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.3502185Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.3503758Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.3505317Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.3506908Z W1204 11:21:24.744000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/1] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.3508017Z PASSED [0.7799s] [ 5%] 2025-12-04T11:29:55.3508918Z inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin_with_duplicates_cuda <- test/inductor/test_torchinductor.py PASSED [2.3291s] [ 5%] 2025-12-04T11:29:55.3510274Z inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin_with_nan_cuda <- test/inductor/test_torchinductor.py PASSED [4.0337s] [ 5%] 2025-12-04T11:29:55.3512051Z inductor/test_compile_subprocess.py::GPUTests::test_argmax_to_float_cuda <- test/inductor/test_torchinductor.py W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.3513737Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.3515225Z W1204 11:21:31.382000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.3516635Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.3517979Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.3519526Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.3521029Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.3522359Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.3523727Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.3525246Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.3526762Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.3528172Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.3529552Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.3530996Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.3532448Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.3533933Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.3535401Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.3536978Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.3538522Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.3540024Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.3541557Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.3543162Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.3544833Z W1204 11:21:31.382000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] 
torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.3545969Z PASSED [1.0764s] [ 6%] 2025-12-04T11:29:55.3546712Z inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d2_cuda <- test/inductor/test_torchinductor.py PASSED [1.1000s] [ 6%] 2025-12-04T11:29:55.3547936Z inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d3_cuda <- test/inductor/test_torchinductor.py PASSED [1.6083s] [ 6%] 2025-12-04T11:29:55.3549165Z inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d6_cuda <- test/inductor/test_torchinductor.py PASSED [0.7689s] [ 7%] 2025-12-04T11:29:55.3550429Z inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d_backward2_cuda <- test/inductor/test_torchinductor.py PASSED [10.1314s] [ 7%] 2025-12-04T11:29:55.3551741Z inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d_backward_cuda <- test/inductor/test_torchinductor.py PASSED [1.6173s] [ 7%] 2025-12-04T11:29:55.3553332Z inductor/test_compile_subprocess.py::GPUTests::test_avg_pool3d_backward2_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0005s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 8%] 2025-12-04T11:29:55.3555308Z inductor/test_compile_subprocess.py::GPUTests::test_baddbmm_cuda <- test/inductor/test_torchinductor.py W1204 11:21:48.565000 94292 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:29:55.3556515Z PASSED [1.8050s] [ 8%] 2025-12-04T11:29:55.3557263Z inductor/test_compile_subprocess.py::GPUTests::test_batch_norm_2d_2_cuda <- test/inductor/test_torchinductor.py PASSED [2.8427s] [ 9%] 2025-12-04T11:29:55.3558992Z inductor/test_compile_subprocess.py::GPUTests::test_bernoulli2_cuda <- test/inductor/test_torchinductor.py W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.3560636Z 
W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.3562273Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.3563667Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.3565107Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.3566666Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.3568176Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.3569536Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.3571145Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.3572725Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.3574250Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.3575755Z W1204 11:21:52.375000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.3577216Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.3578687Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.3580154Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.3581619Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.3583065Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.3584569Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.3586062Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.3587559Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.3589083Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.3590646Z W1204 11:21:52.375000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.3592278Z W1204 11:21:52.375000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default 2025-12-04T11:29:55.3593800Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.3594904Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.3596480Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.3597882Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.3599238Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.3600845Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.3602437Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.3603759Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.3605131Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.3606680Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.3608205Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.3609610Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.3610986Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.3612444Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.3613915Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.3615373Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.3616929Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.3618417Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.3619904Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.3621404Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.3622920Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.3624495Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.3626130Z W1204 11:21:53.501000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default 2025-12-04T11:29:55.3627293Z PASSED [1.4791s] [ 9%] 2025-12-04T11:29:55.3628030Z inductor/test_compile_subprocess.py::GPUTests::test_bitwise2_cuda <- test/inductor/test_torchinductor.py PASSED [0.3629s] [ 9%] 2025-12-04T11:29:55.3629241Z inductor/test_compile_subprocess.py::GPUTests::test_bitwise_cuda <- test/inductor/test_torchinductor.py PASSED [0.3315s] [ 10%] 2025-12-04T11:29:55.3630402Z inductor/test_compile_subprocess.py::GPUTests::test_bmm1_cuda <- test/inductor/test_torchinductor.py PASSED [0.6155s] [ 10%] 2025-12-04T11:29:55.3631719Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_broadcast_cuda <- test/inductor/test_torchinductor.py PASSED [0.5180s] [ 10%] 2025-12-04T11:29:55.3633063Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_default_kwargs_cuda <- test/inductor/test_torchinductor.py PASSED [0.2226s] [ 11%] 2025-12-04T11:29:55.3634427Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int32_uint8_cuda <- test/inductor/test_torchinductor.py PASSED [1.5085s] [ 11%] 2025-12-04T11:29:55.3635764Z 
inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int16_cuda <- test/inductor/test_torchinductor.py PASSED [1.5006s] [ 11%] 2025-12-04T11:29:55.3637151Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int32_cuda <- test/inductor/test_torchinductor.py PASSED [1.5240s] [ 12%] 2025-12-04T11:29:55.3638504Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_uint8_cuda <- test/inductor/test_torchinductor.py PASSED [1.5327s] [ 12%] 2025-12-04T11:29:55.3639850Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int32_cuda <- test/inductor/test_torchinductor.py PASSED [1.5015s] [ 12%] 2025-12-04T11:29:55.3641172Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int64_cuda <- test/inductor/test_torchinductor.py PASSED [1.5204s] [ 13%] 2025-12-04T11:29:55.3642516Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int16_cuda <- test/inductor/test_torchinductor.py PASSED [1.4869s] [ 13%] 2025-12-04T11:29:55.3643868Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int32_cuda <- test/inductor/test_torchinductor.py PASSED [1.4842s] [ 13%] 2025-12-04T11:29:55.3645211Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int8_cuda <- test/inductor/test_torchinductor.py PASSED [1.8127s] [ 14%] 2025-12-04T11:29:55.3646562Z inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_uint8_cuda <- test/inductor/test_torchinductor.py PASSED [1.5066s] [ 14%] 2025-12-04T11:29:55.3647865Z inductor/test_compile_subprocess.py::GPUTests::test_buffer_batch_norm_cuda <- test/inductor/test_torchinductor.py PASSED [1.4812s] [ 14%] 2025-12-04T11:29:55.3649264Z inductor/test_compile_subprocess.py::GPUTests::test_buffer_copied_in_graph_with_different_shapes_cuda <- test/inductor/test_torchinductor.py PASSED [0.4896s] [ 15%] 2025-12-04T11:29:55.3650709Z 
inductor/test_compile_subprocess.py::GPUTests::test_buffer_use_after_remove_cuda <- test/inductor/test_torchinductor.py PASSED [2.6171s] [ 15%] 2025-12-04T11:29:55.3652083Z inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_neg_cuda <- test/inductor/test_torchinductor.py PASSED [0.3194s] [ 15%] 2025-12-04T11:29:55.3653495Z inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_pos_cuda <- test/inductor/test_torchinductor.py PASSED [0.2651s] [ 16%] 2025-12-04T11:29:55.3654925Z inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_zero_cuda <- test/inductor/test_torchinductor.py PASSED [0.2627s] [ 16%] 2025-12-04T11:29:55.3656420Z inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_int_ndigits_zero_cuda <- test/inductor/test_torchinductor.py PASSED [0.2038s] [ 17%] 2025-12-04T11:29:55.3658370Z inductor/test_compile_subprocess.py::GPUTests::test_cat_of_loops_and_extern_kernel_cuda <- test/inductor/test_torchinductor.py W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.3660092Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.3661564Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.3663115Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.3664537Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.3666087Z W1204 11:22:17.025000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.3667580Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.3668942Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.3670319Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.3672084Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.3673603Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.3674994Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.3676386Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.3677983Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.3679447Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.3680911Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] 
= _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.3682365Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.3683857Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.3685347Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.3686855Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.3688509Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.3690189Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.3691922Z W1204 11:22:17.025000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims._low_memory_max_pool_with_offsets.default 2025-12-04T11:29:55.3693132Z PASSED [1.1611s] [ 17%] 2025-12-04T11:29:55.3693908Z inductor/test_compile_subprocess.py::GPUTests::test_cat_single_empty_cuda <- test/inductor/test_torchinductor.py PASSED [0.2635s] [ 17%] 2025-12-04T11:29:55.3695254Z inductor/test_compile_subprocess.py::GPUTests::test_cat_uint8_cuda <- test/inductor/test_torchinductor.py PASSED [0.3904s] [ 18%] 2025-12-04T11:29:55.3696558Z inductor/test_compile_subprocess.py::GPUTests::test_cat_upcasting_cuda <- 
test/inductor/test_torchinductor.py PASSED [0.6337s] [ 18%] 2025-12-04T11:29:55.3697780Z inductor/test_compile_subprocess.py::GPUTests::test_cauchy_cuda <- test/inductor/test_torchinductor.py PASSED [0.2784s] [ 18%] 2025-12-04T11:29:55.3699024Z inductor/test_compile_subprocess.py::GPUTests::test_chunk_recompiles_cuda <- test/inductor/test_torchinductor.py PASSED [1.1241s] [ 19%] 2025-12-04T11:29:55.3700885Z inductor/test_compile_subprocess.py::GPUTests::test_computed_buffer_inlining_cuda <- test/inductor/test_torchinductor.py W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.3702605Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.3704097Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.3705504Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.3706871Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.3708421Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.3709934Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.3711267Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.3712648Z W1204 
11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.3714154Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.3715667Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.3717076Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.3718480Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.3719940Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.3721420Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.3722892Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.3724358Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.3725951Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.3727436Z W1204 11:22:20.795000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.3728925Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.3730477Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.3732038Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.3733631Z W1204 11:22:20.795000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.3734688Z PASSED [0.2243s] [ 19%] 2025-12-04T11:29:55.3735499Z inductor/test_compile_subprocess.py::GPUTests::test_consecutive_split_cumprod_cuda <- test/inductor/test_torchinductor.py PASSED [0.4692s] [ 19%] 2025-12-04T11:29:55.3737393Z inductor/test_compile_subprocess.py::GPUTests::test_const_int32_to_float_cuda <- test/inductor/test_torchinductor.py W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.3739067Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.3740552Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.3741948Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] 
).serialize() 2025-12-04T11:29:55.3743303Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.3744851Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.3746349Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.3747663Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.3749054Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.3750574Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.3752130Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.3753536Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.3754907Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.3756398Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.3757893Z W1204 
11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.3759353Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.3760934Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.3762452Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.3763938Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.3765434Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.3766951Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.3768506Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.3770187Z W1204 11:22:21.494000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.3771572Z PASSED [0.8194s] [ 20%] 2025-12-04T11:29:55.3772340Z inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_1d_cuda <- 
test/inductor/test_torchinductor.py PASSED [0.6795s] [ 20%] 2025-12-04T11:29:55.3773630Z inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_float64_cuda <- test/inductor/test_torchinductor.py PASSED [0.3179s] [ 20%] 2025-12-04T11:29:55.3774939Z inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_nd_inplace_cuda <- test/inductor/test_torchinductor.py PASSED [0.1827s] [ 21%] 2025-12-04T11:29:55.3776506Z inductor/test_compile_subprocess.py::GPUTests::test_conv_functional_bn_fuse_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (only support cpu conv bn test) [ 21%] 2025-12-04T11:29:55.3777970Z inductor/test_compile_subprocess.py::GPUTests::test_convolution4_cuda <- test/inductor/test_torchinductor.py PASSED [0.5776s] [ 21%] 2025-12-04T11:29:55.3779820Z inductor/test_compile_subprocess.py::GPUTests::test_copy_non_blocking_is_pinned_use_cat_True_cuda <- test/inductor/test_torchinductor.py W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.3781571Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.3783141Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.3784553Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.3785918Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.3787508Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return 
_WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.3789043Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.3790372Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.3791755Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.3793329Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.3794845Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.3796244Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.3797631Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.3799093Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.3800552Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.3802005Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 
2025-12-04T11:29:55.3803468Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.3804954Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.3806441Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.3807950Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.3809460Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.3811042Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.3812662Z W1204 11:22:25.058000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.device_put.default 2025-12-04T11:29:55.3814088Z W1204 11:22:25.236000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3815026Z W1204 11:22:25.237000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3815939Z W1204 11:22:25.238000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3816925Z W1204 11:22:25.239000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 
2025-12-04T11:29:55.3817884Z W1204 11:22:25.240000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3818827Z W1204 11:22:25.240000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3819748Z W1204 11:22:25.241000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3820668Z W1204 11:22:25.242000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3821585Z W1204 11:22:25.242000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3822524Z W1204 11:22:25.243000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3823446Z W1204 11:22:25.244000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3824368Z W1204 11:22:25.244000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3825291Z W1204 11:22:25.245000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3826204Z W1204 11:22:25.246000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3827132Z W1204 11:22:25.246000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3828059Z W1204 11:22:25.247000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3828985Z W1204 11:22:25.248000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3829899Z W1204 11:22:25.249000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3830824Z W1204 11:22:25.249000 94107 
site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3831756Z W1204 11:22:25.250000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3832685Z W1204 11:22:25.251000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3833593Z W1204 11:22:25.251000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3834520Z W1204 11:22:25.252000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3835450Z W1204 11:22:25.253000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3836377Z W1204 11:22:25.253000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3837290Z W1204 11:22:25.254000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3838215Z W1204 11:22:25.255000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3839140Z W1204 11:22:25.255000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3840064Z W1204 11:22:25.256000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3840973Z W1204 11:22:25.257000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3841956Z W1204 11:22:25.257000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3842881Z W1204 11:22:25.258000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3843793Z W1204 11:22:25.259000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 
2025-12-04T11:29:55.3844710Z W1204 11:22:25.259000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3846345Z W1204 11:22:25.260000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3847304Z W1204 11:22:25.261000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3848222Z W1204 11:22:25.262000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3849126Z W1204 11:22:25.262000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3850048Z W1204 11:22:25.263000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3851002Z W1204 11:22:25.264000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3851906Z W1204 11:22:25.264000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3852830Z W1204 11:22:25.265000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3853752Z W1204 11:22:25.266000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3854675Z W1204 11:22:25.266000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3855587Z W1204 11:22:25.267000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3856586Z W1204 11:22:25.268000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3857506Z W1204 11:22:25.268000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3858431Z W1204 11:22:25.269000 94107 
site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3859338Z W1204 11:22:25.270000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3860263Z W1204 11:22:25.271000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3861187Z W1204 11:22:25.271000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3862227Z W1204 11:22:25.272000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3863130Z W1204 11:22:25.273000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3864056Z W1204 11:22:25.273000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3864976Z W1204 11:22:25.274000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3865896Z W1204 11:22:25.275000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3866807Z W1204 11:22:25.275000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3867738Z W1204 11:22:25.276000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3868662Z W1204 11:22:25.277000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3869579Z W1204 11:22:25.277000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3870489Z W1204 11:22:25.278000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3872153Z W1204 11:22:25.279000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 
2025-12-04T11:29:55.3873094Z W1204 11:22:25.279000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3874016Z W1204 11:22:25.280000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3874984Z W1204 11:22:25.281000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3875976Z W1204 11:22:25.282000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3876897Z W1204 11:22:25.282000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3877803Z W1204 11:22:25.283000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3878724Z W1204 11:22:25.284000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3879783Z W1204 11:22:25.284000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3880706Z W1204 11:22:25.285000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3881610Z W1204 11:22:25.286000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3882535Z W1204 11:22:25.286000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3883461Z W1204 11:22:25.287000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3884386Z W1204 11:22:25.288000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3885289Z W1204 11:22:25.288000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3886208Z W1204 11:22:25.289000 94107 
site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3887122Z W1204 11:22:25.290000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3888043Z W1204 11:22:25.291000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3888953Z W1204 11:22:25.291000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3889870Z W1204 11:22:25.292000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3890785Z W1204 11:22:25.293000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3891699Z W1204 11:22:25.294000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3892605Z W1204 11:22:25.294000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3893524Z W1204 11:22:25.295000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3894449Z W1204 11:22:25.296000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3895366Z W1204 11:22:25.296000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3896279Z W1204 11:22:25.297000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3897285Z W1204 11:22:25.298000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3898210Z W1204 11:22:25.298000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3899135Z W1204 11:22:25.299000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 
2025-12-04T11:29:55.3900091Z W1204 11:22:25.300000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3901015Z W1204 11:22:25.301000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3901933Z W1204 11:22:25.301000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3902890Z W1204 11:22:25.302000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3903831Z W1204 11:22:25.303000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3904750Z W1204 11:22:25.303000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3905668Z W1204 11:22:25.304000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3906579Z W1204 11:22:25.305000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3907534Z W1204 11:22:25.305000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3908192Z PASSED [3.4798s] [ 22%] 2025-12-04T11:29:55.3908986Z inductor/test_compile_subprocess.py::GPUTests::test_copy_with_scalar_src_cuda <- test/inductor/test_torchinductor.py PASSED [0.5524s] [ 22%] 2025-12-04T11:29:55.3910312Z inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_cpu_scalar_cuda <- test/inductor/test_torchinductor.py PASSED [8.1973s] [ 22%] 2025-12-04T11:29:55.3912077Z inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_gpu_tensor_cpp_cuda <- test/inductor/test_torchinductor.py W1204 11:22:36.342000 94292 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3913350Z PASSED [6.8863s] [ 23%] 2025-12-04T11:29:55.3914569Z 
inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_gpu_tensor_dynamic_cuda <- test/inductor/test_torchinductor.py W1204 11:22:43.268000 94292 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.3915846Z PASSED [0.2600s] [ 23%] 2025-12-04T11:29:55.3916561Z inductor/test_compile_subprocess.py::GPUTests::test_cumsum_inf_cuda <- test/inductor/test_torchinductor.py PASSED [0.6998s] [ 23%] 2025-12-04T11:29:55.3917791Z inductor/test_compile_subprocess.py::GPUTests::test_cumsum_no_mask_cuda <- test/inductor/test_torchinductor.py PASSED [0.9193s] [ 24%] 2025-12-04T11:29:55.3919519Z inductor/test_compile_subprocess.py::GPUTests::test_custom_op_2_cuda <- test/inductor/test_torchinductor.py W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.3921151Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.3922632Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.3924040Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.3925402Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.3926960Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.3928448Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.3929823Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.3931200Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.3932722Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.3934312Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.3935702Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.3937190Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.3938688Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.3940144Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.3941610Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.3943049Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.3944545Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.3946022Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.3947523Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.3949033Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.3950597Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.3952179Z W1204 11:22:45.095000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.test.foo2.default 2025-12-04T11:29:55.3953240Z PASSED [0.2772s] [ 24%] 2025-12-04T11:29:55.3954576Z inductor/test_compile_subprocess.py::GPUTests::test_custom_op_fixed_layout_sequential_cuda <- test/inductor/test_torchinductor.py W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.3956302Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.3957787Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.3959182Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.3960572Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.3962127Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.3963613Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.3964973Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.3966435Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.3967965Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.3969460Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.3970907Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.3972530Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", 
line 364, in __init__ 2025-12-04T11:29:55.3973994Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.3975456Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.3976966Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.3978425Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.3979917Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.3981396Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.3982901Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.3984411Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.3985986Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.3987561Z W1204 11:22:45.383000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: 
torch.ops.test.bar.default 2025-12-04T11:29:55.3988971Z W1204 11:22:45.399000 94107 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:29:55.3989658Z PASSED [0.3617s] [ 25%] 2025-12-04T11:29:55.3990460Z inductor/test_compile_subprocess.py::GPUTests::test_custom_scan_op_multi_input_cuda <- test/inductor/test_torchinductor.py PASSED [0.1730s] [ 25%] 2025-12-04T11:29:55.3992028Z inductor/test_compile_subprocess.py::GPUTests::test_data_type_propogation_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (triton not supported) [ 25%] 2025-12-04T11:29:55.3993456Z inductor/test_compile_subprocess.py::GPUTests::test_dense_mask_index_cuda <- test/inductor/test_torchinductor.py PASSED [0.5806s] [ 26%] 2025-12-04T11:29:55.3994652Z inductor/test_compile_subprocess.py::GPUTests::test_div1_cuda <- test/inductor/test_torchinductor.py PASSED [0.6825s] [ 26%] 2025-12-04T11:29:55.3995854Z inductor/test_compile_subprocess.py::GPUTests::test_div8_cuda <- test/inductor/test_torchinductor.py PASSED [0.8310s] [ 26%] 2025-12-04T11:29:55.3997122Z inductor/test_compile_subprocess.py::GPUTests::test_dropout_trivial_1_cuda <- test/inductor/test_torchinductor.py PASSED [0.2748s] [ 27%] 2025-12-04T11:29:55.3998621Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_int8_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (uses bfloat16 which requires SM >= 80) [ 27%] 2025-12-04T11:29:55.4000351Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_float32_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0031s] (uses bfloat16 which requires SM >= 80) [ 27%] 2025-12-04T11:29:55.4002122Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_float64_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 28%] 2025-12-04T11:29:55.4003854Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_int64_cuda 
<- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 28%] 2025-12-04T11:29:55.4005567Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_float16_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (uses bfloat16 which requires SM >= 80) [ 28%] 2025-12-04T11:29:55.4007278Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_float32_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 29%] 2025-12-04T11:29:55.4008971Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_uint8_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 29%] 2025-12-04T11:29:55.4010649Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_int8_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 29%] 2025-12-04T11:29:55.4012333Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_int8_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 30%] 2025-12-04T11:29:55.4014014Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_uint8_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 30%] 2025-12-04T11:29:55.4015717Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float16_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 30%] 2025-12-04T11:29:55.4017485Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float32_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 31%] 2025-12-04T11:29:55.4019175Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float64_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires 
SM >= 80) [ 31%] 2025-12-04T11:29:55.4020872Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_int16_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 31%] 2025-12-04T11:29:55.4022566Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_int16_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (uses bfloat16 which requires SM >= 80) [ 32%] 2025-12-04T11:29:55.4024296Z inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_uint8_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (uses bfloat16 which requires SM >= 80) [ 32%] 2025-12-04T11:29:55.4026208Z inductor/test_compile_subprocess.py::GPUTests::test_elu_cuda <- test/inductor/test_torchinductor.py W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4027829Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4029356Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4030752Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4032113Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4033680Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4035179Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4036517Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4037889Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4039412Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4040915Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4042315Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4043707Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4045166Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4046611Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4048072Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4049536Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4051026Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4052516Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4054006Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4055567Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4057210Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4058924Z W1204 11:22:48.854000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.4060109Z PASSED [0.9188s] [ 32%] 2025-12-04T11:29:55.4060805Z inductor/test_compile_subprocess.py::GPUTests::test_erfc_cuda <- test/inductor/test_torchinductor.py PASSED [0.7078s] [ 33%] 2025-12-04T11:29:55.4061988Z inductor/test_compile_subprocess.py::GPUTests::test_erfinv_cuda <- test/inductor/test_torchinductor.py PASSED [0.7006s] [ 33%] 2025-12-04T11:29:55.4063163Z inductor/test_compile_subprocess.py::GPUTests::test_exp_cuda <- test/inductor/test_torchinductor.py PASSED [0.5491s] [ 34%] 2025-12-04T11:29:55.4064429Z inductor/test_compile_subprocess.py::GPUTests::test_expanded_reduction_cuda <- test/inductor/test_torchinductor.py PASSED 
[0.8582s] [ 34%] 2025-12-04T11:29:55.4065665Z inductor/test_compile_subprocess.py::GPUTests::test_expm1_cuda <- test/inductor/test_torchinductor.py PASSED [4.0652s] [ 34%] 2025-12-04T11:29:55.4067450Z inductor/test_compile_subprocess.py::GPUTests::test_fallback_mutable_op_with_return_cuda <- test/inductor/test_torchinductor.py W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4069145Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] Traceback (most recent call last): 2025-12-04T11:29:55.4070612Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4072155Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] ).serialize() 2025-12-04T11:29:55.4073497Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4075036Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4076521Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4077813Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] pickler.dump(obj) 2025-12-04T11:29:55.4079142Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4080639Z W1204 11:22:56.135000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4082129Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4083507Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] cls(obj, pickler.options), 2025-12-04T11:29:55.4084843Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4086380Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4087820Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4089259Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4090814Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4092274Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4093741Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4095263Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] return cls._pickle_op(name, _OpOverloadPickleData, 
options) 2025-12-04T11:29:55.4096833Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4098386Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4099958Z W1204 11:22:56.135000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.mylib.inplace_.default 2025-12-04T11:29:55.4101035Z PASSED [0.0600s] [ 35%] 2025-12-04T11:29:55.4101748Z inductor/test_compile_subprocess.py::GPUTests::test_fill1_cuda <- test/inductor/test_torchinductor.py PASSED [0.5711s] [ 35%] 2025-12-04T11:29:55.4103428Z inductor/test_compile_subprocess.py::GPUTests::test_flip_cat_cuda <- test/inductor/test_torchinductor.py W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4105023Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4106511Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4107909Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4109268Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4110819Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return 
_WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4112305Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4113631Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4115015Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4116539Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4118094Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4119504Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4120925Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4122411Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4123871Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4125318Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 
2025-12-04T11:29:55.4126811Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4128297Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4129786Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4131283Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4132790Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4134360Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4135940Z W1204 11:22:56.858000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.rev.default 2025-12-04T11:29:55.4137474Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4138565Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4140046Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 
2025-12-04T11:29:55.4141445Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4142797Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4144344Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4145824Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4147152Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4148604Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4150134Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4151676Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4153117Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4154506Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4155967Z W1204 11:22:57.183000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4157478Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4158921Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4160380Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4161865Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4163350Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4164846Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4166351Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4167930Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4169506Z W1204 11:22:57.183000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.rev.default 2025-12-04T11:29:55.4170570Z PASSED [0.6333s] [ 35%] 
2025-12-04T11:29:55.4171986Z inductor/test_compile_subprocess.py::GPUTests::test_flip_cuda <- test/inductor/test_torchinductor.py W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4173601Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4175113Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4176581Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4178034Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4179575Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4181083Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4182465Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4183902Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4185429Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4186937Z W1204 
11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4188397Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4189787Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4191252Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4192696Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4194159Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4195629Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4197119Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4198608Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4200094Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4201615Z W1204 11:22:57.451000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4203190Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4204781Z W1204 11:22:57.451000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.rev.default 2025-12-04T11:29:55.4206252Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4207338Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4208889Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4210297Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4211651Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4213227Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4214761Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4216086Z W1204 11:22:57.725000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4217534Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4219109Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4220610Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4222021Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4223399Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4224857Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4226307Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4227762Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4229227Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4230716Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = 
_OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4232197Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4233678Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4235196Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4236770Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4238348Z W1204 11:22:57.725000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.rev.default 2025-12-04T11:29:55.4239414Z PASSED [0.4836s] [ 36%] 2025-12-04T11:29:55.4240317Z inductor/test_compile_subprocess.py::GPUTests::test_float_index_expression_type_promotion_cuda <- test/inductor/test_torchinductor.py PASSED [0.2572s] [ 36%] 2025-12-04T11:29:55.4241671Z inductor/test_compile_subprocess.py::GPUTests::test_floordiv_cuda <- test/inductor/test_torchinductor.py PASSED [0.6685s] [ 36%] 2025-12-04T11:29:55.4242896Z inductor/test_compile_subprocess.py::GPUTests::test_fmod_zero_dim_cuda <- test/inductor/test_torchinductor.py PASSED [1.0238s] [ 37%] 2025-12-04T11:29:55.4244734Z inductor/test_compile_subprocess.py::GPUTests::test_forced_buffer_realize_cuda <- test/inductor/test_torchinductor.py W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4246413Z W1204 11:22:59.873000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4247907Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4249334Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4250700Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4252259Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4253741Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4255068Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4256521Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4258047Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4259550Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4260950Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, 
pickler.options), 2025-12-04T11:29:55.4262342Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4263798Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4265249Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4266833Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4268293Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4269781Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4271544Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4273049Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4274601Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4276215Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle 
non-standard op: {name}") 2025-12-04T11:29:55.4277858Z W1204 11:22:59.873000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops._inductor_test.realize.default 2025-12-04T11:29:55.4279371Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4280511Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4281991Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4283392Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4284749Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4286303Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4287794Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4289115Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4290502Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4292022Z W1204 11:23:00.039000 
94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4293545Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4294934Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4296316Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4297833Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4299302Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4300887Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4302493Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4304195Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4306099Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4307748Z W1204 11:23:00.039000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4309388Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4311143Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4312940Z W1204 11:23:00.039000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops._inductor_test.realize.default 2025-12-04T11:29:55.4314119Z PASSED [0.3363s] [ 37%] 2025-12-04T11:29:55.4315110Z inductor/test_compile_subprocess.py::GPUTests::test_fractional_max_pool2d2_cuda <- test/inductor/test_torchinductor.py PASSED [1.4649s] [ 37%] 2025-12-04T11:29:55.4316518Z inductor/test_compile_subprocess.py::GPUTests::test_full_like_cuda <- test/inductor/test_torchinductor.py PASSED [0.3850s] [ 38%] 2025-12-04T11:29:55.4318183Z inductor/test_compile_subprocess.py::GPUTests::test_fuse_large_params_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0036s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 38%] 2025-12-04T11:29:55.4320472Z inductor/test_compile_subprocess.py::GPUTests::test_fusing_write_into_disjoint_read_cuda <- test/inductor/test_torchinductor.py W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4344808Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4346348Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4347731Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4349077Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4350629Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4352127Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4353455Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4354840Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4356350Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4357971Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4359379Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4360808Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", 
line 364, in __init__ 2025-12-04T11:29:55.4362296Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4363758Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4365219Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4366723Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4368221Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4369697Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4371414Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4372955Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4374530Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4376117Z W1204 11:23:02.084000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: 
torch.ops.prims.rev.default 2025-12-04T11:29:55.4377645Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4378745Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4380236Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4381641Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4382987Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4384547Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4386057Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4387480Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4388868Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4390393Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4392010Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4393419Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4394798Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4396265Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4397756Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4399214Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4400673Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4402168Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4403636Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4405136Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4406658Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4408233Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4409809Z W1204 11:23:02.273000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.rev.default 2025-12-04T11:29:55.4410867Z PASSED [1.7494s] [ 38%] 2025-12-04T11:29:55.4411607Z inductor/test_compile_subprocess.py::GPUTests::test_gather1_cuda <- test/inductor/test_torchinductor.py PASSED [0.8960s] [ 39%] 2025-12-04T11:29:55.4412844Z inductor/test_compile_subprocess.py::GPUTests::test_gather_scatter_cuda <- test/inductor/test_torchinductor.py PASSED [0.5518s] [ 39%] 2025-12-04T11:29:55.4414105Z inductor/test_compile_subprocess.py::GPUTests::test_generate_rand_fp8_cuda <- test/inductor/test_torchinductor.py PASSED [0.0036s] [ 39%] 2025-12-04T11:29:55.4415301Z inductor/test_compile_subprocess.py::GPUTests::test_glu_cuda <- test/inductor/test_torchinductor.py PASSED [0.8363s] [ 40%] 2025-12-04T11:29:55.4416611Z inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_both_scalars_cuda <- test/inductor/test_torchinductor.py PASSED [0.7717s] [ 40%] 2025-12-04T11:29:55.4418074Z inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_constant_tensor2_cuda <- test/inductor/test_torchinductor.py PASSED [0.2118s] [ 40%] 2025-12-04T11:29:55.4420017Z inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_mutation_real_name_cuda <- test/inductor/test_torchinductor.py W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4421776Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most 
recent call last): 2025-12-04T11:29:55.4423314Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4424701Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4426071Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4427668Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4429157Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4430474Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4431841Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4433356Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4434865Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4436257Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4437646Z W1204 11:23:07.210000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4439112Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4440550Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4441998Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4443440Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4444919Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4446401Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4447899Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4449450Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4451033Z W1204 11:23:07.210000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4452706Z W1204 11:23:07.210000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.4454242Z W1204 11:23:07.238000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.4455180Z W1204 11:23:07.239000 94107 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T11:29:55.4455817Z PASSED [0.3561s] [ 41%] 2025-12-04T11:29:55.4457174Z inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_no_inputs_cuda <- test/inductor/test_torchinductor.py W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4458911Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] Traceback (most recent call last): 2025-12-04T11:29:55.4460381Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4461766Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] ).serialize() 2025-12-04T11:29:55.4463108Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4464650Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4466148Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4467463Z W1204 11:23:07.508000 
94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] pickler.dump(obj) 2025-12-04T11:29:55.4468821Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4470338Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4472047Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4473439Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4474800Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4476241Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4477687Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4479142Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4480701Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4482188Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] 
self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4483722Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4485294Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4486797Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4488361Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4490042Z W1204 11:23:07.508000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [1/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default 2025-12-04T11:29:55.4491154Z PASSED [0.8199s] [ 41%] 2025-12-04T11:29:55.4491968Z inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_pad_dynamic_cuda <- test/inductor/test_torchinductor.py PASSED [3.9940s] [ 42%] 2025-12-04T11:29:55.4493330Z inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_refcount_cuda <- test/inductor/test_torchinductor.py PASSED [5.2116s] [ 42%] 2025-12-04T11:29:55.4495132Z inductor/test_compile_subprocess.py::GPUTests::test_grid_sampler_2d_cuda <- test/inductor/test_torchinductor.py W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4496844Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4498325Z W1204 11:23:17.892000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4499713Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4501057Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4502591Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4504089Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4505408Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4506774Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4508287Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4509784Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4511241Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4512612Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4514050Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4515564Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4517016Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4518476Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4519982Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4521470Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4522976Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4524490Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4526059Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4527638Z W1204 11:23:17.892000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] 
torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.4528710Z PASSED [3.4367s] [ 42%] 2025-12-04T11:29:55.4529436Z inductor/test_compile_subprocess.py::GPUTests::test_index2_cuda <- test/inductor/test_torchinductor.py PASSED [0.9474s] [ 43%] 2025-12-04T11:29:55.4531177Z inductor/test_compile_subprocess.py::GPUTests::test_index_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4532846Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4534337Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4535737Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4537196Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4538753Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4540245Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4541622Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4543008Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] 
[0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4544531Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4546107Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4547499Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4548880Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4550371Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4551840Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4553285Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4554735Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4556228Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4557715Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4559218Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4560732Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4562299Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4563893Z W1204 11:23:22.015000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.4565363Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4566462Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4567935Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4569337Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4570690Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4572527Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return 
_WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4574027Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4575347Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4576896Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4578422Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4579935Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4581373Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4582754Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4584219Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4585680Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4587143Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 
2025-12-04T11:29:55.4588595Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4590093Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4591586Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4593078Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4594585Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4596162Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4597758Z W1204 11:23:22.707000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.4598834Z PASSED [1.3900s] [ 43%] 2025-12-04T11:29:55.4600113Z inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_abs_cuda <- test/inductor/test_torchinductor.py W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4601802Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4603333Z W1204 11:23:23.291000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4604730Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4606085Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4607697Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4609220Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4610550Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4611955Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4613478Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4615008Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4616494Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4617872Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4619338Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4620805Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4622266Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4623730Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4625211Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4626705Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4628212Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4629744Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4631321Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4632941Z W1204 11:23:23.291000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] 
torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.4634018Z PASSED [0.2126s] [ 43%] 2025-12-04T11:29:55.4635283Z inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_cuda <- test/inductor/test_torchinductor.py W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4636959Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4638506Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4639921Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4641282Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4642861Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4644358Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4645672Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4647058Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4648587Z W1204 
11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4650099Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4651507Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4652879Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4654332Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4655798Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4657327Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4658778Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4660270Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4661758Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4663256Z W1204 11:23:23.496000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4664821Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4666381Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4668001Z W1204 11:23:23.496000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.4669103Z PASSED [0.1764s] [ 44%] 2025-12-04T11:29:55.4670412Z inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_floordiv_cuda <- test/inductor/test_torchinductor.py W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4672299Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4673867Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4675277Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4676651Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4678209Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return 
_WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4679701Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4681034Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4682416Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4683938Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4685450Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4686846Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4688226Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4689682Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4691147Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4692592Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 
2025-12-04T11:29:55.4694039Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4695610Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4697154Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4698708Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4700258Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4701831Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4703419Z W1204 11:23:23.681000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.4704535Z PASSED [0.2917s] [ 44%] 2025-12-04T11:29:55.4705853Z inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_remainder_cuda <- test/inductor/test_torchinductor.py W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4707568Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4709055Z W1204 11:23:23.971000 94107 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4710459Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.4711813Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4713347Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4714854Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4716177Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4717557Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4719080Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4720581Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4721993Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4723373Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4724824Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4726316Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4727765Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4729228Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4730782Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4732267Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4733752Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4735299Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4736938Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4738531Z W1204 11:23:23.971000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] 
torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.4739598Z PASSED [0.2955s] [ 44%] 2025-12-04T11:29:55.4740395Z inductor/test_compile_subprocess.py::GPUTests::test_index_put_failed_reinplace_cuda <- test/inductor/test_torchinductor.py PASSED [0.6007s] [ 45%] 2025-12-04T11:29:55.4741746Z inductor/test_compile_subprocess.py::GPUTests::test_index_put_fallback1_cuda <- test/inductor/test_torchinductor.py PASSED [0.7557s] [ 45%] 2025-12-04T11:29:55.4743034Z inductor/test_compile_subprocess.py::GPUTests::test_index_put_index_cuda <- test/inductor/test_torchinductor.py PASSED [0.6052s] [ 45%] 2025-12-04T11:29:55.4744293Z inductor/test_compile_subprocess.py::GPUTests::test_index_select_cuda <- test/inductor/test_torchinductor.py PASSED [2.0063s] [ 46%] 2025-12-04T11:29:55.4745578Z inductor/test_compile_subprocess.py::GPUTests::test_indirect_load_broadcast_cuda <- test/inductor/test_torchinductor.py PASSED [1.8674s] [ 46%] 2025-12-04T11:29:55.4747047Z inductor/test_compile_subprocess.py::GPUTests::test_inductor_layout_optimization_input_mutations_cuda <- test/inductor/test_torchinductor.py PASSED [0.5789s] [ 46%] 2025-12-04T11:29:55.4748988Z inductor/test_compile_subprocess.py::GPUTests::test_inner_fn_str_and_stride_cuda <- test/inductor/test_torchinductor.py W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.4750666Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.4752158Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.4753545Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] 
).serialize() 2025-12-04T11:29:55.4754907Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.4756507Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.4758012Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.4759326Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.4760747Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.4762314Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.4763837Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.4765241Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.4766640Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.4768107Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.4769573Z W1204 
11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.4771248Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.4772721Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.4774198Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.4775689Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.4777256Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.4778773Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.4780336Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.4781980Z W1204 11:23:30.682000 94107 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops._inductor_test.realize.default 2025-12-04T11:29:55.4783111Z PASSED [0.1885s] [ 47%] 2025-12-04T11:29:55.4783856Z inductor/test_compile_subprocess.py::GPUTests::test_inplace_add_cuda <- test/inductor/test_torchinductor.py 
PASSED [0.2163s] [ 47%] 2025-12-04T11:29:55.4785101Z inductor/test_compile_subprocess.py::GPUTests::test_input_mutation2_cuda <- test/inductor/test_torchinductor.py PASSED [0.2752s] [ 47%] 2025-12-04T11:29:55.4786371Z inductor/test_compile_subprocess.py::GPUTests::test_input_mutation3_cuda <- test/inductor/test_torchinductor.py PASSED [0.2832s] [ 48%] 2025-12-04T11:29:55.4787739Z inductor/test_compile_subprocess.py::GPUTests::test_input_mutation4_cuda <- test/inductor/test_torchinductor.py PASSED [0.4367s] [ 48%] 2025-12-04T11:29:55.4789049Z inductor/test_compile_subprocess.py::GPUTests::test_insignificant_strides_cuda <- test/inductor/test_torchinductor.py PASSED [0.2029s] [ 48%] 2025-12-04T11:29:55.4790308Z inductor/test_compile_subprocess.py::GPUTests::test_isinf2_cuda <- test/inductor/test_torchinductor.py PASSED [0.4578s] [ 49%] 2025-12-04T11:29:55.4791611Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.8521s] [ 49%] 2025-12-04T11:29:55.4793021Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.8322s] [ 49%] 2025-12-04T11:29:55.4794287Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py FAILED [0.8161s] [ 49%] 2025-12-04T11:29:55.4794935Z 2025-12-04T11:29:55.4795098Z ==================================== RERUNS ==================================== 2025-12-04T11:29:55.4795602Z ___________________________ GPUTests.test_isinf_cuda ___________________________ 2025-12-04T11:29:55.4795808Z Traceback (most recent call last): 2025-12-04T11:29:55.4796221Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T11:29:55.4796329Z return value(self) 2025-12-04T11:29:55.4796743Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf 2025-12-04T11:29:55.4796994Z 
self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False) 2025-12-04T11:29:55.4797291Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T11:29:55.4797410Z return func(*args, **kwds) 2025-12-04T11:29:55.4797842Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu 2025-12-04T11:29:55.4797957Z check_model( 2025-12-04T11:29:55.4798359Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model 2025-12-04T11:29:55.4798495Z actual = run(*example_inputs, **kwargs) 2025-12-04T11:29:55.4798993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:29:55.4799241Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:29:55.4799766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T11:29:55.4799965Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:29:55.4800481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T11:29:55.4800646Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:29:55.4801178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:29:55.4801515Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:29:55.4802054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile 2025-12-04T11:29:55.4802264Z output = self._send_to_child(inputs).deserialize(constants) 2025-12-04T11:29:55.4802785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child 
2025-12-04T11:29:55.4802894Z return f.result() 2025-12-04T11:29:55.4803254Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result 2025-12-04T11:29:55.4803384Z return self.__get_result() 2025-12-04T11:29:55.4803774Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result 2025-12-04T11:29:55.4803899Z raise self._exception 2025-12-04T11:29:55.4804324Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T11:29:55.4804332Z 2025-12-04T11:29:55.4804434Z Name= 2025-12-04T11:29:55.4804571Z Traceback (most recent call last): 2025-12-04T11:29:55.4805109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T11:29:55.4805209Z result = job() 2025-12-04T11:29:55.4805829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess 2025-12-04T11:29:55.4806009Z result = cls._run_in_child(pickled_input, extra_env) 2025-12-04T11:29:55.4806641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child 2025-12-04T11:29:55.4806851Z output_graph = _InProcessFxCompile().codegen_and_compile( 2025-12-04T11:29:55.4807372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:29:55.4807516Z _check_triton_bf16_support(graph) 2025-12-04T11:29:55.4808093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:29:55.4808234Z warn_and_skip(node.get_device()) 2025-12-04T11:29:55.4808718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:29:55.4808863Z raise SkipFrame("BF16 is not 
supported") 2025-12-04T11:29:55.4809051Z torch._dynamo.exc.SkipFrame: BF16 is not supported 2025-12-04T11:29:55.4809058Z 2025-12-04T11:29:55.4809065Z 2025-12-04T11:29:55.4809787Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:29:55.4809794Z 2025-12-04T11:29:55.4809798Z 2025-12-04T11:29:55.4810032Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.4810476Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda 2025-12-04T11:29:55.4810483Z 2025-12-04T11:29:55.4810758Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:55.4810998Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.4811160Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.4811535Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.4812079Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.4812304Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.4813056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4813340Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4814076Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4814353Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 
2025-12-04T11:29:55.4815080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4815366Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4816083Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4816501Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4817229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4817503Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4818239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4818575Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4819311Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4819586Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4819807Z ___________________________ GPUTests.test_isinf_cuda ___________________________ 2025-12-04T11:29:55.4819980Z Traceback (most recent call last): 2025-12-04T11:29:55.4820384Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T11:29:55.4820505Z return value(self) 2025-12-04T11:29:55.4820909Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf 2025-12-04T11:29:55.4821162Z 
self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False) 2025-12-04T11:29:55.4821453Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T11:29:55.4821571Z return func(*args, **kwds) 2025-12-04T11:29:55.4822001Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu 2025-12-04T11:29:55.4822113Z check_model( 2025-12-04T11:29:55.4822515Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model 2025-12-04T11:29:55.4822665Z actual = run(*example_inputs, **kwargs) 2025-12-04T11:29:55.4823156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:29:55.4823404Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:29:55.4823932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T11:29:55.4824128Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:29:55.4824644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T11:29:55.4824807Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:29:55.4825341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:29:55.4825677Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:29:55.4826212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile 2025-12-04T11:29:55.4826421Z output = self._send_to_child(inputs).deserialize(constants) 2025-12-04T11:29:55.4826941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child 
2025-12-04T11:29:55.4827049Z return f.result() 2025-12-04T11:29:55.4827420Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result 2025-12-04T11:29:55.4827539Z return self.__get_result() 2025-12-04T11:29:55.4827928Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result 2025-12-04T11:29:55.4828050Z raise self._exception 2025-12-04T11:29:55.4828432Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T11:29:55.4828438Z 2025-12-04T11:29:55.4828575Z Name= 2025-12-04T11:29:55.4828713Z Traceback (most recent call last): 2025-12-04T11:29:55.4829258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T11:29:55.4829372Z result = job() 2025-12-04T11:29:55.4829949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess 2025-12-04T11:29:55.4830158Z result = cls._run_in_child(pickled_input, extra_env) 2025-12-04T11:29:55.4830706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child 2025-12-04T11:29:55.4830914Z output_graph = _InProcessFxCompile().codegen_and_compile( 2025-12-04T11:29:55.4831435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:29:55.4831577Z _check_triton_bf16_support(graph) 2025-12-04T11:29:55.4832126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:29:55.4832303Z warn_and_skip(node.get_device()) 2025-12-04T11:29:55.4832788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:29:55.4832928Z raise SkipFrame("BF16 is not 
supported") 2025-12-04T11:29:55.4833123Z torch._dynamo.exc.SkipFrame: BF16 is not supported 2025-12-04T11:29:55.4833129Z 2025-12-04T11:29:55.4833134Z 2025-12-04T11:29:55.4833855Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:29:55.4833861Z 2025-12-04T11:29:55.4833866Z 2025-12-04T11:29:55.4834100Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.4834542Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda 2025-12-04T11:29:55.4834549Z 2025-12-04T11:29:55.4834836Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:55.4835060Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.4835221Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.4835598Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.4836147Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.4836371Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.4837129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4837416Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4838157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4838436Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 
2025-12-04T11:29:55.4839162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4839459Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4840180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4840470Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4841231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4841511Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4842249Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4842558Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4843343Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4843618Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4843841Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.4844020Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.4844384Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.4844976Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 
3)] 2025-12-04T11:29:55.4845194Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.4845920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4846212Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4846933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4847223Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4847947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4848227Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4848966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4849243Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4849979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4850256Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4850980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4851268Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4851985Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4852277Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4852426Z =================================== FAILURES =================================== 2025-12-04T11:29:55.4852645Z ___________________________ GPUTests.test_isinf_cuda ___________________________ 2025-12-04T11:29:55.4852786Z Traceback (most recent call last): 2025-12-04T11:29:55.4853188Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T11:29:55.4853291Z return value(self) 2025-12-04T11:29:55.4853759Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf 2025-12-04T11:29:55.4854010Z self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False) 2025-12-04T11:29:55.4854306Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T11:29:55.4854424Z return func(*args, **kwds) 2025-12-04T11:29:55.4854854Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu 2025-12-04T11:29:55.4855000Z check_model( 2025-12-04T11:29:55.4855401Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model 2025-12-04T11:29:55.4855571Z actual = run(*example_inputs, **kwargs) 2025-12-04T11:29:55.4856075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:29:55.4856324Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:29:55.4856936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T11:29:55.4857171Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:29:55.4857681Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T11:29:55.4857843Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:29:55.4858381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:29:55.4858717Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:29:55.4859249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile 2025-12-04T11:29:55.4859457Z output = self._send_to_child(inputs).deserialize(constants) 2025-12-04T11:29:55.4859982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child 2025-12-04T11:29:55.4860091Z return f.result() 2025-12-04T11:29:55.4860462Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result 2025-12-04T11:29:55.4860581Z return self.__get_result() 2025-12-04T11:29:55.4860973Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result 2025-12-04T11:29:55.4861099Z raise self._exception 2025-12-04T11:29:55.4861482Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T11:29:55.4861491Z 2025-12-04T11:29:55.4861592Z Name= 2025-12-04T11:29:55.4861731Z Traceback (most recent call last): 2025-12-04T11:29:55.4862268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T11:29:55.4862381Z result = job() 2025-12-04T11:29:55.4862962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess 2025-12-04T11:29:55.4863141Z result = cls._run_in_child(pickled_input, extra_env) 2025-12-04T11:29:55.4863659Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child 2025-12-04T11:29:55.4863865Z output_graph = _InProcessFxCompile().codegen_and_compile( 2025-12-04T11:29:55.4864385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:29:55.4864529Z _check_triton_bf16_support(graph) 2025-12-04T11:29:55.4865073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:29:55.4865209Z warn_and_skip(node.get_device()) 2025-12-04T11:29:55.4865739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:29:55.4865883Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:29:55.4866073Z torch._dynamo.exc.SkipFrame: BF16 is not supported 2025-12-04T11:29:55.4866078Z 2025-12-04T11:29:55.4866083Z 2025-12-04T11:29:55.4866797Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:29:55.4866839Z 2025-12-04T11:29:55.4866843Z 2025-12-04T11:29:55.4867076Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.4867560Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda 2025-12-04T11:29:55.4867566Z 2025-12-04T11:29:55.4867854Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:55.4868075Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.4868242Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.4868648Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.4869194Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.4869414Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.4870170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4870454Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4871391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4871678Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4872406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4872697Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 
2025-12-04T11:29:55.4873421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4873714Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4874437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4874714Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4875455Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4875731Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4876472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4876748Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4876967Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.4877143Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.4877504Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.4878149Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.4878369Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.4879099Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op 
schema 2025-12-04T11:29:55.4879388Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4880168Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4880496Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4881219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4881499Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4882235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4882554Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4883290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4883568Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4884294Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4884582Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4885310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4885597Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4885812Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:29:55.4885973Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.4886342Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.4886885Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.4887116Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.4887838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4888117Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4888859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4889133Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4889865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4890145Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4890876Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4891163Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4891924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4892216Z warnings.warn(f"undefined OpHandler.{name}, 
please add missing op schema") 2025-12-04T11:29:55.4892937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4893262Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4894029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4894305Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4895140Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-84a2c5e5cdda7bdd.xml - 2025-12-04T11:29:55.4895314Z =========================== short test summary info ============================ 2025-12-04T11:29:55.4896121Z FAILED [0.8161s] inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T11:29:55.4896128Z 2025-12-04T11:29:55.4896247Z Name= 2025-12-04T11:29:55.4896443Z Traceback (most recent call last): 2025-12-04T11:29:55.4896992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T11:29:55.4897112Z result = job() 2025-12-04T11:29:55.4897689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess 2025-12-04T11:29:55.4897882Z result = cls._run_in_child(pickled_input, extra_env) 2025-12-04T11:29:55.4898390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child 2025-12-04T11:29:55.4898601Z output_graph = _InProcessFxCompile().codegen_and_compile( 2025-12-04T11:29:55.4899133Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:29:55.4899260Z _check_triton_bf16_support(graph) 2025-12-04T11:29:55.4899822Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:29:55.4899948Z warn_and_skip(node.get_device()) 2025-12-04T11:29:55.4900432Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:29:55.4900588Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:29:55.4900760Z torch._dynamo.exc.SkipFrame: BF16 is not supported 2025-12-04T11:29:55.4900765Z 2025-12-04T11:29:55.4900770Z 2025-12-04T11:29:55.4901496Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:29:55.4901504Z 2025-12-04T11:29:55.4901508Z 2025-12-04T11:29:55.4901729Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.4902168Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda 2025-12-04T11:29:55.4902189Z 2025-12-04T11:29:55.4902462Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:55.4902647Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:29:55.4902893Z ======== 1 failed, 118 passed, 24 skipped, 2 rerun in 156.79s (0:02:36) ======== 2025-12-04T11:29:55.4902995Z Got exit code 1 2025-12-04T11:29:55.4903104Z Retrying single test... 
2025-12-04T11:29:55.4903610Z W1204 11:23:50.228000 98844 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:29:55.4904257Z Test results will be stored in test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-97e49e1b6070e822.xml 2025-12-04T11:29:55.4904438Z ============================= test session starts ============================== 2025-12-04T11:29:55.4904792Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:29:55.4904939Z cachedir: .pytest_cache 2025-12-04T11:29:55.4905473Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:29:55.4905631Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:29:55.4905743Z configfile: pytest.ini 2025-12-04T11:29:55.4906299Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:29:55.4906532Z collecting ... collected 879 items / 287 deselected / 592 selected 2025-12-04T11:29:55.4907085Z stepcurrent: skipping 142 already run items. 
Running only test/inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda 2025-12-04T11:29:55.4907237Z Running 1 items in this shard 2025-12-04T11:29:55.4907242Z 2025-12-04T11:29:55.4907860Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [18.6471s] [100%] 2025-12-04T11:29:55.4908486Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [1.1840s] [100%] 2025-12-04T11:29:55.4909000Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py FAILED [1.1873s] [100%] 2025-12-04T11:29:55.4909006Z 2025-12-04T11:29:55.4909167Z ==================================== RERUNS ==================================== 2025-12-04T11:29:55.4909389Z ___________________________ GPUTests.test_isinf_cuda ___________________________ 2025-12-04T11:29:55.4909517Z Traceback (most recent call last): 2025-12-04T11:29:55.4909940Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T11:29:55.4910048Z return value(self) 2025-12-04T11:29:55.4910451Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf 2025-12-04T11:29:55.4910720Z self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False) 2025-12-04T11:29:55.4911006Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T11:29:55.4911139Z return func(*args, **kwds) 2025-12-04T11:29:55.4911574Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu 2025-12-04T11:29:55.4911676Z check_model( 2025-12-04T11:29:55.4912098Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model 2025-12-04T11:29:55.4912235Z actual = run(*example_inputs, **kwargs) 2025-12-04T11:29:55.4912725Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:29:55.4912994Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:29:55.4913509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T11:29:55.4913720Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:29:55.4914233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T11:29:55.4914384Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:29:55.4914931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:29:55.4915253Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:29:55.4915835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile 2025-12-04T11:29:55.4916048Z output = self._send_to_child(inputs).deserialize(constants) 2025-12-04T11:29:55.4916559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child 2025-12-04T11:29:55.4916676Z return f.result() 2025-12-04T11:29:55.4917068Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result 2025-12-04T11:29:55.4917199Z return self.__get_result() 2025-12-04T11:29:55.4917627Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result 2025-12-04T11:29:55.4917740Z raise self._exception 2025-12-04T11:29:55.4918135Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T11:29:55.4918140Z 2025-12-04T11:29:55.4918239Z Name= 2025-12-04T11:29:55.4918365Z Traceback (most recent call last): 
2025-12-04T11:29:55.4918916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T11:29:55.4919055Z result = job() 2025-12-04T11:29:55.4919641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess 2025-12-04T11:29:55.4919822Z result = cls._run_in_child(pickled_input, extra_env) 2025-12-04T11:29:55.4920323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child 2025-12-04T11:29:55.4920546Z output_graph = _InProcessFxCompile().codegen_and_compile( 2025-12-04T11:29:55.4921066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:29:55.4921193Z _check_triton_bf16_support(graph) 2025-12-04T11:29:55.4921752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:29:55.4921876Z warn_and_skip(node.get_device()) 2025-12-04T11:29:55.4922371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:29:55.4922511Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:29:55.4922685Z torch._dynamo.exc.SkipFrame: BF16 is not supported 2025-12-04T11:29:55.4922693Z 2025-12-04T11:29:55.4922698Z 2025-12-04T11:29:55.4923428Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:29:55.4923434Z 2025-12-04T11:29:55.4923439Z 2025-12-04T11:29:55.4923656Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.4924106Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda 2025-12-04T11:29:55.4924114Z 2025-12-04T11:29:55.4924383Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:55.4924624Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.4924784Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.4925329Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.4925699Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.4925923Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.4926656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4926982Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4927708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4928001Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4928724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4929031Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 
2025-12-04T11:29:55.4929796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4930072Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4930810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4931132Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4931851Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4932141Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4932867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4933154Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4933374Z ___________________________ GPUTests.test_isinf_cuda ___________________________ 2025-12-04T11:29:55.4933500Z Traceback (most recent call last): 2025-12-04T11:29:55.4933914Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T11:29:55.4934023Z return value(self) 2025-12-04T11:29:55.4934439Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf 2025-12-04T11:29:55.4934687Z self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False) 2025-12-04T11:29:55.4934966Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T11:29:55.4935095Z return func(*args, **kwds) 2025-12-04T11:29:55.4935533Z File 
"/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu 2025-12-04T11:29:55.4935634Z check_model( 2025-12-04T11:29:55.4936047Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model 2025-12-04T11:29:55.4936181Z actual = run(*example_inputs, **kwargs) 2025-12-04T11:29:55.4936765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:29:55.4937019Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:29:55.4937537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T11:29:55.4937748Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:29:55.4938264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T11:29:55.4938415Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:29:55.4938969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:29:55.4939293Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:29:55.4939906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile 2025-12-04T11:29:55.4940115Z output = self._send_to_child(inputs).deserialize(constants) 2025-12-04T11:29:55.4940624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child 2025-12-04T11:29:55.4940740Z return f.result() 2025-12-04T11:29:55.4941104Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result 2025-12-04T11:29:55.4941266Z return self.__get_result() 2025-12-04T11:29:55.4941656Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result 2025-12-04T11:29:55.4941802Z raise self._exception 2025-12-04T11:29:55.4942198Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T11:29:55.4942205Z 2025-12-04T11:29:55.4942306Z Name= 2025-12-04T11:29:55.4942431Z Traceback (most recent call last): 2025-12-04T11:29:55.4942983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T11:29:55.4943114Z result = job() 2025-12-04T11:29:55.4943703Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess 2025-12-04T11:29:55.4943879Z result = cls._run_in_child(pickled_input, extra_env) 2025-12-04T11:29:55.4944384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child 2025-12-04T11:29:55.4944603Z output_graph = _InProcessFxCompile().codegen_and_compile( 2025-12-04T11:29:55.4945125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:29:55.4945265Z _check_triton_bf16_support(graph) 2025-12-04T11:29:55.4945812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:29:55.4945935Z warn_and_skip(node.get_device()) 2025-12-04T11:29:55.4946434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:29:55.4946579Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:29:55.4946749Z torch._dynamo.exc.SkipFrame: BF16 is not supported 2025-12-04T11:29:55.4946755Z 2025-12-04T11:29:55.4946774Z 2025-12-04T11:29:55.4947488Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're 
reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:29:55.4947496Z 2025-12-04T11:29:55.4947501Z 2025-12-04T11:29:55.4947717Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.4948173Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda 2025-12-04T11:29:55.4948178Z 2025-12-04T11:29:55.4948450Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:55.4948684Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.4948846Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.4949388Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.4949758Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.4949982Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.4950726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4951008Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4951770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4952064Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4952784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4953107Z warnings.warn(f"undefined OpHandler.{name}, please 
add missing op schema") 2025-12-04T11:29:55.4953859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4954135Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4954864Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4955141Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4955910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4956181Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4956904Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4957194Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4957409Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.4957568Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.4957939Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.4958481Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.4958713Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.4959438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined 
OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4959716Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4960451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4960727Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4961466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4961744Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4962465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4962754Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4963484Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4963776Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4964498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4964806Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4965544Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4965820Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4965983Z 
=================================== FAILURES =================================== 2025-12-04T11:29:55.4966234Z ___________________________ GPUTests.test_isinf_cuda ___________________________ 2025-12-04T11:29:55.4966363Z Traceback (most recent call last): 2025-12-04T11:29:55.4966815Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T11:29:55.4966923Z return value(self) 2025-12-04T11:29:55.4967346Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf 2025-12-04T11:29:55.4967599Z self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False) 2025-12-04T11:29:55.4967879Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T11:29:55.4968038Z return func(*args, **kwds) 2025-12-04T11:29:55.4968470Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu 2025-12-04T11:29:55.4968569Z check_model( 2025-12-04T11:29:55.4968984Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model 2025-12-04T11:29:55.4969120Z actual = run(*example_inputs, **kwargs) 2025-12-04T11:29:55.4969623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:29:55.4969874Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:29:55.4970386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T11:29:55.4970592Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:29:55.4971313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T11:29:55.4971468Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:29:55.4972027Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:29:55.4972355Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:29:55.4972907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile 2025-12-04T11:29:55.4973118Z output = self._send_to_child(inputs).deserialize(constants) 2025-12-04T11:29:55.4973626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child 2025-12-04T11:29:55.4973752Z return f.result() 2025-12-04T11:29:55.4974114Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result 2025-12-04T11:29:55.4974247Z return self.__get_result() 2025-12-04T11:29:55.4974638Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result 2025-12-04T11:29:55.4974752Z raise self._exception 2025-12-04T11:29:55.4975151Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T11:29:55.4975159Z 2025-12-04T11:29:55.4975261Z Name= 2025-12-04T11:29:55.4975388Z Traceback (most recent call last): 2025-12-04T11:29:55.4975949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T11:29:55.4976052Z result = job() 2025-12-04T11:29:55.4976707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess 2025-12-04T11:29:55.4976977Z result = cls._run_in_child(pickled_input, extra_env) 2025-12-04T11:29:55.4977482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child 2025-12-04T11:29:55.4977703Z output_graph = _InProcessFxCompile().codegen_and_compile( 
2025-12-04T11:29:55.4978222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:29:55.4978404Z _check_triton_bf16_support(graph) 2025-12-04T11:29:55.4979959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:29:55.4980100Z warn_and_skip(node.get_device()) 2025-12-04T11:29:55.4980601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:29:55.4980747Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:29:55.4980923Z torch._dynamo.exc.SkipFrame: BF16 is not supported 2025-12-04T11:29:55.4980929Z 2025-12-04T11:29:55.4981007Z 2025-12-04T11:29:55.4981738Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:29:55.4981746Z 2025-12-04T11:29:55.4981750Z 2025-12-04T11:29:55.4981969Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.4982433Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda 2025-12-04T11:29:55.4982439Z 2025-12-04T11:29:55.4982713Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:55.4982953Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.4983116Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.4983664Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.4984038Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.4984262Z ----------------------------- 
Captured stderr call ----------------------------- 2025-12-04T11:29:55.4985014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4985298Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4986024Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4986317Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4987051Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4987345Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4988064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4988344Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4989084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4989359Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4990090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4990397Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4991118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined 
OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4991404Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4991620Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.4991812Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.4992214Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.4992754Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.4992986Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.4993714Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4994022Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4994757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4995034Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4995771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4996050Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4996771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4997062Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4997786Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.4998071Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4998796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.4999071Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.4999809Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5000086Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5000320Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.5000482Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.5000838Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.5001389Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.5001609Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.5002352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5002630Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5003385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 
2025-12-04T11:29:55.5003678Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5004397Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5004790Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5005542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5005816Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5006553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5006829Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5007603Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5007876Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5008600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5008892Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5009708Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-97e49e1b6070e822.xml - 2025-12-04T11:29:55.5009895Z =========================== short test summary info ============================ 2025-12-04T11:29:55.5010660Z FAILED [1.1873s] 
inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T11:29:55.5010669Z 2025-12-04T11:29:55.5010769Z Name= 2025-12-04T11:29:55.5010910Z Traceback (most recent call last): 2025-12-04T11:29:55.5011456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T11:29:55.5011574Z result = job() 2025-12-04T11:29:55.5012153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess 2025-12-04T11:29:55.5012333Z result = cls._run_in_child(pickled_input, extra_env) 2025-12-04T11:29:55.5012852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child 2025-12-04T11:29:55.5013058Z output_graph = _InProcessFxCompile().codegen_and_compile( 2025-12-04T11:29:55.5013584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:29:55.5013727Z _check_triton_bf16_support(graph) 2025-12-04T11:29:55.5014278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:29:55.5014417Z warn_and_skip(node.get_device()) 2025-12-04T11:29:55.5014900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:29:55.5015045Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:29:55.5015231Z torch._dynamo.exc.SkipFrame: BF16 is not supported 2025-12-04T11:29:55.5015236Z 2025-12-04T11:29:55.5015241Z 2025-12-04T11:29:55.5015957Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:29:55.5016011Z 2025-12-04T11:29:55.5016016Z 2025-12-04T11:29:55.5016256Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.5016791Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda 2025-12-04T11:29:55.5016798Z 2025-12-04T11:29:55.5017084Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:55.5017314Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:29:55.5017524Z ================= 1 failed, 287 deselected, 2 rerun in 21.10s ================== 2025-12-04T11:29:55.5017677Z Got exit code 1 2025-12-04T11:29:55.5017788Z Retrying single test... 2025-12-04T11:29:55.5018238Z W1204 11:24:29.449000 99294 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:29:55.5018896Z Test results will be stored in test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-aaac502093c587a7.xml 2025-12-04T11:29:55.5019096Z ============================= test session starts ============================== 2025-12-04T11:29:55.5019459Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:29:55.5019571Z cachedir: .pytest_cache 2025-12-04T11:29:55.5020092Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:29:55.5020234Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:29:55.5020345Z configfile: pytest.ini 2025-12-04T11:29:55.5020889Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:29:55.5021128Z collecting ... 
collected 879 items / 287 deselected / 592 selected 2025-12-04T11:29:55.5021664Z stepcurrent: skipping 142 already run items. Running only test/inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda 2025-12-04T11:29:55.5021798Z Running 1 items in this shard 2025-12-04T11:29:55.5021805Z 2025-12-04T11:29:55.5022422Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [18.6246s] [100%] 2025-12-04T11:29:55.5023027Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [1.1992s] [100%] 2025-12-04T11:29:55.5023555Z inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda <- test/inductor/test_torchinductor.py FAILED [1.1829s] [100%] 2025-12-04T11:29:55.5023560Z 2025-12-04T11:29:55.5023707Z ==================================== RERUNS ==================================== 2025-12-04T11:29:55.5023938Z ___________________________ GPUTests.test_isinf_cuda ___________________________ 2025-12-04T11:29:55.5024064Z Traceback (most recent call last): 2025-12-04T11:29:55.5024470Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T11:29:55.5024589Z return value(self) 2025-12-04T11:29:55.5024994Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf 2025-12-04T11:29:55.5025256Z self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False) 2025-12-04T11:29:55.5025535Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T11:29:55.5025654Z return func(*args, **kwds) 2025-12-04T11:29:55.5026098Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu 2025-12-04T11:29:55.5026196Z check_model( 2025-12-04T11:29:55.5026600Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model 
2025-12-04T11:29:55.5026747Z actual = run(*example_inputs, **kwargs) 2025-12-04T11:29:55.5027230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:29:55.5027536Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:29:55.5028051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T11:29:55.5028248Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:29:55.5028769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T11:29:55.5028951Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:29:55.5029532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:29:55.5029869Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:29:55.5030401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile 2025-12-04T11:29:55.5030623Z output = self._send_to_child(inputs).deserialize(constants) 2025-12-04T11:29:55.5031159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child 2025-12-04T11:29:55.5031265Z return f.result() 2025-12-04T11:29:55.5031638Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result 2025-12-04T11:29:55.5031755Z return self.__get_result() 2025-12-04T11:29:55.5032161Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result 2025-12-04T11:29:55.5032273Z raise self._exception 2025-12-04T11:29:55.5032655Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 
2025-12-04T11:29:55.5032661Z 2025-12-04T11:29:55.5032776Z Name= 2025-12-04T11:29:55.5032903Z Traceback (most recent call last): 2025-12-04T11:29:55.5033445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T11:29:55.5033561Z result = job() 2025-12-04T11:29:55.5034142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess 2025-12-04T11:29:55.5034331Z result = cls._run_in_child(pickled_input, extra_env) 2025-12-04T11:29:55.5034831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child 2025-12-04T11:29:55.5035038Z output_graph = _InProcessFxCompile().codegen_and_compile( 2025-12-04T11:29:55.5035576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:29:55.5035703Z _check_triton_bf16_support(graph) 2025-12-04T11:29:55.5036261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:29:55.5036384Z warn_and_skip(node.get_device()) 2025-12-04T11:29:55.5036868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:29:55.5037024Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:29:55.5037195Z torch._dynamo.exc.SkipFrame: BF16 is not supported 2025-12-04T11:29:55.5037200Z 2025-12-04T11:29:55.5037205Z 2025-12-04T11:29:55.5037924Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:29:55.5037946Z 2025-12-04T11:29:55.5037951Z 2025-12-04T11:29:55.5038172Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.5038611Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda 2025-12-04T11:29:55.5038617Z 2025-12-04T11:29:55.5038897Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:55.5039165Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.5039342Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.5039885Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.5040243Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.5040509Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.5041289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5041584Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5042315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5042625Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5043361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5043638Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 
2025-12-04T11:29:55.5044378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5044657Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5045379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5045673Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5046405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5046697Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5047418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5047700Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5047936Z ___________________________ GPUTests.test_isinf_cuda ___________________________ 2025-12-04T11:29:55.5048065Z Traceback (most recent call last): 2025-12-04T11:29:55.5048467Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T11:29:55.5048590Z return value(self) 2025-12-04T11:29:55.5049002Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf 2025-12-04T11:29:55.5049269Z self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False) 2025-12-04T11:29:55.5049552Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T11:29:55.5049670Z return func(*args, **kwds) 2025-12-04T11:29:55.5050123Z File 
"/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu 2025-12-04T11:29:55.5050227Z check_model( 2025-12-04T11:29:55.5050633Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model 2025-12-04T11:29:55.5050785Z actual = run(*example_inputs, **kwargs) 2025-12-04T11:29:55.5051275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:29:55.5051542Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:29:55.5052095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T11:29:55.5052294Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:29:55.5052824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T11:29:55.5053014Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:29:55.5053563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:29:55.5053918Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:29:55.5054455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile 2025-12-04T11:29:55.5054678Z output = self._send_to_child(inputs).deserialize(constants) 2025-12-04T11:29:55.5055194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child 2025-12-04T11:29:55.5055333Z return f.result() 2025-12-04T11:29:55.5055705Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result 2025-12-04T11:29:55.5055825Z return self.__get_result() 2025-12-04T11:29:55.5056228Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result 2025-12-04T11:29:55.5056345Z raise self._exception 2025-12-04T11:29:55.5056816Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T11:29:55.5056823Z 2025-12-04T11:29:55.5056936Z Name= 2025-12-04T11:29:55.5057064Z Traceback (most recent call last): 2025-12-04T11:29:55.5057606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T11:29:55.5057723Z result = job() 2025-12-04T11:29:55.5058301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess 2025-12-04T11:29:55.5058499Z result = cls._run_in_child(pickled_input, extra_env) 2025-12-04T11:29:55.5058999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child 2025-12-04T11:29:55.5059206Z output_graph = _InProcessFxCompile().codegen_and_compile( 2025-12-04T11:29:55.5059743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:29:55.5059871Z _check_triton_bf16_support(graph) 2025-12-04T11:29:55.5060433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:29:55.5060553Z warn_and_skip(node.get_device()) 2025-12-04T11:29:55.5061039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:29:55.5061197Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:29:55.5061367Z torch._dynamo.exc.SkipFrame: BF16 is not supported 2025-12-04T11:29:55.5061372Z 2025-12-04T11:29:55.5061378Z 2025-12-04T11:29:55.5062110Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're 
reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:29:55.5062119Z 2025-12-04T11:29:55.5062124Z 2025-12-04T11:29:55.5062343Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.5062782Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda 2025-12-04T11:29:55.5062787Z 2025-12-04T11:29:55.5063068Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:55.5063292Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.5063511Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.5064060Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.5064416Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.5064677Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.5065485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5065783Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5066510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5066792Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5067560Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5067835Z warnings.warn(f"undefined OpHandler.{name}, please 
add missing op schema") 2025-12-04T11:29:55.5068569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5068847Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5069565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5069853Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5070573Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5070864Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5071813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5072092Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5072327Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.5072486Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.5072855Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.5073397Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.5073617Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.5074356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined 
OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5074632Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5075370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5075647Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5076368Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5076758Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5077482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5077776Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5078499Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5078842Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5079627Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5079904Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5080638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5080958Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5081105Z 
=================================== FAILURES =================================== 2025-12-04T11:29:55.5081338Z ___________________________ GPUTests.test_isinf_cuda ___________________________ 2025-12-04T11:29:55.5081467Z Traceback (most recent call last): 2025-12-04T11:29:55.5081870Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T11:29:55.5081993Z return value(self) 2025-12-04T11:29:55.5082396Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 8265, in test_isinf 2025-12-04T11:29:55.5082656Z self.common(fn, [torch.tensor(values, dtype=dtype)], check_lowp=False) 2025-12-04T11:29:55.5082935Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T11:29:55.5083054Z return func(*args, **kwds) 2025-12-04T11:29:55.5083504Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 692, in check_model_gpu 2025-12-04T11:29:55.5083605Z check_model( 2025-12-04T11:29:55.5084009Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 514, in check_model 2025-12-04T11:29:55.5084158Z actual = run(*example_inputs, **kwargs) 2025-12-04T11:29:55.5084645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:29:55.5084908Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:29:55.5085418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T11:29:55.5085610Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:29:55.5086136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T11:29:55.5086288Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:29:55.5086821Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:29:55.5087153Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:29:55.5087688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 422, in codegen_and_compile 2025-12-04T11:29:55.5087909Z output = self._send_to_child(inputs).deserialize(constants) 2025-12-04T11:29:55.5088416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 596, in _send_to_child 2025-12-04T11:29:55.5088520Z return f.result() 2025-12-04T11:29:55.5088888Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result 2025-12-04T11:29:55.5089038Z return self.__get_result() 2025-12-04T11:29:55.5089442Z File "/opt/conda/envs/py_3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result 2025-12-04T11:29:55.5089556Z raise self._exception 2025-12-04T11:29:55.5089936Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T11:29:55.5089942Z 2025-12-04T11:29:55.5090054Z Name= 2025-12-04T11:29:55.5090209Z Traceback (most recent call last): 2025-12-04T11:29:55.5090748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T11:29:55.5090891Z result = job() 2025-12-04T11:29:55.5091466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess 2025-12-04T11:29:55.5091655Z result = cls._run_in_child(pickled_input, extra_env) 2025-12-04T11:29:55.5092160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child 2025-12-04T11:29:55.5092393Z output_graph = _InProcessFxCompile().codegen_and_compile( 
2025-12-04T11:29:55.5092927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:29:55.5093053Z _check_triton_bf16_support(graph) 2025-12-04T11:29:55.5093611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:29:55.5093734Z warn_and_skip(node.get_device()) 2025-12-04T11:29:55.5094218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:29:55.5094371Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:29:55.5094542Z torch._dynamo.exc.SkipFrame: BF16 is not supported 2025-12-04T11:29:55.5094548Z 2025-12-04T11:29:55.5094552Z 2025-12-04T11:29:55.5095283Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:29:55.5095290Z 2025-12-04T11:29:55.5095296Z 2025-12-04T11:29:55.5095515Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.5095950Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda 2025-12-04T11:29:55.5095958Z 2025-12-04T11:29:55.5096240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:55.5096532Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.5096708Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.5097250Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.5097610Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.5097845Z ----------------------------- 
Captured stderr call ----------------------------- 2025-12-04T11:29:55.5098578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5098875Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5099601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5099881Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5100621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5100940Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5101680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5101953Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5102671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5102995Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5103748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5104035Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5104756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined 
OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5105062Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5105297Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.5105458Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.5105819Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.5106373Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.5106589Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.5107326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5107602Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5108328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5108617Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5109343Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5109635Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5110358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5110632Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5111366Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5111641Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5112370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5112662Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5113391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5113680Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5113894Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.5114104Z stats [('calls_captured', 8), ('unique_graphs', 3)] 2025-12-04T11:29:55.5114468Z aot_autograd [('total', 4), ('autograd_cache_miss', 4), ('autograd_cache_saved', 3), ('ok', 3), ('not_ok', 1)] 2025-12-04T11:29:55.5115005Z inductor [('triton_bundler_save_kernel', 24), ('fxgraph_cache_miss', 4), ('async_compile_cache_miss', 3), ('triton_bundler_save_static_autotuner', 3)] 2025-12-04T11:29:55.5115274Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.5116033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5116326Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5117054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 
2025-12-04T11:29:55.5117334Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5118104Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5118380Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5119119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5119398Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5120122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5120412Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5121135Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5121427Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5122150Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5122429Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5123272Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-aaac502093c587a7.xml - 2025-12-04T11:29:55.5123447Z =========================== short test summary info ============================ 2025-12-04T11:29:55.5124237Z FAILED [1.1829s] 
inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T11:29:55.5124243Z 2025-12-04T11:29:55.5124347Z Name= 2025-12-04T11:29:55.5124475Z Traceback (most recent call last): 2025-12-04T11:29:55.5125038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T11:29:55.5125143Z result = job() 2025-12-04T11:29:55.5125736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_subproc.py", line 92, in _run_in_child_subprocess 2025-12-04T11:29:55.5125916Z result = cls._run_in_child(pickled_input, extra_env) 2025-12-04T11:29:55.5126422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 543, in _run_in_child 2025-12-04T11:29:55.5126646Z output_graph = _InProcessFxCompile().codegen_and_compile( 2025-12-04T11:29:55.5127169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:29:55.5127364Z _check_triton_bf16_support(graph) 2025-12-04T11:29:55.5127916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:29:55.5128042Z warn_and_skip(node.get_device()) 2025-12-04T11:29:55.5128542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:29:55.5128718Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:29:55.5128890Z torch._dynamo.exc.SkipFrame: BF16 is not supported 2025-12-04T11:29:55.5128895Z 2025-12-04T11:29:55.5128930Z 2025-12-04T11:29:55.5129658Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:29:55.5129664Z 2025-12-04T11:29:55.5129668Z 2025-12-04T11:29:55.5129889Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.5130347Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_isinf_cuda 2025-12-04T11:29:55.5130384Z 2025-12-04T11:29:55.5130657Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:55.5130854Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:29:55.5131067Z ================= 1 failed, 287 deselected, 2 rerun in 21.09s ================== 2025-12-04T11:29:55.5131167Z Got exit code 1 2025-12-04T11:29:55.5131545Z FAILED CONSISTENTLY: test/inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda 2025-12-04T11:29:55.5131954Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:29:55.5132397Z W1204 11:25:08.579000 99744 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:29:55.5133051Z Test results will be stored in test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-decce829c4432557.xml 2025-12-04T11:29:55.5133219Z ============================= test session starts ============================== 2025-12-04T11:29:55.5133587Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:29:55.5133698Z cachedir: .pytest_cache 2025-12-04T11:29:55.5134218Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:29:55.5134356Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:29:55.5134468Z configfile: pytest.ini 2025-12-04T11:29:55.5135020Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, 
subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:29:55.5135248Z collecting ... collected 879 items / 143 deselected / 736 selected 2025-12-04T11:29:55.5135396Z stepcurrent: skipping 143 already run items. 2025-12-04T11:29:55.5135528Z Running 145 items in this shard 2025-12-04T11:29:55.5135534Z 2025-12-04T11:29:55.5136095Z inductor/test_compile_subprocess.py::GPUTests::test_issue102546_cuda <- test/inductor/test_torchinductor.py PASSED [18.2799s] [ 0%] 2025-12-04T11:29:55.5136748Z inductor/test_compile_subprocess.py::GPUTests::test_kernel_names_cuda <- test/inductor/test_torchinductor.py PASSED [0.5008s] [ 1%] 2025-12-04T11:29:55.5137289Z inductor/test_compile_subprocess.py::GPUTests::test_l1_loss_cuda <- test/inductor/test_torchinductor.py PASSED [0.6136s] [ 2%] 2025-12-04T11:29:55.5137908Z inductor/test_compile_subprocess.py::GPUTests::test_large_grid_use_block_ptr_False_cuda <- test/inductor/test_torchinductor.py PASSED [0.8998s] [ 2%] 2025-12-04T11:29:55.5138513Z inductor/test_compile_subprocess.py::GPUTests::test_layer_norm_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) 
[ 3%] 2025-12-04T11:29:55.5139585Z inductor/test_compile_subprocess.py::GPUTests::test_leaky_relu_cuda <- test/inductor/test_torchinductor.py W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5140064Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5140954Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5141401Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5142255Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5142836Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5143680Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5144081Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5144930Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5145481Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5146307Z 
W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5146767Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5147568Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5148103Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5148902Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5149423Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5150231Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5150787Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5151593Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5152166Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5152985Z W1204 11:25:31.539000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5153661Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5154573Z W1204 11:25:31.539000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5154733Z PASSED [1.1081s] [ 4%] 2025-12-04T11:29:55.5155261Z inductor/test_compile_subprocess.py::GPUTests::test_lgamma_cuda <- test/inductor/test_torchinductor.py PASSED [1.3722s] [ 4%] 2025-12-04T11:29:55.5156343Z inductor/test_compile_subprocess.py::GPUTests::test_like_rands2_cuda <- test/inductor/test_torchinductor.py W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5156804Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5157703Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5158117Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5158958Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5159549Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return 
_WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5160335Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5160750Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5161583Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5162137Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5162977Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5163419Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5164229Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5164749Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5165558Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5166083Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 
2025-12-04T11:29:55.5166874Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5167488Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5168277Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5168856Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5169721Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5170352Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5171440Z W1204 11:25:33.403000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default 2025-12-04T11:29:55.5171659Z PASSED [0.6016s] [ 5%] 2025-12-04T11:29:55.5172256Z inductor/test_compile_subprocess.py::GPUTests::test_linear_mixed_dtype_cuda <- test/inductor/test_torchinductor.py PASSED [0.6873s] [ 6%] 2025-12-04T11:29:55.5172927Z inductor/test_compile_subprocess.py::GPUTests::test_lite_regional_compile_repeated_blocks_cuda <- test/inductor/test_torchinductor.py PASSED [0.3689s] [ 6%] 2025-12-04T11:29:55.5173612Z inductor/test_compile_subprocess.py::GPUTests::test_lite_triton_kernel_wrapper_functional_cuda <- 
test/inductor/test_torchinductor.py PASSED [0.5808s] [ 7%] 2025-12-04T11:29:55.5174118Z inductor/test_compile_subprocess.py::GPUTests::test_log2_cuda <- test/inductor/test_torchinductor.py PASSED [1.0496s] [ 8%] 2025-12-04T11:29:55.5174639Z inductor/test_compile_subprocess.py::GPUTests::test_log_fp64_cuda <- test/inductor/test_torchinductor.py PASSED [0.8230s] [ 8%] 2025-12-04T11:29:55.5175245Z inductor/test_compile_subprocess.py::GPUTests::test_logcumsumexp_zero_dim_cuda <- test/inductor/test_torchinductor.py PASSED [0.8425s] [ 9%] 2025-12-04T11:29:55.5176456Z inductor/test_compile_subprocess.py::GPUTests::test_low_memory_max_pool_dilation_1_dim_2_cuda <- test/inductor/test_torchinductor.py W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5176938Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5177840Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5178231Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5179067Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5179642Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5180443Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 
2025-12-04T11:29:55.5180846Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5181689Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5182310Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5183143Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5183626Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5184474Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5185004Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5185799Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5186363Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5187157Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5187726Z W1204 11:25:38.416000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5188513Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5189085Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5189905Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5190524Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5191503Z W1204 11:25:38.416000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims._low_memory_max_pool_with_offsets.default 2025-12-04T11:29:55.5192008Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5192463Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5193358Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5193735Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5194583Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5195156Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5195986Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5196388Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5197221Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5197813Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5198672Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5199130Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5199926Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5200487Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5201289Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5201814Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5202622Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5203175Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5203979Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5204550Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5205371Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5205996Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5206963Z W1204 11:25:39.308000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims._low_memory_max_pool_with_offsets.default 2025-12-04T11:29:55.5207087Z PASSED [1.7546s] [ 10%] 2025-12-04T11:29:55.5208221Z inductor/test_compile_subprocess.py::GPUTests::test_low_memory_max_pool_dilation_2_dim_3_cuda <- test/inductor/test_torchinductor.py W1204 11:25:40.190000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5208694Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5226636Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5227452Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5228337Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5228918Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5229797Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5230214Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5231056Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5231622Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5232494Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5232950Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5233752Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5234275Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5235084Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5235607Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5236418Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5236979Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5237782Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5238354Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5239166Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5239802Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5240762Z W1204 11:25:40.190000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims._low_memory_max_pool_with_offsets.default 2025-12-04T11:29:55.5241280Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5241769Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5242666Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5243044Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5243959Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5244551Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5245339Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5245789Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 
2025-12-04T11:29:55.5246625Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5247177Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5248008Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5248449Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5249268Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5249790Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5250602Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5251123Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5251919Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5252483Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5253284Z W1204 
11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5253867Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5254677Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5255313Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5256305Z W1204 11:25:43.101000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims._low_memory_max_pool_with_offsets.default 2025-12-04T11:29:55.5256528Z PASSED [5.8952s] [ 11%] 2025-12-04T11:29:55.5257517Z inductor/test_compile_subprocess.py::GPUTests::test_mark_dynamic_with_hint_override_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipping triton backend only since not big GPU (not enough SM)) [ 11%] 2025-12-04T11:29:55.5258182Z inductor/test_compile_subprocess.py::GPUTests::test_masked_fill_promotion_cuda <- test/inductor/test_torchinductor.py PASSED [0.9916s] [ 12%] 2025-12-04T11:29:55.5259215Z inductor/test_compile_subprocess.py::GPUTests::test_matmul_layer_norm_cuda <- test/inductor/test_torchinductor.py W1204 11:25:48.239000 99929 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:29:55.5259326Z PASSED [1.9393s] [ 13%] 2025-12-04T11:29:55.5259852Z inductor/test_compile_subprocess.py::GPUTests::test_max_min_cuda <- test/inductor/test_torchinductor.py PASSED [0.9670s] [ 13%] 
2025-12-04T11:29:55.5260932Z inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d3_cuda <- test/inductor/test_torchinductor.py W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5261390Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5262294Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5262673Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5263532Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5264108Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5264900Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5265320Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5266153Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5266721Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5267546Z W1204 
11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5268003Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5268807Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5269328Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5270166Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5270693Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5271723Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5272420Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5273223Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5273798Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5274650Z W1204 11:25:49.972000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5275287Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5276251Z W1204 11:25:49.972000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims._low_memory_max_pool_with_offsets.default 2025-12-04T11:29:55.5276770Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5277230Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5278127Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5278509Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5279343Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5279935Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5280723Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5281144Z W1204 11:25:51.073000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5281988Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5282541Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5283377Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5283818Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5284676Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5285198Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5286010Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5286592Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5287384Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5287953Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = 
_OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5288769Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5289349Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5290158Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5290791Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5291757Z W1204 11:25:51.073000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims._low_memory_max_pool_with_offsets.default 2025-12-04T11:29:55.5291866Z PASSED [2.2417s] [ 14%] 2025-12-04T11:29:55.5292534Z inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward2_cuda <- test/inductor/test_torchinductor.py PASSED [8.1056s] [ 15%] 2025-12-04T11:29:55.5293186Z inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward4_cuda <- test/inductor/test_torchinductor.py PASSED [14.3095s] [ 15%] 2025-12-04T11:29:55.5293840Z inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward5_cuda <- test/inductor/test_torchinductor.py PASSED [0.3123s] [ 16%] 2025-12-04T11:29:55.5294470Z inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward_cuda <- test/inductor/test_torchinductor.py PASSED [2.8654s] [ 17%] 2025-12-04T11:29:55.5294981Z inductor/test_compile_subprocess.py::GPUTests::test_mean_cuda <- 
test/inductor/test_torchinductor.py PASSED [0.7647s] [ 17%] 2025-12-04T11:29:55.5295565Z inductor/test_compile_subprocess.py::GPUTests::test_min_max_reduction_cuda <- test/inductor/test_torchinductor.py PASSED [0.7996s] [ 18%] 2025-12-04T11:29:55.5296177Z inductor/test_compile_subprocess.py::GPUTests::test_misaligned_address_issue1_cuda <- test/inductor/test_torchinductor.py PASSED [0.4851s] [ 19%] 2025-12-04T11:29:55.5296869Z inductor/test_compile_subprocess.py::GPUTests::test_mixed_mm2_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0034s] (Requires sm80) [ 20%] 2025-12-04T11:29:55.5297484Z inductor/test_compile_subprocess.py::GPUTests::test_mixed_mm3_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0033s] (Requires sm80) [ 20%] 2025-12-04T11:29:55.5298054Z inductor/test_compile_subprocess.py::GPUTests::test_mm_mixed_dtype_cuda <- test/inductor/test_torchinductor.py PASSED [0.1391s] [ 21%] 2025-12-04T11:29:55.5299140Z inductor/test_compile_subprocess.py::GPUTests::test_mul_index_expr_cuda <- test/inductor/test_torchinductor.py W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5299604Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5300510Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5300998Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5301847Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5302425Z W1204 11:26:19.982000 
99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5303261Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5303663Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5304504Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5305068Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5305891Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5306352Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5307152Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5307673Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5308484Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5309004Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] 
nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5309817Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5310369Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5311169Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5311737Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5312580Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5313216Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5314040Z W1204 11:26:19.982000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.5314192Z PASSED [0.3643s] [ 22%] 2025-12-04T11:29:55.5314957Z inductor/test_compile_subprocess.py::GPUTests::test_multi_gpu_device_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (requires multiple cuda devices) [ 22%] 2025-12-04T11:29:55.5315529Z inductor/test_compile_subprocess.py::GPUTests::test_multi_threading_cuda <- test/inductor/test_torchinductor.py PASSED [0.2309s] [ 23%] 2025-12-04T11:29:55.5316101Z 
inductor/test_compile_subprocess.py::GPUTests::test_multilayer_any_cuda <- test/inductor/test_torchinductor.py PASSED [1.1302s] [ 24%] 2025-12-04T11:29:55.5316702Z inductor/test_compile_subprocess.py::GPUTests::test_multilayer_sum_low_prec_cuda <- test/inductor/test_torchinductor.py PASSED [0.4627s] [ 24%] 2025-12-04T11:29:55.5317325Z inductor/test_compile_subprocess.py::GPUTests::test_multilayer_var_lowp_cuda <- test/inductor/test_torchinductor.py PASSED [1.2566s] [ 25%] 2025-12-04T11:29:55.5318460Z inductor/test_compile_subprocess.py::GPUTests::test_mutable_custom_op_fixed_layout2_cuda <- test/inductor/test_torchinductor.py W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5318936Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5319819Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5320202Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5321053Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5321632Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5322437Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5322839Z W1204 11:26:23.532000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5323698Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5324252Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5325078Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5325537Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5326339Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5326908Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5327711Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5328245Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5329109Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5329664Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = 
_OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5330473Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5331079Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5331904Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5332526Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5333360Z W1204 11:26:23.532000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.mylib.bar.default 2025-12-04T11:29:55.5333820Z W1204 11:26:23.555000 99744 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:29:55.5333930Z PASSED [0.5215s] [ 26%] 2025-12-04T11:29:55.5334622Z inductor/test_compile_subprocess.py::GPUTests::test_nan_sort_stable_False_descending_False_cuda <- test/inductor/test_torchinductor.py PASSED [0.7984s] [ 26%] 2025-12-04T11:29:55.5335156Z inductor/test_compile_subprocess.py::GPUTests::test_new_empty_cuda <- test/inductor/test_torchinductor.py PASSED [0.2547s] [ 27%] 2025-12-04T11:29:55.5336240Z inductor/test_compile_subprocess.py::GPUTests::test_nll_loss_backward_cuda <- test/inductor/test_torchinductor.py W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5336765Z W1204 11:26:25.064000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5337663Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5338061Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5338894Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5339484Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5340281Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5340699Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5341583Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5342137Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5342971Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5343478Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, 
pickler.options), 2025-12-04T11:29:55.5344287Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5344807Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5345639Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5346171Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5346970Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5347535Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5348326Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5348910Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5349715Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5350340Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle 
non-standard op: {name}") 2025-12-04T11:29:55.5351183Z W1204 11:26:25.064000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.5351693Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5352158Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5353057Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5353441Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5354291Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5354867Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5355723Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5356129Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5356961Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5357591Z W1204 11:26:25.364000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5358412Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5358865Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5359690Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5360208Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5361023Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5361543Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5362356Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5362909Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5363709Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5364282Z W1204 11:26:25.364000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5365092Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5365733Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5366640Z W1204 11:26:25.364000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5366762Z PASSED [0.7292s] [ 28%] 2025-12-04T11:29:55.5367827Z inductor/test_compile_subprocess.py::GPUTests::test_nll_loss_forward_cuda <- test/inductor/test_torchinductor.py W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5368292Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5369210Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5369594Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5370450Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5371301Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return 
_WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5372181Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5372587Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5373441Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5374037Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5374856Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5375319Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5376116Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5376716Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5377513Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5378033Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 
2025-12-04T11:29:55.5378846Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5379393Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5380194Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5380763Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5381578Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5382204Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5383104Z W1204 11:26:25.770000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5383681Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5384138Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5385028Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in 
serialize_compile 2025-12-04T11:29:55.5385458Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5386332Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5386907Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5387695Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5388143Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5388982Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5389549Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5390366Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5390824Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5391620Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5392140Z W1204 11:26:26.186000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5392950Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5393470Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5394282Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5394834Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5395620Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5396207Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5397015Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5397684Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5398588Z W1204 11:26:26.186000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5398705Z PASSED 
[0.9424s] [ 28%] 2025-12-04T11:29:55.5399784Z inductor/test_compile_subprocess.py::GPUTests::test_one_hot_cuda <- test/inductor/test_torchinductor.py W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5400240Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5401135Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5401559Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5402408Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5402983Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5403777Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5404179Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5405010Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5405572Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 
2025-12-04T11:29:55.5406390Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5406850Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5407646Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5408176Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5408972Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5409487Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5410297Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5410848Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5411690Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5412260Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5413078Z W1204 
11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5413763Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5414588Z W1204 11:26:26.697000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.5414707Z PASSED [0.2554s] [ 29%] 2025-12-04T11:29:55.5415815Z inductor/test_compile_subprocess.py::GPUTests::test_pattern_matcher_unbacked_cuda <- test/inductor/test_torchinductor.py W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5416310Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5417273Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5417660Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5418508Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5419086Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5419883Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5420287Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5421138Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5421689Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5422514Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5422966Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5423765Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5424301Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5425093Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5425664Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5426469Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5427017Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5427881Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5428449Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5429263Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5429912Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5430829Z W1204 11:26:27.036000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5430936Z PASSED [0.5867s] [ 30%] 2025-12-04T11:29:55.5431530Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_j0_cuda <- test/inductor/test_torchinductor.py PASSED [0.9510s] [ 31%] 2025-12-04T11:29:55.5432123Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_y0_cuda <- test/inductor/test_torchinductor.py PASSED [0.3430s] [ 31%] 2025-12-04T11:29:55.5432705Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_y1_cuda <- test/inductor/test_torchinductor.py PASSED [0.3513s] [ 32%] 2025-12-04T11:29:55.5433369Z 
inductor/test_compile_subprocess.py::GPUTests::test_pointwise_chebyshev_polynomial_t_cuda <- test/inductor/test_torchinductor.py PASSED [0.4259s] [ 33%] 2025-12-04T11:29:55.5434016Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_chebyshev_polynomial_w_cuda <- test/inductor/test_torchinductor.py PASSED [0.1360s] [ 33%] 2025-12-04T11:29:55.5435075Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_entr_cuda <- test/inductor/test_torchinductor.py W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5435541Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5436423Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5436814Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5437649Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5438224Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5439023Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5439421Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5440296Z W1204 11:26:30.262000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5440845Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5441677Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5442181Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5442982Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5443514Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5444421Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5444951Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5445746Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5446309Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5447098Z W1204 11:26:30.262000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5447670Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5448490Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5449113Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5450039Z W1204 11:26:30.262000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5450150Z PASSED [0.7919s] [ 34%] 2025-12-04T11:29:55.5450739Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_erfcx_cuda <- test/inductor/test_torchinductor.py PASSED [6.1724s] [ 35%] 2025-12-04T11:29:55.5451311Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_expm1_cuda <- test/inductor/test_torchinductor.py PASSED [0.4386s] [ 35%] 2025-12-04T11:29:55.5451902Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_gammaincc_cuda <- test/inductor/test_torchinductor.py PASSED [0.1281s] [ 36%] 2025-12-04T11:29:55.5452494Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_gammaln_cuda <- test/inductor/test_torchinductor.py PASSED [0.5323s] [ 37%] 2025-12-04T11:29:55.5453143Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_laguerre_polynomial_l_cuda <- test/inductor/test_torchinductor.py PASSED [0.2462s] [ 37%] 2025-12-04T11:29:55.5454252Z 
inductor/test_compile_subprocess.py::GPUTests::test_pointwise_log_ndtr_cuda <- test/inductor/test_torchinductor.py W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5454721Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5455605Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5456038Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5456997Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5457589Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5458379Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5458822Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5459652Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5460206Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5461038Z W1204 11:26:38.651000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5461481Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5462292Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5462811Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5463624Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5464141Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5464937Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5465501Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5466294Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5466886Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5467706Z W1204 11:26:38.651000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5468362Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5469268Z W1204 11:26:38.651000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5469389Z PASSED [0.8885s] [ 38%] 2025-12-04T11:29:55.5470515Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_logit_cuda <- test/inductor/test_torchinductor.py W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5471209Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5472114Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5472582Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5473417Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5473996Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5474798Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5475201Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5476050Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5476602Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5477426Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5477894Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5478694Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5479230Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5480027Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5480564Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5481368Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5481919Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5482770Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5483344Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5484167Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5484881Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5485802Z W1204 11:26:39.241000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5485911Z PASSED [0.5333s] [ 39%] 2025-12-04T11:29:55.5487024Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_multigammaln_cuda <- test/inductor/test_torchinductor.py W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5487533Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5488425Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5488816Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5489652Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5490246Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5491035Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5491441Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5492285Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5492833Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5493667Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5494109Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5494899Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", 
line 364, in __init__ 2025-12-04T11:29:55.5495431Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5496221Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5496866Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5497662Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5498230Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5499093Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5499663Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5500482Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5501135Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5501982Z W1204 11:26:39.505000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: 
torch.ops.prims.iota.default 2025-12-04T11:29:55.5502492Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5502962Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5503844Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5504222Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5505068Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5505645Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5506442Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5506841Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5507685Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5508236Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5509057Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] 
[0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5509518Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5510312Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5510890Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5511686Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5512203Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5513088Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5513642Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5514451Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5515048Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5515867Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5516494Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5517395Z W1204 11:26:39.839000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5517517Z PASSED [0.6839s] [ 40%] 2025-12-04T11:29:55.5518090Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_ndtri_cuda <- test/inductor/test_torchinductor.py PASSED [0.1211s] [ 40%] 2025-12-04T11:29:55.5518655Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_psi_cuda <- test/inductor/test_torchinductor.py PASSED [0.2342s] [ 41%] 2025-12-04T11:29:55.5519316Z inductor/test_compile_subprocess.py::GPUTests::test_pointwise_scaled_modified_bessel_k1_cuda <- test/inductor/test_torchinductor.py PASSED [0.5711s] [ 42%] 2025-12-04T11:29:55.5519825Z inductor/test_compile_subprocess.py::GPUTests::test_pow3_cuda <- test/inductor/test_torchinductor.py PASSED [0.2314s] [ 42%] 2025-12-04T11:29:55.5520873Z inductor/test_compile_subprocess.py::GPUTests::test_pow_symfloat_cuda <- test/inductor/test_torchinductor.py W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5521332Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5522226Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5522605Z W1204 11:26:41.382000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5523454Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5524029Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5524847Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5525264Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5526095Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5526694Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5527558Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5528014Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5528815Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5529359Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, 
options) 2025-12-04T11:29:55.5530169Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5530695Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5531497Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5532047Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5532848Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5533422Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5534231Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5534864Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5535767Z W1204 11:26:41.382000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5535890Z PASSED [0.7046s] [ 43%] 2025-12-04T11:29:55.5536480Z 
inductor/test_compile_subprocess.py::GPUTests::test_prod_cuda <- test/inductor/test_torchinductor.py PASSED [2.7160s] [ 44%] 2025-12-04T11:29:55.5537138Z inductor/test_compile_subprocess.py::GPUTests::test_progressive SKIPPED [0.0003s] (Skipping triton backend only since not big GPU (not enough SM)) [ 44%] 2025-12-04T11:29:55.5538261Z inductor/test_compile_subprocess.py::GPUTests::test_rand_like_deterministic_cuda <- test/inductor/test_torchinductor.py W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5538718Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5539663Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5540045Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5540895Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5541532Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5542316Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5542731Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5543591Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5544154Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5544980Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5545435Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5546231Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5546747Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5547554Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5548074Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5548884Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5549436Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5550239Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5550807Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5551617Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5552253Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5553172Z W1204 11:26:44.776000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default 2025-12-04T11:29:55.5553300Z PASSED [0.4734s] [ 45%] 2025-12-04T11:29:55.5554388Z inductor/test_compile_subprocess.py::GPUTests::test_randint_distribution_cuda <- test/inductor/test_torchinductor.py W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5554878Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5555803Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5556185Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5557032Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 
210, in serialize 2025-12-04T11:29:55.5557651Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5558446Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5558849Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5559680Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5560249Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5561079Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5561536Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5562344Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5562877Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5563679Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5564207Z W1204 11:26:45.240000 
99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5565012Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5565565Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5566371Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5567011Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5567841Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5568463Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5569393Z W1204 11:26:45.240000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default 2025-12-04T11:29:55.5569517Z PASSED [0.5054s] [ 46%] 2025-12-04T11:29:55.5570579Z inductor/test_compile_subprocess.py::GPUTests::test_randn_generator_cuda <- test/inductor/test_torchinductor.py W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5571321Z W1204 11:26:45.753000 99744 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5572311Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5572695Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5573550Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5574131Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5574944Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5575350Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5576200Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5576841Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5577666Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5578124Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, 
pickler.options), 2025-12-04T11:29:55.5578919Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5579453Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5580258Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5580792Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5581650Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5582207Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5583006Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5583618Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5584485Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5585115Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle 
non-standard op: {name}") 2025-12-04T11:29:55.5585995Z W1204 11:26:45.753000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default 2025-12-04T11:29:55.5586133Z PASSED [0.5853s] [ 46%] 2025-12-04T11:29:55.5587194Z inductor/test_compile_subprocess.py::GPUTests::test_randn_like_empty_cuda <- test/inductor/test_torchinductor.py W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5587673Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5588555Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5588950Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5589789Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5590364Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5591171Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5591578Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5592438Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5592993Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5593830Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5594276Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5595075Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5595606Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5596458Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5597006Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5597839Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5598431Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5599227Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5599797Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5600649Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5601274Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5602161Z W1204 11:26:46.353000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default 2025-12-04T11:29:55.5602664Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5603134Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5604021Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5604403Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5605254Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5605829Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return 
_WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5606629Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5607031Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5607861Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5608430Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5609255Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5609753Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5610552Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5611087Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5611949Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5612470Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 
2025-12-04T11:29:55.5613282Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5613883Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5614681Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5615249Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5616067Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5616762Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5617635Z W1204 11:26:46.468000 99744 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.inductor_seeds.default 2025-12-04T11:29:55.5617759Z PASSED [0.2413s] [ 47%] 2025-12-04T11:29:55.5618310Z inductor/test_compile_subprocess.py::GPUTests::test_reduction2_cuda <- test/inductor/test_torchinductor.py PASSED [0.5340s] [ 48%] 2025-12-04T11:29:55.5618868Z inductor/test_compile_subprocess.py::GPUTests::test_reduction5_cuda <- test/inductor/test_torchinductor.py PASSED [0.4880s] [ 48%] 2025-12-04T11:29:55.5619407Z inductor/test_compile_subprocess.py::GPUTests::test_remainder_cuda <- test/inductor/test_torchinductor.py PASSED [0.6404s] [ 49%] 
2025-12-04T11:29:55.5620072Z inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.4479s] [ 50%] 2025-12-04T11:29:55.5620749Z inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.4878s] [ 50%] 2025-12-04T11:29:55.5621320Z inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda <- test/inductor/test_torchinductor.py FAILED [0.2690s] [ 50%] 2025-12-04T11:29:55.5621330Z 2025-12-04T11:29:55.5621490Z ==================================== RERUNS ==================================== 2025-12-04T11:29:55.5621734Z _____________________ GPUTests.test_remove_noop_slice_cuda _____________________ 2025-12-04T11:29:55.5621860Z Traceback (most recent call last): 2025-12-04T11:29:55.5622286Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T11:29:55.5622392Z return value(self) 2025-12-04T11:29:55.5622880Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6708, in test_remove_noop_slice 2025-12-04T11:29:55.5623010Z self.assertExpectedInline( 2025-12-04T11:29:55.5623662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3272, in assertExpectedInline 2025-12-04T11:29:55.5624116Z return super().assertExpectedInline(actual if isinstance(actual, str) else str(actual), expect, skip + 1) 2025-12-04T11:29:55.5624608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 413, in assertExpectedInline 2025-12-04T11:29:55.5624773Z assert_expected_inline( 2025-12-04T11:29:55.5625269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 378, in assert_expected_inline 2025-12-04T11:29:55.5625436Z assert_eq(expect, actual, msg=help_text) 2025-12-04T11:29:55.5626004Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 450, in assertMultiLineEqualMaybeCppStack 2025-12-04T11:29:55.5626221Z self.assertMultiLineEqual(expect, actual, *args, **kwargs) 2025-12-04T11:29:55.5626608Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 1226, in assertMultiLineEqual 2025-12-04T11:29:55.5626821Z self.fail(self._formatMessage(msg, standardMsg)) 2025-12-04T11:29:55.5627119Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 675, in fail 2025-12-04T11:29:55.5627267Z raise self.failureException(msg) 2025-12-04T11:29:55.5627551Z AssertionError: 'def forward(self, arg0_1: "Sym(s77)", arg[333 chars]_9,)' != '' 2025-12-04T11:29:55.5627954Z - def forward(self, arg0_1: "Sym(s77)", arg1_1: "Sym(s27)", arg2_1: "Sym(s53)", arg3_1: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0"): 2025-12-04T11:29:55.5628289Z - add: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(arg3_1, 1); arg3_1 = None 2025-12-04T11:29:55.5628591Z - add_9: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(add, 1); add = None 2025-12-04T11:29:55.5629201Z - return (add_9,) : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this) 2025-12-04T11:29:55.5629210Z 2025-12-04T11:29:55.5629429Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.5629927Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_remove_noop_slice_cuda 2025-12-04T11:29:55.5629934Z 2025-12-04T11:29:55.5630218Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:55.5630444Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.5630571Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:29:55.5630732Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T11:29:55.5631046Z 
aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:29:55.5631589Z inductor [('triton_bundler_save_kernel', 8), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:29:55.5631815Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.5632555Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5632850Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5633575Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5633869Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5634108Z _____________________ GPUTests.test_remove_noop_slice_cuda _____________________ 2025-12-04T11:29:55.5634233Z Traceback (most recent call last): 2025-12-04T11:29:55.5634645Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T11:29:55.5634794Z return value(self) 2025-12-04T11:29:55.5635291Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6708, in test_remove_noop_slice 2025-12-04T11:29:55.5635422Z self.assertExpectedInline( 2025-12-04T11:29:55.5636012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3272, in assertExpectedInline 2025-12-04T11:29:55.5636497Z return super().assertExpectedInline(actual if isinstance(actual, str) else str(actual), expect, skip + 1) 2025-12-04T11:29:55.5637024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 413, in assertExpectedInline 2025-12-04T11:29:55.5637142Z 
assert_expected_inline( 2025-12-04T11:29:55.5637651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 378, in assert_expected_inline 2025-12-04T11:29:55.5637792Z assert_eq(expect, actual, msg=help_text) 2025-12-04T11:29:55.5638361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 450, in assertMultiLineEqualMaybeCppStack 2025-12-04T11:29:55.5638616Z self.assertMultiLineEqual(expect, actual, *args, **kwargs) 2025-12-04T11:29:55.5639001Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 1226, in assertMultiLineEqual 2025-12-04T11:29:55.5639185Z self.fail(self._formatMessage(msg, standardMsg)) 2025-12-04T11:29:55.5639480Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 675, in fail 2025-12-04T11:29:55.5639622Z raise self.failureException(msg) 2025-12-04T11:29:55.5639908Z AssertionError: 'def forward(self, arg0_1: "Sym(s77)", arg[333 chars]_9,)' != '' 2025-12-04T11:29:55.5640303Z - def forward(self, arg0_1: "Sym(s77)", arg1_1: "Sym(s27)", arg2_1: "Sym(s53)", arg3_1: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0"): 2025-12-04T11:29:55.5640636Z - add: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(arg3_1, 1); arg3_1 = None 2025-12-04T11:29:55.5640939Z - add_9: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(add, 1); add = None 2025-12-04T11:29:55.5641530Z - return (add_9,) : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this) 2025-12-04T11:29:55.5641548Z 2025-12-04T11:29:55.5641767Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.5642271Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_remove_noop_slice_cuda 2025-12-04T11:29:55.5642276Z 2025-12-04T11:29:55.5642558Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 
2025-12-04T11:29:55.5642781Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.5642893Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:29:55.5643063Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T11:29:55.5643377Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:29:55.5643926Z inductor [('triton_bundler_save_kernel', 8), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:29:55.5644145Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.5644880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5645176Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5645902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5646195Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5646447Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.5646563Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:29:55.5646732Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T11:29:55.5647043Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:29:55.5647588Z inductor [('triton_bundler_save_kernel', 8), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:29:55.5647836Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.5648592Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5648881Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5649610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5649928Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5650073Z =================================== FAILURES =================================== 2025-12-04T11:29:55.5650310Z _____________________ GPUTests.test_remove_noop_slice_cuda _____________________ 2025-12-04T11:29:55.5650448Z Traceback (most recent call last): 2025-12-04T11:29:55.5650850Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T11:29:55.5650953Z return value(self) 2025-12-04T11:29:55.5651448Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6708, in test_remove_noop_slice 2025-12-04T11:29:55.5651574Z self.assertExpectedInline( 2025-12-04T11:29:55.5652179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3272, in assertExpectedInline 2025-12-04T11:29:55.5652617Z return super().assertExpectedInline(actual if isinstance(actual, str) else str(actual), expect, skip + 1) 2025-12-04T11:29:55.5653111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 413, in assertExpectedInline 2025-12-04T11:29:55.5653237Z assert_expected_inline( 2025-12-04T11:29:55.5653731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 378, in assert_expected_inline 2025-12-04T11:29:55.5653873Z assert_eq(expect, actual, msg=help_text) 2025-12-04T11:29:55.5654440Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 450, in assertMultiLineEqualMaybeCppStack 2025-12-04T11:29:55.5654660Z self.assertMultiLineEqual(expect, actual, *args, **kwargs) 2025-12-04T11:29:55.5655057Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 1226, in assertMultiLineEqual 2025-12-04T11:29:55.5655226Z self.fail(self._formatMessage(msg, standardMsg)) 2025-12-04T11:29:55.5655522Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 675, in fail 2025-12-04T11:29:55.5655671Z raise self.failureException(msg) 2025-12-04T11:29:55.5655958Z AssertionError: 'def forward(self, arg0_1: "Sym(s77)", arg[333 chars]_9,)' != '' 2025-12-04T11:29:55.5656460Z - def forward(self, arg0_1: "Sym(s77)", arg1_1: "Sym(s27)", arg2_1: "Sym(s53)", arg3_1: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0"): 2025-12-04T11:29:55.5656788Z - add: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(arg3_1, 1); arg3_1 = None 2025-12-04T11:29:55.5657095Z - add_9: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(add, 1); add = None 2025-12-04T11:29:55.5657702Z - return (add_9,) : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this) 2025-12-04T11:29:55.5657709Z 2025-12-04T11:29:55.5657977Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.5658494Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_remove_noop_slice_cuda 2025-12-04T11:29:55.5658502Z 2025-12-04T11:29:55.5658775Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:55.5658997Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.5659159Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:29:55.5659320Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T11:29:55.5659770Z 
aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:29:55.5660319Z inductor [('triton_bundler_save_kernel', 8), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:29:55.5660538Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.5661289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5661608Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5662332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5662627Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5662846Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.5662976Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:29:55.5663134Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T11:29:55.5663445Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:29:55.5663992Z inductor [('triton_bundler_save_kernel', 8), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:29:55.5664210Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.5664952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5665231Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5665954Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5666244Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5666468Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:29:55.5666594Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:29:55.5666756Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T11:29:55.5667069Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:29:55.5667618Z inductor [('triton_bundler_save_kernel', 8), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:29:55.5667839Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:29:55.5668571Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__getstate__, please add missing op schema 2025-12-04T11:29:55.5668865Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5669592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ops_handler.py:772: UserWarning: undefined OpHandler.__setstate__, please add missing op schema 2025-12-04T11:29:55.5669929Z warnings.warn(f"undefined OpHandler.{name}, please add missing op schema") 2025-12-04T11:29:55.5670750Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-decce829c4432557.xml - 2025-12-04T11:29:55.5670928Z =========================== short test summary info ============================ 2025-12-04T11:29:55.5671906Z FAILED [0.2690s] inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda - AssertionError: 'def forward(self, 
arg0_1: "Sym(s77)", arg[333 chars]_9,)' != '' 2025-12-04T11:29:55.5672442Z - def forward(self, arg0_1: "Sym(s77)", arg1_1: "Sym(s27)", arg2_1: "Sym(s53)", arg3_1: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0"): 2025-12-04T11:29:55.5672780Z - add: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(arg3_1, 1); arg3_1 = None 2025-12-04T11:29:55.5673084Z - add_9: "f32[s77, s27, s53][s27*s53, s53, 1]cuda:0" = torch.ops.aten.add.Tensor(add, 1); add = None 2025-12-04T11:29:55.5673677Z - return (add_9,) : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this) 2025-12-04T11:29:55.5673751Z 2025-12-04T11:29:55.5673972Z To execute this test, run the following from the base repo dir: 2025-12-04T11:29:55.5674469Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_compile_subprocess.py GPUTests.test_remove_noop_slice_cuda 2025-12-04T11:29:55.5674477Z 2025-12-04T11:29:55.5674766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:29:55.5674951Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:29:55.5675223Z = 1 failed, 66 passed, 6 skipped, 143 deselected, 2 rerun in 98.92s (0:01:38) == 2025-12-04T11:29:55.5675328Z Got exit code 1 2025-12-04T11:29:55.5675440Z Retrying single test... 
2025-12-04T11:29:55.5675909Z W1204 11:27:03.841000 101781 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:29:55.5676559Z Test results will be stored in test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-491de48d6c983340.xml 2025-12-04T11:29:55.5676729Z ============================= test session starts ============================== 2025-12-04T11:29:55.5677096Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:29:55.5677212Z cachedir: .pytest_cache 2025-12-04T11:29:55.5677745Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:29:55.5677873Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:29:55.5677985Z configfile: pytest.ini 2025-12-04T11:29:55.5678543Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:29:55.5678773Z collecting ... collected 879 items / 287 deselected / 592 selected 2025-12-04T11:29:55.5679372Z stepcurrent: skipping 215 already run items. 
Running only test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda 2025-12-04T11:29:55.5679509Z Running 1 items in this shard 2025-12-04T11:29:55.5679514Z 2025-12-04T11:29:55.5680096Z inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda <- test/inductor/test_torchinductor.py PASSED [18.5649s] [100%] 2025-12-04T11:29:55.5680105Z 2025-12-04T11:29:55.5680937Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-491de48d6c983340.xml - 2025-12-04T11:29:55.5681134Z ====================== 1 passed, 287 deselected in 18.64s ====================== 2025-12-04T11:29:55.5681239Z Got exit code 0 2025-12-04T11:29:55.5681501Z Test succeeded in new process, continuing with the rest of the tests 2025-12-04T11:29:55.5681950Z W1204 11:27:44.063000 102081 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:29:55.5682648Z Test results will be stored in test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-35b1cdd46f4129e6.xml 2025-12-04T11:29:55.5682822Z ============================= test session starts ============================== 2025-12-04T11:29:55.5683172Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:29:55.5683334Z cachedir: .pytest_cache 2025-12-04T11:29:55.5683855Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:29:55.5684025Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:29:55.5684139Z configfile: pytest.ini 2025-12-04T11:29:55.5684677Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:29:55.5684917Z collecting ... 
collected 879 items / 216 deselected / 663 selected 2025-12-04T11:29:55.5685070Z stepcurrent: skipping 216 already run items. 2025-12-04T11:29:55.5685220Z Running 72 items in this shard 2025-12-04T11:29:55.5685225Z 2025-12-04T11:29:55.5687254Z inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_scatter_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0011s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/151378 for platform(s) linux, rocm, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 1%] 2025-12-04T11:29:55.5687789Z inductor/test_compile_subprocess.py::GPUTests::test_repeat_cuda <- test/inductor/test_torchinductor.py PASSED [19.3486s] [ 2%] 2025-12-04T11:29:55.5689004Z inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_decomposition_has_clamp_cuda <- test/inductor/test_torchinductor.py W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5689468Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5690375Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5690761Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5691613Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5692192Z W1204 11:28:05.485000 102081 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5692987Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5693407Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5694245Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5694815Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5695636Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5696131Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5697003Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5697529Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5698412Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5698940Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] 
nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5699750Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5700339Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5701130Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5701722Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5702529Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5703170Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5703997Z W1204 11:28:05.485000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.5704114Z PASSED [1.0688s] [ 4%] 2025-12-04T11:29:55.5705177Z inductor/test_compile_subprocess.py::GPUTests::test_require_stride_expanded_cuda <- test/inductor/test_torchinductor.py W1204 11:28:07.992000 102266 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:29:55.5705282Z PASSED [2.3217s] [ 5%] 2025-12-04T11:29:55.5705819Z inductor/test_compile_subprocess.py::GPUTests::test_resize_cuda <- 
test/inductor/test_torchinductor.py PASSED [7.1319s] [ 6%] 2025-12-04T11:29:55.5706841Z inductor/test_compile_subprocess.py::GPUTests::test_roi_align_cuda <- test/inductor/test_torchinductor.py W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5707314Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5708202Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5708591Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5709444Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5710054Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5710858Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5711264Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5712195Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5712753Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return 
_GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5713576Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5714069Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5714870Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5715412Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5716213Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5716751Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5717552Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5718111Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5718920Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5719496Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, 
_OpOverloadPickleData, options) 2025-12-04T11:29:55.5720318Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5720947Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5721841Z W1204 11:28:16.101000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.torchvision.roi_align.default 2025-12-04T11:29:55.5722352Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5722809Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5723705Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5724123Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5724975Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5725553Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5726409Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 
127, in dumps 2025-12-04T11:29:55.5726815Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5727649Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5728247Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5729065Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5729523Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5730323Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5730849Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5731660Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5732188Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5732999Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5733553Z W1204 11:28:16.289000 
102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5734357Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5734930Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5735748Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5736485Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5737375Z W1204 11:28:16.289000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.torchvision.roi_align.default 2025-12-04T11:29:55.5737497Z PASSED [0.5133s] [ 8%] 2025-12-04T11:29:55.5738540Z inductor/test_compile_subprocess.py::GPUTests::test_roll_cuda <- test/inductor/test_torchinductor.py W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5739019Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5739902Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5740360Z W1204 11:28:16.542000 102081 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5741210Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5741793Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5742618Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5743022Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5743878Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5744429Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5745252Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5745713Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5746519Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5747056Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = 
_GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5747855Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5748383Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5749195Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5749750Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5750557Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5751133Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5751987Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5752613Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5753442Z W1204 11:28:16.542000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.5754021Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] 
Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5754506Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5755406Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5755791Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5756678Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5757254Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5758044Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5758469Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5759307Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5759877Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5760699Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 
2025-12-04T11:29:55.5761159Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5761958Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5762488Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5763307Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5763832Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5764646Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5765201Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5766053Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5766628Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5767434Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5768132Z 
W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5768957Z W1204 11:28:17.331000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.5769078Z PASSED [1.2033s] [ 9%] 2025-12-04T11:29:55.5769603Z inductor/test_compile_subprocess.py::GPUTests::test_round_cuda <- test/inductor/test_torchinductor.py PASSED [0.6584s] [ 11%] 2025-12-04T11:29:55.5770150Z inductor/test_compile_subprocess.py::GPUTests::test_rsqrt_cuda <- test/inductor/test_torchinductor.py PASSED [0.5388s] [ 12%] 2025-12-04T11:29:55.5771429Z inductor/test_compile_subprocess.py::GPUTests::test_rsqrt_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5771896Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5772798Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5773185Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5774041Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5774625Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5775420Z W1204 
11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5775844Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5776758Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5777335Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5778163Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5778614Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5779436Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5779963Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5780862Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5781395Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5782210Z W1204 11:28:19.025000 102081 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5782863Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5783654Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5784244Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5785102Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5785739Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5786652Z W1204 11:28:19.025000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5787129Z W1204 11:28:19.045000 102081 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:29:55.5787640Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5788102Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5789004Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5789395Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5790258Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5790842Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5791643Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5792053Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5792894Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5793470Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5794295Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5794788Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5795595Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5796161Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5796992Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5797515Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5798329Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5798916Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5799721Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5800295Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5801104Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5801754Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5802657Z W1204 11:28:20.207000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] 
[0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5802778Z PASSED [2.0946s] [ 13%] 2025-12-04T11:29:55.5803616Z inductor/test_compile_subprocess.py::GPUTests::test_scaled_dot_product_attention_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (Can't run flash attention on this platform) [ 15%] 2025-12-04T11:29:55.5804161Z inductor/test_compile_subprocess.py::GPUTests::test_scatter3_cuda <- test/inductor/test_torchinductor.py PASSED [0.6554s] [ 16%] 2025-12-04T11:29:55.5804688Z inductor/test_compile_subprocess.py::GPUTests::test_scatter4_cuda <- test/inductor/test_torchinductor.py PASSED [1.2915s] [ 18%] 2025-12-04T11:29:55.5805239Z inductor/test_compile_subprocess.py::GPUTests::test_scatter_add3_cuda <- test/inductor/test_torchinductor.py PASSED [0.9734s] [ 19%] 2025-12-04T11:29:55.5805820Z inductor/test_compile_subprocess.py::GPUTests::test_scatter_reduce3_cuda <- test/inductor/test_torchinductor.py PASSED [1.0933s] [ 20%] 2025-12-04T11:29:55.5806724Z inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_True_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Does not support SDPA or pre-SM80 hardware) [ 22%] 2025-12-04T11:29:55.5807643Z inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_False_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (Does not support SDPA or pre-SM80 hardware) [ 23%] 2025-12-04T11:29:55.5808795Z inductor/test_compile_subprocess.py::GPUTests::test_sdpa_unaligned_mask_freezing_cuda <- test/inductor/test_torchinductor.py W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5809272Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 
2025-12-04T11:29:55.5810161Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5810574Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5811456Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5812033Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5812836Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5813309Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5814101Z W1204 11:28:25.124000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] AttributeError: Can't pickle local object 'CommonTemplate.test_sdpa_unaligned_mask_freezing.<locals>.Mod' 2025-12-04T11:29:55.5814226Z PASSED [0.2284s] [ 25%] 2025-12-04T11:29:55.5814786Z inductor/test_compile_subprocess.py::GPUTests::test_shape_padding_cuda <- test/inductor/test_torchinductor.py PASSED [2.8750s] [ 26%] 2025-12-04T11:29:55.5815328Z inductor/test_compile_subprocess.py::GPUTests::test_signbit_cuda <- test/inductor/test_torchinductor.py PASSED [0.5661s] [ 27%] 2025-12-04T11:29:55.5816339Z inductor/test_compile_subprocess.py::GPUTests::test_silu_cuda <- test/inductor/test_torchinductor.py W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 
2025-12-04T11:29:55.5816896Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5817790Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5818177Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5819033Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5819613Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5820418Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5820823Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5821663Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5822230Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5823098Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5823562Z W1204 11:28:28.909000 102081 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5824363Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5824929Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5825759Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5826285Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5827100Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5827689Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5828496Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5829069Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5829888Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5830511Z W1204 11:28:28.909000 102081 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5831419Z W1204 11:28:28.909000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5831539Z PASSED [0.4053s] [ 29%] 2025-12-04T11:29:55.5832048Z inductor/test_compile_subprocess.py::GPUTests::test_sin_cuda <- test/inductor/test_torchinductor.py PASSED [0.9362s] [ 30%] 2025-12-04T11:29:55.5832585Z inductor/test_compile_subprocess.py::GPUTests::test_slice1_cuda <- test/inductor/test_torchinductor.py PASSED [0.6578s] [ 31%] 2025-12-04T11:29:55.5833098Z inductor/test_compile_subprocess.py::GPUTests::test_slice2_cuda <- test/inductor/test_torchinductor.py PASSED [0.8658s] [ 33%] 2025-12-04T11:29:55.5833667Z inductor/test_compile_subprocess.py::GPUTests::test_slice_mutation1_cuda <- test/inductor/test_torchinductor.py PASSED [0.6682s] [ 34%] 2025-12-04T11:29:55.5834235Z inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter5_cuda <- test/inductor/test_torchinductor.py PASSED [0.5721s] [ 36%] 2025-12-04T11:29:55.5834737Z inductor/test_compile_subprocess.py::GPUTests::test_sort_cuda <- test/inductor/test_torchinductor.py PASSED [1.9188s] [ 37%] 2025-12-04T11:29:55.5835579Z inductor/test_compile_subprocess.py::GPUTests::test_sort_stable_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0007s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 38%] 2025-12-04T11:29:55.5836140Z inductor/test_compile_subprocess.py::GPUTests::test_sort_transpose_cuda <- test/inductor/test_torchinductor.py PASSED [27.8336s] [ 40%] 2025-12-04T11:29:55.5836728Z inductor/test_compile_subprocess.py::GPUTests::test_special_polygamma_cuda <- test/inductor/test_torchinductor.py PASSED [3.6925s] [ 41%] 2025-12-04T11:29:55.5837434Z 
inductor/test_compile_subprocess.py::GPUTests::test_split_cumprod_low_prec_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0035s] (Requires sm80) [ 43%] 2025-12-04T11:29:55.5838101Z inductor/test_compile_subprocess.py::GPUTests::test_split_cumsum_low_prec_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (Requires sm80) [ 44%] 2025-12-04T11:29:55.5838883Z inductor/test_compile_subprocess.py::GPUTests::test_split_reduction_with_int64_size_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.2654s] (Insufficient cuda memory) [ 45%] 2025-12-04T11:29:55.5840060Z inductor/test_compile_subprocess.py::GPUTests::test_split_with_unbacked_symints_cuda <- test/inductor/test_torchinductor.py W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5840536Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5841429Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5841858Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5842697Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5843281Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5844088Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in 
dumps 2025-12-04T11:29:55.5844498Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5845345Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5845898Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5846723Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5847179Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5847980Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5848521Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5849320Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5849864Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5850663Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5851249Z W1204 11:29:06.605000 102081 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5852050Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5852622Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5853503Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5854127Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5854966Z W1204 11:29:06.605000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.5855101Z PASSED [1.0720s] [ 47%] 2025-12-04T11:29:55.5855884Z inductor/test_compile_subprocess.py::GPUTests::test_sqrt_dynamic_shapes_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0036s] (sqrt dynamic shapes only supports cpu) [ 48%] 2025-12-04T11:29:55.5856499Z inductor/test_compile_subprocess.py::GPUTests::test_squeeze1_cuda <- test/inductor/test_torchinductor.py PASSED [0.4750s] [ 50%] 2025-12-04T11:29:55.5857036Z inductor/test_compile_subprocess.py::GPUTests::test_squeeze2_cuda <- test/inductor/test_torchinductor.py PASSED [0.5063s] [ 51%] 2025-12-04T11:29:55.5857614Z inductor/test_compile_subprocess.py::GPUTests::test_squeeze_varargs_cuda <- test/inductor/test_torchinductor.py PASSED [0.7119s] [ 52%] 2025-12-04T11:29:55.5858122Z inductor/test_compile_subprocess.py::GPUTests::test_stack_cuda <- 
test/inductor/test_torchinductor.py PASSED [0.6632s] [ 54%] 2025-12-04T11:29:55.5859120Z inductor/test_compile_subprocess.py::GPUTests::test_std_cuda <- test/inductor/test_torchinductor.py W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5859596Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5860482Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5860884Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5861723Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5862316Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5863102Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5863506Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5864357Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5864908Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return 
_GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5865780Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5866230Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5867043Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5867648Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5868448Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5868988Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5869819Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5870387Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5871356Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5871930Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, 
_OpOverloadPickleData, options) 2025-12-04T11:29:55.5872763Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5873389Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5874316Z W1204 11:29:10.952000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5874424Z PASSED [1.8510s] [ 55%] 2025-12-04T11:29:55.5875001Z inductor/test_compile_subprocess.py::GPUTests::test_strided_inputs_cuda <- test/inductor/test_torchinductor.py PASSED [0.2326s] [ 56%] 2025-12-04T11:29:55.5875510Z inductor/test_compile_subprocess.py::GPUTests::test_sum2_cuda <- test/inductor/test_torchinductor.py PASSED [2.3694s] [ 58%] 2025-12-04T11:29:55.5876016Z inductor/test_compile_subprocess.py::GPUTests::test_sum3_cuda <- test/inductor/test_torchinductor.py PASSED [0.7126s] [ 59%] 2025-12-04T11:29:55.5876535Z inductor/test_compile_subprocess.py::GPUTests::test_sum4_cuda <- test/inductor/test_torchinductor.py PASSED [1.1273s] [ 61%] 2025-12-04T11:29:55.5877040Z inductor/test_compile_subprocess.py::GPUTests::test_sum5_cuda <- test/inductor/test_torchinductor.py PASSED [1.3352s] [ 62%] 2025-12-04T11:29:55.5877595Z inductor/test_compile_subprocess.py::GPUTests::test_sum_keepdims_cuda <- test/inductor/test_torchinductor.py PASSED [0.6257s] [ 63%] 2025-12-04T11:29:55.5878100Z inductor/test_compile_subprocess.py::GPUTests::test_tanh_cuda <- test/inductor/test_torchinductor.py PASSED [0.7866s] [ 65%] 2025-12-04T11:29:55.5879273Z inductor/test_compile_subprocess.py::GPUTests::test_tmp_not_defined_issue1_use_block_ptr_True_cuda <- 
test/inductor/test_torchinductor.py W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5879817Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5880712Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5881109Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5882107Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5882706Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5883496Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5883944Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5884791Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5885346Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5886188Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5886635Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5887452Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5887977Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5888775Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5889317Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5890110Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5890680Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5891471Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5892061Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5892883Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5893507Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5894471Z W1204 11:29:19.611000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5894583Z PASSED [1.0038s] [ 66%] 2025-12-04T11:29:55.5895693Z inductor/test_compile_subprocess.py::GPUTests::test_tmp_not_defined_issue3_cuda <- test/inductor/test_torchinductor.py W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5896216Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5897176Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5897582Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5898464Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5899057Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5899847Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5900267Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5901104Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5901662Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5902505Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5902954Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5903771Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5904300Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5905118Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5905646Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5906448Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5907024Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5907820Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5908459Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5909272Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5909943Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5910800Z W1204 11:29:20.304000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.iota.default 2025-12-04T11:29:55.5910911Z PASSED [2.5343s] [ 68%] 2025-12-04T11:29:55.5911957Z inductor/test_compile_subprocess.py::GPUTests::test_to_dtype_cuda <- test/inductor/test_torchinductor.py W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5912447Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5913347Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5913734Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5914572Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5915169Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5915959Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5916379Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5917217Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5917788Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5918618Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5919063Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5919884Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5920407Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5921219Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5921742Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5922583Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5923141Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5923930Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5924595Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5925401Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5926042Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5926981Z W1204 11:29:22.617000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] 
[0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5927503Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5927965Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5928858Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5929255Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5930090Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5930681Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5931472Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5931878Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5932730Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5933283Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 
2025-12-04T11:29:55.5934124Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5934573Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5935388Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5935951Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5936816Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5937362Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5938205Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5938808Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5939602Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5940190Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5941033Z 
W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5941661Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5942586Z W1204 11:29:22.887000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5942695Z PASSED [0.4930s] [ 69%] 2025-12-04T11:29:55.5943425Z inductor/test_compile_subprocess.py::GPUTests::test_triton_argmin_argmax_transpose_logical_index_cuda <- test/inductor/test_torchinductor.py PASSED [4.4349s] [ 70%] 2025-12-04T11:29:55.5944489Z inductor/test_compile_subprocess.py::GPUTests::test_uint4x2_mixed_mm_cuda <- test/inductor/test_torchinductor.py W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5944964Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5945854Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5946238Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5947091Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5947670Z W1204 11:29:27.559000 102081 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5948469Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5948879Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5949709Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5950307Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5951133Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5951590Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5952456Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5952993Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5953794Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5954350Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] 
nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5955160Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5955714Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5956522Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5957093Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5957918Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5958543Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5959456Z W1204 11:29:27.559000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5959979Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5960437Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] Traceback (most recent call last): 2025-12-04T11:29:55.5961343Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5961727Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] ).serialize() 2025-12-04T11:29:55.5962574Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5963159Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5963943Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5964406Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] pickler.dump(obj) 2025-12-04T11:29:55.5965241Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5965806Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5966718Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5967183Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] cls(obj, pickler.options), 2025-12-04T11:29:55.5967983Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5968535Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5969340Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5969871Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5970682Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5971485Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5972281Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5972869Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5973685Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5974330Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.5975241Z W1204 11:29:27.994000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] 
[0/0] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.5975365Z PASSED [0.8081s] [ 72%] 2025-12-04T11:29:55.5975993Z inductor/test_compile_subprocess.py::GPUTests::test_unbacked_floordiv_simplify_cuda <- test/inductor/test_torchinductor.py PASSED [1.0403s] [ 73%] 2025-12-04T11:29:55.5976720Z inductor/test_compile_subprocess.py::GPUTests::test_unbacked_floordiv_simplify_errors_cuda <- test/inductor/test_torchinductor.py PASSED [0.0238s] [ 75%] 2025-12-04T11:29:55.5977331Z inductor/test_compile_subprocess.py::GPUTests::test_unroll_small_reduction_cuda <- test/inductor/test_torchinductor.py PASSED [2.3556s] [ 76%] 2025-12-04T11:29:55.5977918Z inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_float16_cuda <- test/inductor/test_torchinductor.py PASSED [0.7524s] [ 77%] 2025-12-04T11:29:55.5978508Z inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_int32_cuda <- test/inductor/test_torchinductor.py PASSED [1.1009s] [ 79%] 2025-12-04T11:29:55.5979183Z inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_int8_cuda <- test/inductor/test_torchinductor.py PASSED [0.5682s] [ 80%] 2025-12-04T11:29:55.5979811Z inductor/test_compile_subprocess.py::GPUTests::test_upsample_nearest2d_backward_cuda <- test/inductor/test_torchinductor.py PASSED [2.8734s] [ 81%] 2025-12-04T11:29:55.5980389Z inductor/test_compile_subprocess.py::GPUTests::test_var_correction_cuda <- test/inductor/test_torchinductor.py PASSED [1.3559s] [ 83%] 2025-12-04T11:29:55.5981026Z inductor/test_compile_subprocess.py::GPUTests::test_var_mean_div_by_cuda <- test/inductor/test_torchinductor.py PASSED [0.7491s] [ 84%] 2025-12-04T11:29:55.5981657Z inductor/test_compile_subprocess.py::GPUTests::test_var_mean_tile_reduction_False_cuda <- test/inductor/test_torchinductor.py PASSED [0.8185s] [ 86%] 2025-12-04T11:29:55.5982268Z 
inductor/test_compile_subprocess.py::GPUTests::test_var_mean_tile_reduction_True_cuda <- test/inductor/test_torchinductor.py PASSED [0.7944s] [ 87%] 2025-12-04T11:29:55.5982851Z inductor/test_compile_subprocess.py::GPUTests::test_vertical_fusion1_cuda <- test/inductor/test_torchinductor.py PASSED [0.8372s] [ 88%] 2025-12-04T11:29:55.5983443Z inductor/test_compile_subprocess.py::GPUTests::test_view_as_complex_cuda <- test/inductor/test_torchinductor.py PASSED [0.2878s] [ 90%] 2025-12-04T11:29:55.5983961Z inductor/test_compile_subprocess.py::GPUTests::test_views2_cuda <- test/inductor/test_torchinductor.py PASSED [2.6466s] [ 91%] 2025-12-04T11:29:55.5985038Z inductor/test_compile_subprocess.py::GPUTests::test_weight_norm_bwd_cuda <- test/inductor/test_torchinductor.py W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.5985504Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] Traceback (most recent call last): 2025-12-04T11:29:55.5986414Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.5986810Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] ).serialize() 2025-12-04T11:29:55.5987665Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.5988265Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.5989055Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.5989492Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] pickler.dump(obj) 2025-12-04T11:29:55.5990334Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.5990906Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.5991738Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.5992192Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] cls(obj, pickler.options), 2025-12-04T11:29:55.5993054Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.5993589Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.5994403Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.5994967Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.5995814Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.5996385Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.5997181Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.5997804Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.5998624Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.5999270Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.6000185Z W1204 11:29:45.121000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.6000709Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.6001174Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] Traceback (most recent call last): 2025-12-04T11:29:55.6002070Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.6002481Z W1204 11:29:45.510000 102081 
site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] ).serialize() 2025-12-04T11:29:55.6003321Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.6003919Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.6004707Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.6005137Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] pickler.dump(obj) 2025-12-04T11:29:55.6005985Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.6006544Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.6007423Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.6007876Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] cls(obj, pickler.options), 2025-12-04T11:29:55.6008692Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.6009283Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] 
self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.6010086Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.6010629Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.6011462Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.6012035Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.6012833Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.6013430Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] return cls._pickle_op(name, _OpOverloadPickleData, options) 2025-12-04T11:29:55.6014251Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.6014882Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.6015814Z W1204 11:29:45.510000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] [0/0_1] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.6015926Z PASSED [1.3275s] [ 93%] 
2025-12-04T11:29:55.6016648Z inductor/test_compile_subprocess.py::GPUTests::test_weight_norm_conv2d_cuda <- test/inductor/test_torchinductor.py SKIPPED [0.0003s] (Skipped!) [ 94%] 2025-12-04T11:29:55.6017707Z inductor/test_compile_subprocess.py::GPUTests::test_where_broadcast_cuda <- test/inductor/test_torchinductor.py W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] Unable to pickle input graph or example inputs 2025-12-04T11:29:55.6018172Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] Traceback (most recent call last): 2025-12-04T11:29:55.6019049Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 486, in serialize_compile 2025-12-04T11:29:55.6019409Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] ).serialize() 2025-12-04T11:29:55.6020255Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx_ext.py", line 210, in serialize 2025-12-04T11:29:55.6020816Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] return _WireProtocolPickledInput(GraphPickler.dumps(self)) 2025-12-04T11:29:55.6021657Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 127, in dumps 2025-12-04T11:29:55.6022044Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] pickler.dump(obj) 2025-12-04T11:29:55.6022883Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 94, in reducer_override 2025-12-04T11:29:55.6023493Z W1204 11:29:46.419000 102081 
site-packages/torch/_inductor/compile_fx_ext.py:493] return _GraphModulePickleData.reduce_helper(self, obj) 2025-12-04T11:29:55.6024307Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 352, in reduce_helper 2025-12-04T11:29:55.6024751Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] cls(obj, pickler.options), 2025-12-04T11:29:55.6025596Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 364, in __init__ 2025-12-04T11:29:55.6026126Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] self.graph = _GraphPickleData(gm._graph, options) 2025-12-04T11:29:55.6026919Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 571, in __init__ 2025-12-04T11:29:55.6027430Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] nodes[node] = _NodePickleData(node, nodes, options) 2025-12-04T11:29:55.6028237Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 387, in __init__ 2025-12-04T11:29:55.6028784Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] self.target = _OpPickleData.pickle(node.target, options) 2025-12-04T11:29:55.6029580Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 435, in pickle 2025-12-04T11:29:55.6030145Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] return cls._pickle_op(name, 
_OpOverloadPickleData, options) 2025-12-04T11:29:55.6030968Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/_graph_pickler.py", line 456, in _pickle_op 2025-12-04T11:29:55.6031585Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] raise BypassFxGraphCache(f"Unable to pickle non-standard op: {name}") 2025-12-04T11:29:55.6032484Z W1204 11:29:46.419000 102081 site-packages/torch/_inductor/compile_fx_ext.py:493] torch._inductor.codecache.BypassFxGraphCache: Unable to pickle non-standard op: torch.ops.prims.convert_element_type.default 2025-12-04T11:29:55.6032606Z PASSED [1.3723s] [ 95%] 2025-12-04T11:29:55.6033196Z inductor/test_compile_subprocess.py::GPUTests::test_where_with_logical_op_cuda <- test/inductor/test_torchinductor.py PASSED [0.8848s] [ 97%] 2025-12-04T11:29:55.6033807Z inductor/test_compile_subprocess.py::GPUTests::test_xblock_divides_xnumel_cuda <- test/inductor/test_torchinductor.py PASSED [1.0013s] [ 98%] 2025-12-04T11:29:55.6034382Z inductor/test_compile_subprocess.py::GPUTests::test_zero_dim_reductions_cuda <- test/inductor/test_torchinductor.py PASSED [0.3938s] [100%] 2025-12-04T11:29:55.6034390Z 2025-12-04T11:29:55.6035222Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-35b1cdd46f4129e6.xml - 2025-12-04T11:29:55.6035517Z ========== 62 passed, 10 skipped, 216 deselected in 123.52s (0:02:03) ========== 2025-12-04T11:29:55.6036166Z The following tests failed and then succeeded when run in a new process['test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_cuda'] 2025-12-04T11:29:55.6036642Z The following tests failed consistently: ['test/inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda'] 2025-12-04T11:29:55.6036683Z 2025-12-04T11:29:55.6037304Z FINISHED PRINTING LOG FILE of 
inductor/test_compile_subprocess 3/3 (test/test-reports/inductor.test_compile_subprocess_3.3_92ce494afd455b37_.log) 2025-12-04T11:29:55.6037339Z 2025-12-04T11:29:55.6037743Z Finished inductor/test_compile_subprocess 3/3 ... [2025-12-04 11:29:55.314123][8223.697013523], took 9.14min 2025-12-04T11:29:55.6038616Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-84a2c5e5cdda7bdd.xml 2025-12-04T11:29:55.6039526Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-97e49e1b6070e822.xml 2025-12-04T11:29:55.6040451Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-aaac502093c587a7.xml 2025-12-04T11:29:55.6041322Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-decce829c4432557.xml 2025-12-04T11:29:55.6042203Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-491de48d6c983340.xml 2025-12-04T11:29:55.6043072Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-35b1cdd46f4129e6.xml 2025-12-04T11:29:55.8960730Z Uploading logs for 57119749248 to S3 2025-12-04T11:29:56.0366644Z Uploading artifacts took 0.45 seconds 2025-12-04T11:29:56.0367082Z inductor/test_compile_subprocess 3/3 failed! 2025-12-04T11:29:56.0371230Z Running inductor/test_flex_decoding 1/1 ... 
[2025-12-04 11:29:56.036923][8224.419818157] 2025-12-04T11:29:56.0371819Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:29:56.0376473Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_flex_decoding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:29:56.037405] 2025-12-04T11:30:01.3805262Z 2025-12-04T11:30:01.3806325Z inductor/test_flex_decoding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_flex_decoding_1.1_a47e1c88f2ff3c9a_.log 2025-12-04T11:30:01.3807310Z Running 0 items in this shard: 2025-12-04T11:30:01.3807529Z 2025-12-04T11:30:01.3807941Z Finished inductor/test_flex_decoding 1/1 ... [2025-12-04 11:30:01.380338][8229.763233694], took 0.09min 2025-12-04T11:30:01.3899925Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_flex_decoding/inductor.test_flex_decoding-4523fe803428b665.xml 2025-12-04T11:30:01.4186275Z Running inductor/test_deterministic 5/8 ... [2025-12-04 11:30:01.418353][8229.801249028] 2025-12-04T11:30:01.4186883Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:30:01.4190077Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_deterministic.py', '--shard-id=5', '--num-shards=8', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:30:01.418771] 2025-12-04T11:42:56.1686884Z 2025-12-04T11:42:56.1688140Z PRINTING LOG FILE of inductor/test_deterministic 5/8 (test/test-reports/inductor.test_deterministic_5.8_04041ff7a6ce6208_.log) 2025-12-04T11:42:56.1691756Z W1204 11:30:10.639000 105004 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.1693266Z Test results will be stored in test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccc55353a2e77d8f.xml 2025-12-04T11:42:56.1694168Z ============================= test session starts ============================== 2025-12-04T11:42:56.1695016Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:42:56.1695628Z cachedir: .pytest_cache 2025-12-04T11:42:56.1696535Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:42:56.1697350Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:42:56.1697716Z configfile: pytest.ini 2025-12-04T11:42:56.1698469Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:42:56.1699264Z collecting ... 
collected 32 items 2025-12-04T11:42:56.1699770Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T11:42:56.1702507Z Running 3 items in this shard: test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_inference_precision_amp, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_inference_precision_float16 2025-12-04T11:42:56.1705082Z 2025-12-04T11:42:56.1705953Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 ('RERUN', {'yellow': True}) [70.9025s] [ 33%] 2025-12-04T11:42:56.1707848Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 ('RERUN', {'yellow': True}) [46.2098s] [ 33%] 2025-12-04T11:42:56.1709639Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 FAILED [46.7201s] [ 33%] 2025-12-04T11:42:56.1710546Z 2025-12-04T11:42:56.1710708Z ==================================== RERUNS ==================================== 2025-12-04T11:42:56.1711514Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _ 2025-12-04T11:42:56.1712310Z Traceback (most recent call last): 2025-12-04T11:42:56.1713036Z File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism 2025-12-04T11:42:56.1713760Z self.assertTrue( 2025-12-04T11:42:56.1714272Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:42:56.1714874Z raise 
self.failureException(msg) 2025-12-04T11:42:56.1715451Z AssertionError: False is not true : stdout: cuda eval DistillGPT2 2025-12-04T11:42:56.1716165Z TorchDynamo optimized model failed to run because of following error 2025-12-04T11:42:56.1716953Z fail_to_run 2025-12-04T11:42:56.1717201Z , stderr: 2025-12-04T11:42:56.1717818Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.1719029Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.1719747Z 2025-12-04T11:42:56.1719867Z loading model: 0it [00:03, ?it/s] 2025-12-04T11:42:56.1720574Z W1204 11:31:16.465000 105261 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.1721789Z W1204 11:31:19.116000 105261 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmp1ko3ckfr/tmplc0o_fjw 2025-12-04T11:42:56.1722608Z ERROR:common: 2025-12-04T11:42:56.1722892Z Traceback (most recent call last): 2025-12-04T11:42:56.1723528Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy 2025-12-04T11:42:56.1724183Z new_result = self.run_n_iterations( 2025-12-04T11:42:56.1724879Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations 2025-12-04T11:42:56.1725589Z model_iter_fn(mod, inputs, collect_outputs=False) 2025-12-04T11:42:56.1726403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:42:56.1727285Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:42:56.1728185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T11:42:56.1729034Z raise 
InductorError(e, currentframe()).with_traceback( 2025-12-04T11:42:56.1729897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T11:42:56.1730692Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:42:56.1731510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:42:56.1732514Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:42:56.1733495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:42:56.1734291Z _check_triton_bf16_support(graph) 2025-12-04T11:42:56.1735094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:42:56.1735892Z warn_and_skip(node.get_device()) 2025-12-04T11:42:56.1736722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:42:56.1737496Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:42:56.1738021Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T11:42:56.1738404Z 2025-12-04T11:42:56.1739117Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:42:56.1739979Z 2025-12-04T11:42:56.1739983Z 2025-12-04T11:42:56.1739988Z 2025-12-04T11:42:56.1740213Z To execute this test, run the following from the base repo dir: 2025-12-04T11:42:56.1741461Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 2025-12-04T11:42:56.1742480Z 2025-12-04T11:42:56.1742765Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:42:56.1743415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.1745003Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpgw838ry8/saved.pkl 2025-12-04T11:42:56.1747601Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpgw838ry8/saved.pkl 2025-12-04T11:42:56.1749497Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _ 2025-12-04T11:42:56.1750293Z Traceback (most recent call last): 2025-12-04T11:42:56.1751074Z File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism 2025-12-04T11:42:56.1751806Z self.assertTrue( 2025-12-04T11:42:56.1752306Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:42:56.1752901Z raise self.failureException(msg) 2025-12-04T11:42:56.1753459Z AssertionError: False is not true : stdout: cuda eval DistillGPT2 2025-12-04T11:42:56.1754231Z TorchDynamo optimized model failed 
to run because of following error 2025-12-04T11:42:56.1754730Z fail_to_run 2025-12-04T11:42:56.1754958Z , stderr: 2025-12-04T11:42:56.1755621Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.1756827Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.1757530Z 2025-12-04T11:42:56.1757666Z loading model: 0it [00:03, ?it/s] 2025-12-04T11:42:56.1758362Z W1204 11:32:02.694000 105463 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.1759586Z W1204 11:32:05.334000 105463 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmppgs9zs5u/tmp5vkay3a5 2025-12-04T11:42:56.1760402Z ERROR:common: 2025-12-04T11:42:56.1760688Z Traceback (most recent call last): 2025-12-04T11:42:56.1761309Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy 2025-12-04T11:42:56.1761982Z new_result = self.run_n_iterations( 2025-12-04T11:42:56.1762639Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations 2025-12-04T11:42:56.1763339Z model_iter_fn(mod, inputs, collect_outputs=False) 2025-12-04T11:42:56.1764139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:42:56.1765021Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:42:56.1765927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T11:42:56.1766766Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:42:56.1767606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in 
_compile_fx_inner 2025-12-04T11:42:56.1768406Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:42:56.1769228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:42:56.1770211Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:42:56.1771673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:42:56.1772479Z _check_triton_bf16_support(graph) 2025-12-04T11:42:56.1773287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:42:56.1774092Z warn_and_skip(node.get_device()) 2025-12-04T11:42:56.1774832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:42:56.1775610Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:42:56.1776126Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T11:42:56.1776608Z 2025-12-04T11:42:56.1777324Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:42:56.1778182Z 2025-12-04T11:42:56.1778187Z 2025-12-04T11:42:56.1778193Z 2025-12-04T11:42:56.1778410Z To execute this test, run the following from the base repo dir: 2025-12-04T11:42:56.1779773Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 2025-12-04T11:42:56.1780804Z 2025-12-04T11:42:56.1781090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:42:56.1781711Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.1783438Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpgw838ry8/saved.pkl 2025-12-04T11:42:56.1786034Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpgw838ry8/saved.pkl 2025-12-04T11:42:56.1787644Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.1789286Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpzbb58pzn/saved.pkl 2025-12-04T11:42:56.1791870Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpzbb58pzn/saved.pkl 2025-12-04T11:42:56.1793412Z =================================== 
FAILURES =================================== 2025-12-04T11:42:56.1794236Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _ 2025-12-04T11:42:56.1795029Z Traceback (most recent call last): 2025-12-04T11:42:56.1795754Z File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism 2025-12-04T11:42:56.1796498Z self.assertTrue( 2025-12-04T11:42:56.1797006Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:42:56.1797589Z raise self.failureException(msg) 2025-12-04T11:42:56.1798167Z AssertionError: False is not true : stdout: cuda eval DistillGPT2 2025-12-04T11:42:56.1798958Z TorchDynamo optimized model failed to run because of following error 2025-12-04T11:42:56.1799475Z fail_to_run 2025-12-04T11:42:56.1799706Z , stderr: 2025-12-04T11:42:56.1800337Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.1801538Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 
2025-12-04T11:42:56.1802250Z 2025-12-04T11:42:56.1802381Z loading model: 0it [00:03, ?it/s] 2025-12-04T11:42:56.1803080Z W1204 11:32:49.445000 105661 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.1804267Z W1204 11:32:52.081000 105661 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmp3t6l4de6/tmp4psly7t8 2025-12-04T11:42:56.1805079Z ERROR:common: 2025-12-04T11:42:56.1805362Z Traceback (most recent call last): 2025-12-04T11:42:56.1805982Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy 2025-12-04T11:42:56.1806654Z new_result = self.run_n_iterations( 2025-12-04T11:42:56.1807312Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations 2025-12-04T11:42:56.1808015Z model_iter_fn(mod, inputs, collect_outputs=False) 2025-12-04T11:42:56.1808813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:42:56.1809741Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:42:56.1810651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T11:42:56.1811485Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:42:56.1812331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T11:42:56.1813173Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:42:56.1814014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:42:56.1815032Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:42:56.1816022Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:42:56.1816917Z _check_triton_bf16_support(graph) 2025-12-04T11:42:56.1817761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:42:56.1818583Z warn_and_skip(node.get_device()) 2025-12-04T11:42:56.1819319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:42:56.1820095Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:42:56.1820613Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T11:42:56.1821012Z 2025-12-04T11:42:56.1821738Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:42:56.1822600Z 2025-12-04T11:42:56.1822605Z 2025-12-04T11:42:56.1822610Z 2025-12-04T11:42:56.1822829Z To execute this test, run the following from the base repo dir: 2025-12-04T11:42:56.1824082Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 2025-12-04T11:42:56.1825116Z 2025-12-04T11:42:56.1825397Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:42:56.1826022Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.1827625Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpgw838ry8/saved.pkl 2025-12-04T11:42:56.1830247Z Command /opt/conda/envs/py_3.10/bin/python 
/var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpgw838ry8/saved.pkl 2025-12-04T11:42:56.1831848Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.1833425Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpzbb58pzn/saved.pkl 2025-12-04T11:42:56.1836030Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpzbb58pzn/saved.pkl 2025-12-04T11:42:56.1837649Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.1839276Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpvb4nn8d7/saved.pkl 2025-12-04T11:42:56.1841894Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpvb4nn8d7/saved.pkl 2025-12-04T11:42:56.1844040Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccc55353a2e77d8f.xml - 2025-12-04T11:42:56.1845159Z =========================== short test summary info ============================ 2025-12-04T11:42:56.1846703Z FAILED [46.7201s] 
inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 - AssertionError: False is not true : stdout: cuda eval DistillGPT2 2025-12-04T11:42:56.1848284Z TorchDynamo optimized model failed to run because of following error 2025-12-04T11:42:56.1848774Z fail_to_run 2025-12-04T11:42:56.1849021Z , stderr: 2025-12-04T11:42:56.1849692Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.1850894Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.1851604Z 2025-12-04T11:42:56.1851725Z loading model: 0it [00:03, ?it/s] 2025-12-04T11:42:56.1852436Z W1204 11:32:49.445000 105661 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.1853629Z W1204 11:32:52.081000 105661 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmp3t6l4de6/tmp4psly7t8 2025-12-04T11:42:56.1854446Z ERROR:common: 2025-12-04T11:42:56.1854716Z Traceback (most recent call last): 2025-12-04T11:42:56.1855362Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy 2025-12-04T11:42:56.1856030Z new_result = self.run_n_iterations( 2025-12-04T11:42:56.1856763Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations 2025-12-04T11:42:56.1857485Z model_iter_fn(mod, inputs, collect_outputs=False) 2025-12-04T11:42:56.1858286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:42:56.1859173Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:42:56.1860067Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T11:42:56.1860921Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:42:56.1861761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T11:42:56.1862545Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:42:56.1863373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:42:56.1864386Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:42:56.1865381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:42:56.1866171Z _check_triton_bf16_support(graph) 2025-12-04T11:42:56.1866981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:42:56.1867802Z warn_and_skip(node.get_device()) 2025-12-04T11:42:56.1868537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:42:56.1869296Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:42:56.1869879Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T11:42:56.1870270Z 2025-12-04T11:42:56.1871178Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:42:56.1872037Z 2025-12-04T11:42:56.1872042Z 2025-12-04T11:42:56.1872046Z 2025-12-04T11:42:56.1872282Z To execute this test, run the following from the base repo dir: 2025-12-04T11:42:56.1873666Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 2025-12-04T11:42:56.1874709Z 2025-12-04T11:42:56.1874980Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:42:56.1875593Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:42:56.1876110Z ==================== 1 failed, 2 rerun in 163.86s (0:02:43) ==================== 2025-12-04T11:42:56.1876529Z Got exit code 1 2025-12-04T11:42:56.1876859Z Retrying single test... 2025-12-04T11:42:56.1877508Z W1204 11:33:05.027000 105760 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.1878688Z Test results will be stored in test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-cbc1aeff512c7b0d.xml 2025-12-04T11:42:56.1879604Z ============================= test session starts ============================== 2025-12-04T11:42:56.1880276Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:42:56.1880884Z cachedir: .pytest_cache 2025-12-04T11:42:56.1881588Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:42:56.1882381Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:42:56.1882746Z configfile: pytest.ini 2025-12-04T11:42:56.1883529Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 
2025-12-04T11:42:56.1884471Z collecting ... collected 32 items / 2 deselected / 30 selected 2025-12-04T11:42:56.1885813Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 2025-12-04T11:42:56.1887034Z Running 1 items in this shard 2025-12-04T11:42:56.1887247Z 2025-12-04T11:42:56.1888140Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 ('RERUN', {'yellow': True}) [77.2762s] [100%] 2025-12-04T11:42:56.1890015Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 ('RERUN', {'yellow': True}) [77.9137s] [100%] 2025-12-04T11:42:56.1891811Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 FAILED [77.9025s] [100%] 2025-12-04T11:42:56.1892734Z 2025-12-04T11:42:56.1892881Z ==================================== RERUNS ==================================== 2025-12-04T11:42:56.1893686Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _ 2025-12-04T11:42:56.1894471Z Traceback (most recent call last): 2025-12-04T11:42:56.1895205Z File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism 2025-12-04T11:42:56.1895941Z self.assertTrue( 2025-12-04T11:42:56.1896520Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:42:56.1897115Z raise self.failureException(msg) 2025-12-04T11:42:56.1897780Z AssertionError: False is not true : stdout: cuda eval DistillGPT2 2025-12-04T11:42:56.1898511Z TorchDynamo optimized model failed to run because of following error 
2025-12-04T11:42:56.1899000Z fail_to_run 2025-12-04T11:42:56.1899248Z , stderr: 2025-12-04T11:42:56.1899891Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.1901097Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.1901880Z 2025-12-04T11:42:56.1901998Z loading model: 0it [00:03, ?it/s] 2025-12-04T11:42:56.1902792Z [W1204 11:34:00.134044832 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1903446Z 2025-12-04T11:42:56.1903977Z [W1204 11:34:16.418483514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1904632Z 2025-12-04T11:42:56.1905156Z [W1204 11:34:16.422208999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1905855Z 2025-12-04T11:42:56.1906368Z [W1204 11:34:16.422477222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1907030Z 2025-12-04T11:42:56.1907539Z [W1204 11:34:16.423316135 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1908195Z 2025-12-04T11:42:56.1908705Z [W1204 11:34:16.423543629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1909359Z 2025-12-04T11:42:56.1909886Z [W1204 11:34:16.424286866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.1910530Z 2025-12-04T11:42:56.1911060Z [W1204 11:34:16.424487043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1911713Z 2025-12-04T11:42:56.1912224Z [W1204 11:34:16.425526362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1912884Z 2025-12-04T11:42:56.1913401Z [W1204 11:34:16.425722561 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1914061Z 2025-12-04T11:42:56.1914573Z [W1204 11:34:16.426208761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1915231Z 2025-12-04T11:42:56.1915748Z [W1204 11:34:16.426404658 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1916394Z 2025-12-04T11:42:56.1916920Z [W1204 11:34:16.427075644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1917570Z 2025-12-04T11:42:56.1918098Z [W1204 11:34:16.427260432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1918744Z 2025-12-04T11:42:56.1919259Z [W1204 11:34:16.428194822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1919923Z 2025-12-04T11:42:56.1920442Z [W1204 11:34:16.428378324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1921106Z 2025-12-04T11:42:56.1921618Z [W1204 11:34:16.428854921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.1922266Z 2025-12-04T11:42:56.1922848Z [W1204 11:34:16.429038075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1923506Z 2025-12-04T11:42:56.1924033Z [W1204 11:34:16.429660189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1924686Z 2025-12-04T11:42:56.1925200Z [W1204 11:34:16.429841597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1925904Z 2025-12-04T11:42:56.1926449Z [W1204 11:34:16.430767030 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1927110Z 2025-12-04T11:42:56.1927622Z [W1204 11:34:16.430951198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1928269Z 2025-12-04T11:42:56.1928798Z [W1204 11:34:16.431392291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1929477Z 2025-12-04T11:42:56.1930001Z [W1204 11:34:16.431576952 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1930689Z 2025-12-04T11:42:56.1931198Z [W1204 11:34:16.432198753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1931863Z 2025-12-04T11:42:56.1932377Z [W1204 11:34:16.432379672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1933037Z 2025-12-04T11:42:56.1933555Z [W1204 11:34:16.433283657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.1934201Z 2025-12-04T11:42:56.1934731Z [W1204 11:34:16.433466400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1935378Z 2025-12-04T11:42:56.1935909Z [W1204 11:34:16.433906459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1936659Z 2025-12-04T11:42:56.1937173Z [W1204 11:34:16.434093470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1937842Z 2025-12-04T11:42:56.1938356Z [W1204 11:34:16.434701643 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1939021Z 2025-12-04T11:42:56.1939536Z [W1204 11:34:16.434883296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1940188Z 2025-12-04T11:42:56.1940714Z [W1204 11:34:16.435766411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1941364Z 2025-12-04T11:42:56.1941892Z [W1204 11:34:16.435950648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1942544Z 2025-12-04T11:42:56.1943055Z [W1204 11:34:16.436394102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1943726Z 2025-12-04T11:42:56.1944239Z [W1204 11:34:16.436584023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1944904Z 2025-12-04T11:42:56.1945417Z [W1204 11:34:16.437205111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.1946078Z 2025-12-04T11:42:56.1946589Z [W1204 11:34:16.437386770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1947241Z 2025-12-04T11:42:56.1947780Z W1204 11:34:17.235000 105974 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.1948961Z W1204 11:34:19.893000 105974 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmpvdzzse9j/tmp2ba4p3re 2025-12-04T11:42:56.1949777Z ERROR:common: 2025-12-04T11:42:56.1950093Z Traceback (most recent call last): 2025-12-04T11:42:56.1950733Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy 2025-12-04T11:42:56.1951382Z new_result = self.run_n_iterations( 2025-12-04T11:42:56.1952071Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations 2025-12-04T11:42:56.1952783Z model_iter_fn(mod, inputs, collect_outputs=False) 2025-12-04T11:42:56.1953562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:42:56.1954442Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:42:56.1955379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T11:42:56.1956222Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:42:56.1957046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T11:42:56.1957847Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:42:56.1958667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:42:56.1959662Z return scheme.codegen_and_compile(gm, example_inputs, 
inputs_to_check, graph_kwargs) 2025-12-04T11:42:56.1960634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:42:56.1961425Z _check_triton_bf16_support(graph) 2025-12-04T11:42:56.1962225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:42:56.1963035Z warn_and_skip(node.get_device()) 2025-12-04T11:42:56.1963763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:42:56.1964532Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:42:56.1965058Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T11:42:56.1965443Z 2025-12-04T11:42:56.1966157Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:42:56.1967015Z 2025-12-04T11:42:56.1967020Z 2025-12-04T11:42:56.1967024Z 2025-12-04T11:42:56.1967242Z To execute this test, run the following from the base repo dir: 2025-12-04T11:42:56.1968486Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 2025-12-04T11:42:56.1969508Z 2025-12-04T11:42:56.1969791Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:42:56.1970431Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.1972204Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmp2kpkd29m/saved.pkl 2025-12-04T11:42:56.1974809Z Command 
/opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmp2kpkd29m/saved.pkl 2025-12-04T11:42:56.1976872Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _ 2025-12-04T11:42:56.1977669Z Traceback (most recent call last): 2025-12-04T11:42:56.1978386Z File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism 2025-12-04T11:42:56.1979123Z self.assertTrue( 2025-12-04T11:42:56.1979678Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:42:56.1980275Z raise self.failureException(msg) 2025-12-04T11:42:56.1980887Z AssertionError: False is not true : stdout: cuda eval DistillGPT2 2025-12-04T11:42:56.1981619Z TorchDynamo optimized model failed to run because of following error 2025-12-04T11:42:56.1982120Z fail_to_run 2025-12-04T11:42:56.1982351Z , stderr: 2025-12-04T11:42:56.1982986Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.1984185Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.1984938Z 2025-12-04T11:42:56.1985068Z loading model: 0it [00:03, ?it/s] 2025-12-04T11:42:56.1985804Z [W1204 11:35:18.805346653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1986476Z 2025-12-04T11:42:56.1986991Z [W1204 11:35:33.242724749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.1987653Z 2025-12-04T11:42:56.1988167Z [W1204 11:35:33.246424966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1988813Z 2025-12-04T11:42:56.1989343Z [W1204 11:35:33.246679430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1989997Z 2025-12-04T11:42:56.1990528Z [W1204 11:35:33.247503571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1991176Z 2025-12-04T11:42:56.1991689Z [W1204 11:35:33.247721804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1992350Z 2025-12-04T11:42:56.1992860Z [W1204 11:35:33.248461285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1993521Z 2025-12-04T11:42:56.1994034Z [W1204 11:35:33.248674728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1994678Z 2025-12-04T11:42:56.1995207Z [W1204 11:35:33.249712715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1995860Z 2025-12-04T11:42:56.1996384Z [W1204 11:35:33.249910494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1997038Z 2025-12-04T11:42:56.1997548Z [W1204 11:35:33.250409881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.1998210Z 2025-12-04T11:42:56.1998722Z [W1204 11:35:33.250610267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.1999384Z 2025-12-04T11:42:56.1999894Z [W1204 11:35:33.251295545 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2000540Z 2025-12-04T11:42:56.2001062Z [W1204 11:35:33.251487298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2001708Z 2025-12-04T11:42:56.2002305Z [W1204 11:35:33.252407313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2002962Z 2025-12-04T11:42:56.2003468Z [W1204 11:35:33.252599799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2004130Z 2025-12-04T11:42:56.2004673Z [W1204 11:35:33.253053871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2005335Z 2025-12-04T11:42:56.2005885Z [W1204 11:35:33.253233353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2006533Z 2025-12-04T11:42:56.2007058Z [W1204 11:35:33.253857047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2007703Z 2025-12-04T11:42:56.2008228Z [W1204 11:35:33.254041636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2008909Z 2025-12-04T11:42:56.2009422Z [W1204 11:35:33.254956806 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2010083Z 2025-12-04T11:42:56.2010592Z [W1204 11:35:33.255141144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2011257Z 2025-12-04T11:42:56.2011771Z [W1204 11:35:33.255593279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2012432Z 2025-12-04T11:42:56.2012949Z [W1204 11:35:33.255774354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2013595Z 2025-12-04T11:42:56.2014121Z [W1204 11:35:33.256390753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2014771Z 2025-12-04T11:42:56.2015284Z [W1204 11:35:33.256582111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2015944Z 2025-12-04T11:42:56.2016523Z [W1204 11:35:33.257483215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2017191Z 2025-12-04T11:42:56.2017704Z [W1204 11:35:33.257666071 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2018367Z 2025-12-04T11:42:56.2018873Z [W1204 11:35:33.258101242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2019523Z 2025-12-04T11:42:56.2020049Z [W1204 11:35:33.258282531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2020696Z 2025-12-04T11:42:56.2021221Z [W1204 11:35:33.258894598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2021870Z 2025-12-04T11:42:56.2022383Z [W1204 11:35:33.259074825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2023045Z 2025-12-04T11:42:56.2023561Z [W1204 11:35:33.259971216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2024218Z 2025-12-04T11:42:56.2024730Z [W1204 11:35:33.260174176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2025375Z 2025-12-04T11:42:56.2025946Z [W1204 11:35:33.260644428 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2026604Z 2025-12-04T11:42:56.2027129Z [W1204 11:35:33.260823852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2027774Z 2025-12-04T11:42:56.2028284Z [W1204 11:35:33.261435926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2028980Z 2025-12-04T11:42:56.2029492Z [W1204 11:35:33.261616600 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2030198Z 2025-12-04T11:42:56.2030676Z W1204 11:35:35.067000 106186 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.2031861Z W1204 11:35:37.713000 106186 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmpfvtsyn16/tmpc1et893t 2025-12-04T11:42:56.2032663Z ERROR:common: 2025-12-04T11:42:56.2032951Z Traceback (most recent call last): 2025-12-04T11:42:56.2033624Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy 2025-12-04T11:42:56.2034291Z new_result = self.run_n_iterations( 2025-12-04T11:42:56.2034934Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations 2025-12-04T11:42:56.2035657Z model_iter_fn(mod, inputs, collect_outputs=False) 2025-12-04T11:42:56.2036459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:42:56.2037329Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:42:56.2038236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T11:42:56.2039091Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:42:56.2039933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T11:42:56.2040722Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:42:56.2041547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:42:56.2042547Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:42:56.2043543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 
2025-12-04T11:42:56.2044325Z _check_triton_bf16_support(graph) 2025-12-04T11:42:56.2045129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:42:56.2045944Z warn_and_skip(node.get_device()) 2025-12-04T11:42:56.2046669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:42:56.2047440Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:42:56.2047965Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T11:42:56.2048352Z 2025-12-04T11:42:56.2049074Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:42:56.2049916Z 2025-12-04T11:42:56.2049921Z 2025-12-04T11:42:56.2049926Z 2025-12-04T11:42:56.2050153Z To execute this test, run the following from the base repo dir: 2025-12-04T11:42:56.2051392Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 2025-12-04T11:42:56.2052421Z 2025-12-04T11:42:56.2052691Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:42:56.2053370Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.2054966Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmp2kpkd29m/saved.pkl 2025-12-04T11:42:56.2057650Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs 
--compare-model-outputs-with=/tmp/tmp2kpkd29m/saved.pkl 2025-12-04T11:42:56.2059325Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.2060908Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpaak6li6p/saved.pkl 2025-12-04T11:42:56.2063494Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpaak6li6p/saved.pkl 2025-12-04T11:42:56.2065060Z =================================== FAILURES =================================== 2025-12-04T11:42:56.2065867Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _ 2025-12-04T11:42:56.2066654Z Traceback (most recent call last): 2025-12-04T11:42:56.2067385Z File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism 2025-12-04T11:42:56.2068128Z self.assertTrue( 2025-12-04T11:42:56.2068618Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:42:56.2069212Z raise self.failureException(msg) 2025-12-04T11:42:56.2069788Z AssertionError: False is not true : stdout: cuda eval DistillGPT2 2025-12-04T11:42:56.2070503Z TorchDynamo optimized model failed to run because of following error 2025-12-04T11:42:56.2071192Z fail_to_run 2025-12-04T11:42:56.2071496Z , stderr: 2025-12-04T11:42:56.2072114Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.2073324Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. 
Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.2074048Z 2025-12-04T11:42:56.2074168Z loading model: 0it [00:03, ?it/s] 2025-12-04T11:42:56.2074933Z [W1204 11:36:36.734921876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2075588Z 2025-12-04T11:42:56.2076121Z [W1204 11:36:51.183827512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2076768Z 2025-12-04T11:42:56.2077281Z [W1204 11:36:51.187535330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2077942Z 2025-12-04T11:42:56.2078449Z [W1204 11:36:51.187801493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2079114Z 2025-12-04T11:42:56.2079627Z [W1204 11:36:51.188644431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2080278Z 2025-12-04T11:42:56.2080804Z [W1204 11:36:51.188885600 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2081453Z 2025-12-04T11:42:56.2081977Z [W1204 11:36:51.189641292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2082712Z 2025-12-04T11:42:56.2083229Z [W1204 11:36:51.189852199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2083895Z 2025-12-04T11:42:56.2084409Z [W1204 11:36:51.190907768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2085125Z 2025-12-04T11:42:56.2085637Z [W1204 11:36:51.191099871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2086284Z 2025-12-04T11:42:56.2086851Z [W1204 11:36:51.191568086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2087501Z 2025-12-04T11:42:56.2088025Z [W1204 11:36:51.191751770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2088672Z 2025-12-04T11:42:56.2089186Z [W1204 11:36:51.192419226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2089912Z 2025-12-04T11:42:56.2090422Z [W1204 11:36:51.192612590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2091085Z 2025-12-04T11:42:56.2091598Z [W1204 11:36:51.193563867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2092244Z 2025-12-04T11:42:56.2092774Z [W1204 11:36:51.193748141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2093422Z 2025-12-04T11:42:56.2093948Z [W1204 11:36:51.194196765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2094596Z 2025-12-04T11:42:56.2095111Z [W1204 11:36:51.194375316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2095776Z 2025-12-04T11:42:56.2096358Z [W1204 11:36:51.195003082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2097022Z 2025-12-04T11:42:56.2097533Z [W1204 11:36:51.195183188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2098197Z 2025-12-04T11:42:56.2098717Z [W1204 11:36:51.196083443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2099365Z 2025-12-04T11:42:56.2099887Z [W1204 11:36:51.196265674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2100533Z 2025-12-04T11:42:56.2101061Z [W1204 11:36:51.196718880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2101708Z 2025-12-04T11:42:56.2102220Z [W1204 11:36:51.196896697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2102877Z 2025-12-04T11:42:56.2103387Z [W1204 11:36:51.197511368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2104046Z 2025-12-04T11:42:56.2104556Z [W1204 11:36:51.197694399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2105200Z 2025-12-04T11:42:56.2105723Z [W1204 11:36:51.198589728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2106368Z 2025-12-04T11:42:56.2106934Z [W1204 11:36:51.198769355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2107590Z 2025-12-04T11:42:56.2108101Z [W1204 11:36:51.199209908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2108763Z 2025-12-04T11:42:56.2109275Z [W1204 11:36:51.199391433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2109972Z 2025-12-04T11:42:56.2110519Z [W1204 11:36:51.200031627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2111171Z 2025-12-04T11:42:56.2111702Z [W1204 11:36:51.200212638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2112350Z 2025-12-04T11:42:56.2112878Z [W1204 11:36:51.201109535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2113635Z 2025-12-04T11:42:56.2114146Z [W1204 11:36:51.201288822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2114809Z 2025-12-04T11:42:56.2115318Z [W1204 11:36:51.201715185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2115977Z 2025-12-04T11:42:56.2116488Z [W1204 11:36:51.201891767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2117136Z 2025-12-04T11:42:56.2117662Z [W1204 11:36:51.202485114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2118310Z 2025-12-04T11:42:56.2118835Z [W1204 11:36:51.202663644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2119483Z 2025-12-04T11:42:56.2119957Z W1204 11:36:52.996000 106394 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.2121141Z W1204 11:36:55.640000 106394 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmpewfowa37/tmpai252t84 2025-12-04T11:42:56.2121954Z ERROR:common: 2025-12-04T11:42:56.2122235Z Traceback (most recent call last): 2025-12-04T11:42:56.2122860Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy 2025-12-04T11:42:56.2123529Z new_result = self.run_n_iterations( 2025-12-04T11:42:56.2124181Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations 2025-12-04T11:42:56.2124879Z model_iter_fn(mod, inputs, collect_outputs=False) 2025-12-04T11:42:56.2125674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:42:56.2126555Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:42:56.2127459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T11:42:56.2128294Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:42:56.2129134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T11:42:56.2129935Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:42:56.2130761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:42:56.2131751Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:42:56.2132744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 
2025-12-04T11:42:56.2133591Z _check_triton_bf16_support(graph) 2025-12-04T11:42:56.2134388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:42:56.2135208Z warn_and_skip(node.get_device()) 2025-12-04T11:42:56.2135939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:42:56.2136823Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:42:56.2137337Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T11:42:56.2137741Z 2025-12-04T11:42:56.2138502Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:42:56.2139362Z 2025-12-04T11:42:56.2139367Z 2025-12-04T11:42:56.2139371Z 2025-12-04T11:42:56.2139590Z To execute this test, run the following from the base repo dir: 2025-12-04T11:42:56.2140839Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 2025-12-04T11:42:56.2141890Z 2025-12-04T11:42:56.2142172Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:42:56.2142796Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.2144380Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmp2kpkd29m/saved.pkl 2025-12-04T11:42:56.2146978Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs 
--compare-model-outputs-with=/tmp/tmp2kpkd29m/saved.pkl 2025-12-04T11:42:56.2148594Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.2150156Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpaak6li6p/saved.pkl 2025-12-04T11:42:56.2152750Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpaak6li6p/saved.pkl 2025-12-04T11:42:56.2154355Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.2155931Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpi6g9v71o/saved.pkl 2025-12-04T11:42:56.2158519Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpi6g9v71o/saved.pkl 2025-12-04T11:42:56.2160670Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-cbc1aeff512c7b0d.xml - 2025-12-04T11:42:56.2161747Z =========================== short test summary info ============================ 2025-12-04T11:42:56.2163248Z FAILED [77.9025s] inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 - AssertionError: False is not true : stdout: cuda eval DistillGPT2 2025-12-04T11:42:56.2164804Z TorchDynamo 
optimized model failed to run because of following error 2025-12-04T11:42:56.2165370Z fail_to_run 2025-12-04T11:42:56.2165604Z , stderr: 2025-12-04T11:42:56.2166241Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.2167442Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.2168176Z 2025-12-04T11:42:56.2168304Z loading model: 0it [00:03, ?it/s] 2025-12-04T11:42:56.2169082Z [W1204 11:36:36.734921876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2169746Z 2025-12-04T11:42:56.2170263Z [W1204 11:36:51.183827512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2170909Z 2025-12-04T11:42:56.2171617Z [W1204 11:36:51.187535330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2172345Z 2025-12-04T11:42:56.2172866Z [W1204 11:36:51.187801493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2173510Z 2025-12-04T11:42:56.2174017Z [W1204 11:36:51.188644431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2174677Z 2025-12-04T11:42:56.2175192Z [W1204 11:36:51.188885600 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2175857Z 2025-12-04T11:42:56.2176440Z [W1204 11:36:51.189641292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2177106Z 2025-12-04T11:42:56.2177619Z [W1204 11:36:51.189852199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2178268Z 2025-12-04T11:42:56.2178799Z [W1204 11:36:51.190907768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2179448Z 2025-12-04T11:42:56.2179961Z [W1204 11:36:51.191099871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2180628Z 2025-12-04T11:42:56.2181143Z [W1204 11:36:51.191568086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2181807Z 2025-12-04T11:42:56.2182321Z [W1204 11:36:51.191751770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2182989Z 2025-12-04T11:42:56.2183502Z [W1204 11:36:51.192419226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2184151Z 2025-12-04T11:42:56.2184675Z [W1204 11:36:51.192612590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2185321Z 2025-12-04T11:42:56.2185846Z [W1204 11:36:51.193563867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2186495Z 2025-12-04T11:42:56.2187003Z [W1204 11:36:51.193748141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2187661Z 2025-12-04T11:42:56.2188173Z [W1204 11:36:51.194196765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2188831Z 2025-12-04T11:42:56.2189346Z [W1204 11:36:51.194375316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2189992Z 2025-12-04T11:42:56.2190579Z [W1204 11:36:51.195003082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2191226Z 2025-12-04T11:42:56.2191747Z [W1204 11:36:51.195183188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2192396Z 2025-12-04T11:42:56.2192953Z [W1204 11:36:51.196083443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2193616Z 2025-12-04T11:42:56.2194194Z [W1204 11:36:51.196265674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2194857Z 2025-12-04T11:42:56.2195367Z [W1204 11:36:51.196718880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2196016Z 2025-12-04T11:42:56.2196544Z [W1204 11:36:51.196896697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2197222Z 2025-12-04T11:42:56.2197746Z [W1204 11:36:51.197511368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2198395Z 2025-12-04T11:42:56.2198909Z [W1204 11:36:51.197694399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2199573Z 2025-12-04T11:42:56.2200088Z [W1204 11:36:51.198589728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2200745Z 2025-12-04T11:42:56.2201259Z [W1204 11:36:51.198769355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2201903Z 2025-12-04T11:42:56.2202431Z [W1204 11:36:51.199209908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2203081Z 2025-12-04T11:42:56.2203604Z [W1204 11:36:51.199391433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2204252Z 2025-12-04T11:42:56.2204760Z [W1204 11:36:51.200031627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2205423Z 2025-12-04T11:42:56.2205932Z [W1204 11:36:51.200212638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2206595Z 2025-12-04T11:42:56.2207105Z [W1204 11:36:51.201109535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2207748Z 2025-12-04T11:42:56.2208271Z [W1204 11:36:51.201288822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2208920Z 2025-12-04T11:42:56.2209444Z [W1204 11:36:51.201715185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2210093Z 2025-12-04T11:42:56.2210608Z [W1204 11:36:51.201891767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2211271Z 2025-12-04T11:42:56.2211785Z [W1204 11:36:51.202485114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2212449Z 2025-12-04T11:42:56.2212959Z [W1204 11:36:51.202663644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2213618Z 2025-12-04T11:42:56.2214135Z W1204 11:36:52.996000 106394 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.2215325Z W1204 11:36:55.640000 106394 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmpewfowa37/tmpai252t84 2025-12-04T11:42:56.2216124Z ERROR:common: 2025-12-04T11:42:56.2216475Z Traceback (most recent call last): 2025-12-04T11:42:56.2217114Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy 2025-12-04T11:42:56.2217805Z new_result = self.run_n_iterations( 2025-12-04T11:42:56.2218466Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations 2025-12-04T11:42:56.2219205Z model_iter_fn(mod, inputs, collect_outputs=False) 2025-12-04T11:42:56.2220000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:42:56.2220863Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:42:56.2221762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T11:42:56.2222637Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:42:56.2223472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T11:42:56.2224256Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:42:56.2225078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:42:56.2226080Z return scheme.codegen_and_compile(gm, example_inputs, 
inputs_to_check, graph_kwargs) 2025-12-04T11:42:56.2227078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:42:56.2227872Z _check_triton_bf16_support(graph) 2025-12-04T11:42:56.2228679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:42:56.2229504Z warn_and_skip(node.get_device()) 2025-12-04T11:42:56.2230223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:42:56.2230997Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:42:56.2231526Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T11:42:56.2231921Z 2025-12-04T11:42:56.2232652Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:42:56.2233497Z 2025-12-04T11:42:56.2233501Z 2025-12-04T11:42:56.2233506Z 2025-12-04T11:42:56.2233726Z To execute this test, run the following from the base repo dir: 2025-12-04T11:42:56.2234977Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 2025-12-04T11:42:56.2236020Z 2025-12-04T11:42:56.2236292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:42:56.2236891Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:42:56.2237435Z ============= 1 failed, 2 deselected, 2 rerun in 233.12s (0:03:53) ============= 2025-12-04T11:42:56.2237889Z Got exit code 1 2025-12-04T11:42:56.2238169Z Retrying single test... 
2025-12-04T11:42:56.2238819Z W1204 11:37:08.548000 106498 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.2239987Z Test results will be stored in test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-b35d65d1a2e42e4e.xml 2025-12-04T11:42:56.2240894Z ============================= test session starts ============================== 2025-12-04T11:42:56.2241617Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:42:56.2242231Z cachedir: .pytest_cache 2025-12-04T11:42:56.2242936Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:42:56.2243726Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:42:56.2244088Z configfile: pytest.ini 2025-12-04T11:42:56.2244852Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:42:56.2245836Z collecting ... collected 32 items / 2 deselected / 30 selected 2025-12-04T11:42:56.2247200Z stepcurrent: skipping 0 already run items. 
Running only test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 2025-12-04T11:42:56.2248421Z Running 1 items in this shard 2025-12-04T11:42:56.2248632Z 2025-12-04T11:42:56.2249515Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 ('RERUN', {'yellow': True}) [78.6372s] [100%] 2025-12-04T11:42:56.2251428Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 ('RERUN', {'yellow': True}) [79.2654s] [100%] 2025-12-04T11:42:56.2253211Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 FAILED [78.1277s] [100%] 2025-12-04T11:42:56.2254117Z 2025-12-04T11:42:56.2254276Z ==================================== RERUNS ==================================== 2025-12-04T11:42:56.2255090Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _ 2025-12-04T11:42:56.2255865Z Traceback (most recent call last): 2025-12-04T11:42:56.2256660Z File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism 2025-12-04T11:42:56.2257399Z self.assertTrue( 2025-12-04T11:42:56.2257896Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:42:56.2258494Z raise self.failureException(msg) 2025-12-04T11:42:56.2259069Z AssertionError: False is not true : stdout: cuda eval DistillGPT2 2025-12-04T11:42:56.2259794Z TorchDynamo optimized model failed to run because of following error 2025-12-04T11:42:56.2260278Z fail_to_run 2025-12-04T11:42:56.2260519Z , stderr: 2025-12-04T11:42:56.2261149Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config 
but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.2262339Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.2263057Z 2025-12-04T11:42:56.2263177Z loading model: 0it [00:03, ?it/s] 2025-12-04T11:42:56.2263932Z [W1204 11:38:04.269755840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2264582Z 2025-12-04T11:42:56.2265106Z [W1204 11:38:20.296686305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2265759Z 2025-12-04T11:42:56.2266282Z [W1204 11:38:20.300418821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2266933Z 2025-12-04T11:42:56.2267448Z [W1204 11:38:20.300693788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2268109Z 2025-12-04T11:42:56.2268620Z [W1204 11:38:20.301508541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2269281Z 2025-12-04T11:42:56.2269841Z [W1204 11:38:20.301733923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2270491Z 2025-12-04T11:42:56.2271228Z [W1204 11:38:20.302461176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2271875Z 2025-12-04T11:42:56.2272468Z [W1204 11:38:20.302671949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2273118Z 2025-12-04T11:42:56.2273678Z [W1204 11:38:20.303695684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2274342Z 2025-12-04T11:42:56.2274853Z [W1204 11:38:20.303885420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2275513Z 2025-12-04T11:42:56.2276029Z [W1204 11:38:20.304348242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2276721Z 2025-12-04T11:42:56.2277249Z [W1204 11:38:20.304529859 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2277896Z 2025-12-04T11:42:56.2278425Z [W1204 11:38:20.305192665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2279078Z 2025-12-04T11:42:56.2279594Z [W1204 11:38:20.305377277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2280253Z 2025-12-04T11:42:56.2280766Z [W1204 11:38:20.306297294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2281422Z 2025-12-04T11:42:56.2281939Z [W1204 11:38:20.306478871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2282588Z 2025-12-04T11:42:56.2283116Z [W1204 11:38:20.306917597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2283764Z 2025-12-04T11:42:56.2284289Z [W1204 11:38:20.307100173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2284942Z 2025-12-04T11:42:56.2285454Z [W1204 11:38:20.307723531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2286113Z 2025-12-04T11:42:56.2286624Z [W1204 11:38:20.307906637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2287285Z 2025-12-04T11:42:56.2287802Z [W1204 11:38:20.308826418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2288446Z 2025-12-04T11:42:56.2288967Z [W1204 11:38:20.309007889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2289612Z 2025-12-04T11:42:56.2290136Z [W1204 11:38:20.309444992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2290787Z 2025-12-04T11:42:56.2291303Z [W1204 11:38:20.309627573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2291961Z 2025-12-04T11:42:56.2292478Z [W1204 11:38:20.310273616 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2293140Z 2025-12-04T11:42:56.2293702Z [W1204 11:38:20.310452885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2294355Z 2025-12-04T11:42:56.2294881Z [W1204 11:38:20.311350837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2295527Z 2025-12-04T11:42:56.2296049Z [W1204 11:38:20.311534762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2296827Z 2025-12-04T11:42:56.2297336Z [W1204 11:38:20.311967356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2298029Z 2025-12-04T11:42:56.2298539Z [W1204 11:38:20.312145343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2299197Z 2025-12-04T11:42:56.2299709Z [W1204 11:38:20.312766683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2300368Z 2025-12-04T11:42:56.2300918Z [W1204 11:38:20.312945908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2301564Z 2025-12-04T11:42:56.2302089Z [W1204 11:38:20.313825878 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2302735Z 2025-12-04T11:42:56.2303260Z [W1204 11:38:20.314005577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2303905Z 2025-12-04T11:42:56.2304418Z [W1204 11:38:20.314434292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2305078Z 2025-12-04T11:42:56.2305586Z [W1204 11:38:20.314610913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2306250Z 2025-12-04T11:42:56.2306764Z [W1204 11:38:20.315212887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2307413Z 2025-12-04T11:42:56.2307940Z [W1204 11:38:20.315392202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2308591Z 2025-12-04T11:42:56.2309083Z W1204 11:38:22.125000 106712 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.2310258Z W1204 11:38:24.764000 106712 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmpuhhbdw1k/tmpq_n70fjv 2025-12-04T11:42:56.2311072Z ERROR:common: 2025-12-04T11:42:56.2311352Z Traceback (most recent call last): 2025-12-04T11:42:56.2311974Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy 2025-12-04T11:42:56.2324416Z new_result = self.run_n_iterations( 2025-12-04T11:42:56.2325135Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations 2025-12-04T11:42:56.2325859Z model_iter_fn(mod, inputs, collect_outputs=False) 2025-12-04T11:42:56.2326673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:42:56.2327562Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:42:56.2328458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T11:42:56.2329317Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:42:56.2330161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T11:42:56.2330959Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:42:56.2331873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:42:56.2332880Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:42:56.2333879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 
2025-12-04T11:42:56.2334680Z _check_triton_bf16_support(graph) 2025-12-04T11:42:56.2335519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:42:56.2336478Z warn_and_skip(node.get_device()) 2025-12-04T11:42:56.2337224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:42:56.2337982Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:42:56.2338512Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T11:42:56.2338916Z 2025-12-04T11:42:56.2339636Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:42:56.2340525Z 2025-12-04T11:42:56.2340530Z 2025-12-04T11:42:56.2340535Z 2025-12-04T11:42:56.2340764Z To execute this test, run the following from the base repo dir: 2025-12-04T11:42:56.2342013Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 2025-12-04T11:42:56.2343039Z 2025-12-04T11:42:56.2343312Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:42:56.2343953Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.2345553Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpezjp3dnh/saved.pkl 2025-12-04T11:42:56.2348176Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs 
--compare-model-outputs-with=/tmp/tmpezjp3dnh/saved.pkl 2025-12-04T11:42:56.2350075Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _ 2025-12-04T11:42:56.2350852Z Traceback (most recent call last): 2025-12-04T11:42:56.2351587Z File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism 2025-12-04T11:42:56.2352336Z self.assertTrue( 2025-12-04T11:42:56.2352832Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:42:56.2353434Z raise self.failureException(msg) 2025-12-04T11:42:56.2354019Z AssertionError: False is not true : stdout: cuda eval DistillGPT2 2025-12-04T11:42:56.2354755Z TorchDynamo optimized model failed to run because of following error 2025-12-04T11:42:56.2355244Z fail_to_run 2025-12-04T11:42:56.2355491Z , stderr: 2025-12-04T11:42:56.2356130Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.2357319Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.2358039Z 2025-12-04T11:42:56.2358159Z loading model: 0it [00:03, ?it/s] 2025-12-04T11:42:56.2358915Z [W1204 11:39:23.314064910 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2359566Z 2025-12-04T11:42:56.2360132Z [W1204 11:39:40.565949462 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2360782Z 2025-12-04T11:42:56.2361307Z [W1204 11:39:40.569705105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2361954Z 2025-12-04T11:42:56.2362471Z [W1204 11:39:40.569975724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2363217Z 2025-12-04T11:42:56.2363729Z [W1204 11:39:40.570857629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2364401Z 2025-12-04T11:42:56.2364944Z [W1204 11:39:40.571082099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2365595Z 2025-12-04T11:42:56.2366125Z [W1204 11:39:40.571821794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2366772Z 2025-12-04T11:42:56.2367301Z [W1204 11:39:40.572026357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2367987Z 2025-12-04T11:42:56.2368503Z [W1204 11:39:40.573101463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2369172Z 2025-12-04T11:42:56.2369682Z [W1204 11:39:40.573299754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2370342Z 2025-12-04T11:42:56.2370860Z [W1204 11:39:40.573773047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2371712Z 2025-12-04T11:42:56.2372244Z [W1204 11:39:40.573972126 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2372898Z 2025-12-04T11:42:56.2373427Z [W1204 11:39:40.574648553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2374082Z 2025-12-04T11:42:56.2374595Z [W1204 11:39:40.574851069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2375263Z 2025-12-04T11:42:56.2375780Z [W1204 11:39:40.575786385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2376512Z 2025-12-04T11:42:56.2377028Z [W1204 11:39:40.575971820 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2377673Z 2025-12-04T11:42:56.2378206Z [W1204 11:39:40.576428810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2378862Z 2025-12-04T11:42:56.2379394Z [W1204 11:39:40.576624024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2380047Z 2025-12-04T11:42:56.2380563Z [W1204 11:39:40.577277314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2381229Z 2025-12-04T11:42:56.2381745Z [W1204 11:39:40.577459192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2382411Z 2025-12-04T11:42:56.2382926Z [W1204 11:39:40.578371322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2383577Z 2025-12-04T11:42:56.2384100Z [W1204 11:39:40.578555530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2384750Z 2025-12-04T11:42:56.2385386Z [W1204 11:39:40.579011618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2386039Z 2025-12-04T11:42:56.2386555Z [W1204 11:39:40.579192613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2387218Z 2025-12-04T11:42:56.2387731Z [W1204 11:39:40.579821653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2388455Z 2025-12-04T11:42:56.2389006Z [W1204 11:39:40.580031664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2389659Z 2025-12-04T11:42:56.2390184Z [W1204 11:39:40.580952719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2390833Z 2025-12-04T11:42:56.2391364Z [W1204 11:39:40.581135288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2392054Z 2025-12-04T11:42:56.2392566Z [W1204 11:39:40.581586691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2393227Z 2025-12-04T11:42:56.2393741Z [W1204 11:39:40.581769034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2394405Z 2025-12-04T11:42:56.2394921Z [W1204 11:39:40.582412826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2395586Z 2025-12-04T11:42:56.2396099Z [W1204 11:39:40.582595397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2396745Z 2025-12-04T11:42:56.2397272Z [W1204 11:39:40.583484478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2397918Z 2025-12-04T11:42:56.2398446Z [W1204 11:39:40.583664781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2399097Z 2025-12-04T11:42:56.2399611Z [W1204 11:39:40.584090230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2400278Z 2025-12-04T11:42:56.2400792Z [W1204 11:39:40.584269122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2401453Z 2025-12-04T11:42:56.2401966Z [W1204 11:39:40.584894333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2402616Z 2025-12-04T11:42:56.2403148Z [W1204 11:39:40.585078921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2403797Z 2025-12-04T11:42:56.2404295Z W1204 11:39:41.394000 106924 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.2405476Z W1204 11:39:44.038000 106924 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmp3ldklfp5/tmpcq8m_h56 2025-12-04T11:42:56.2406294Z ERROR:common: 2025-12-04T11:42:56.2406579Z Traceback (most recent call last): 2025-12-04T11:42:56.2407204Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy 2025-12-04T11:42:56.2407870Z new_result = self.run_n_iterations( 2025-12-04T11:42:56.2408533Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations 2025-12-04T11:42:56.2409250Z model_iter_fn(mod, inputs, collect_outputs=False) 2025-12-04T11:42:56.2410071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 
2025-12-04T11:42:56.2410948Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:42:56.2411854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T11:42:56.2412684Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:42:56.2413523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T11:42:56.2414357Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:42:56.2415204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:42:56.2416194Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:42:56.2417270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:42:56.2418070Z _check_triton_bf16_support(graph) 2025-12-04T11:42:56.2418913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:42:56.2419721Z warn_and_skip(node.get_device()) 2025-12-04T11:42:56.2420472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:42:56.2421239Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:42:56.2421771Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T11:42:56.2422172Z 2025-12-04T11:42:56.2422890Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:42:56.2423733Z 2025-12-04T11:42:56.2423738Z 2025-12-04T11:42:56.2423743Z 2025-12-04T11:42:56.2423982Z To execute this test, run the following from the base repo dir: 2025-12-04T11:42:56.2425233Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 2025-12-04T11:42:56.2426263Z 2025-12-04T11:42:56.2426533Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:42:56.2427173Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.2428779Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpezjp3dnh/saved.pkl 2025-12-04T11:42:56.2431404Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpezjp3dnh/saved.pkl 2025-12-04T11:42:56.2433020Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.2434581Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpr80swaia/saved.pkl 2025-12-04T11:42:56.2437165Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpr80swaia/saved.pkl 2025-12-04T11:42:56.2438695Z =================================== 
FAILURES =================================== 2025-12-04T11:42:56.2439506Z _ DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 _ 2025-12-04T11:42:56.2440287Z Traceback (most recent call last): 2025-12-04T11:42:56.2441064Z File "/var/lib/jenkins/workspace/test/inductor/test_deterministic.py", line 166, in test_run2run_determinism 2025-12-04T11:42:56.2441808Z self.assertTrue( 2025-12-04T11:42:56.2442316Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:42:56.2442895Z raise self.failureException(msg) 2025-12-04T11:42:56.2443468Z AssertionError: False is not true : stdout: cuda eval DistillGPT2 2025-12-04T11:42:56.2444228Z TorchDynamo optimized model failed to run because of following error 2025-12-04T11:42:56.2444713Z fail_to_run 2025-12-04T11:42:56.2444983Z , stderr: 2025-12-04T11:42:56.2445618Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.2446815Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.2447519Z 2025-12-04T11:42:56.2447638Z loading model: 0it [00:03, ?it/s] 2025-12-04T11:42:56.2448427Z [W1204 11:40:42.245142406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2449079Z 2025-12-04T11:42:56.2449609Z [W1204 11:40:58.664357169 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2450259Z 2025-12-04T11:42:56.2450782Z [W1204 11:40:58.668049808 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2451439Z 2025-12-04T11:42:56.2451954Z [W1204 11:40:58.668307211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2452613Z 2025-12-04T11:42:56.2453124Z [W1204 11:40:58.669145931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2453785Z 2025-12-04T11:42:56.2454299Z [W1204 11:40:58.669359432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2454953Z 2025-12-04T11:42:56.2455481Z [W1204 11:40:58.670139203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2456131Z 2025-12-04T11:42:56.2456742Z [W1204 11:40:58.670333024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2457392Z 2025-12-04T11:42:56.2457907Z [W1204 11:40:58.671387134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2458565Z 2025-12-04T11:42:56.2459075Z [W1204 11:40:58.671584322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2459731Z 2025-12-04T11:42:56.2460244Z [W1204 11:40:58.672047813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2460906Z 2025-12-04T11:42:56.2461420Z [W1204 11:40:58.672230908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2462067Z 2025-12-04T11:42:56.2462595Z [W1204 11:40:58.672907023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2463245Z 2025-12-04T11:42:56.2463777Z [W1204 11:40:58.673093850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2464427Z 2025-12-04T11:42:56.2464939Z [W1204 11:40:58.674023154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2465601Z 2025-12-04T11:42:56.2466190Z [W1204 11:40:58.674207499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2466854Z 2025-12-04T11:42:56.2467371Z [W1204 11:40:58.674664175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2468020Z 2025-12-04T11:42:56.2468547Z [W1204 11:40:58.674845109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2469231Z 2025-12-04T11:42:56.2469791Z [W1204 11:40:58.675475067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2470444Z 2025-12-04T11:42:56.2471128Z [W1204 11:40:58.675657274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2471792Z 2025-12-04T11:42:56.2472309Z [W1204 11:40:58.676592500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2473049Z 2025-12-04T11:42:56.2473565Z [W1204 11:40:58.676774803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2474217Z 2025-12-04T11:42:56.2474746Z [W1204 11:40:58.677222049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2475399Z 2025-12-04T11:42:56.2475931Z [W1204 11:40:58.677400417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2476582Z 2025-12-04T11:42:56.2477095Z [W1204 11:40:58.678021426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2477761Z 2025-12-04T11:42:56.2478275Z [W1204 11:40:58.678200770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2478937Z 2025-12-04T11:42:56.2479449Z [W1204 11:40:58.679078870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2480099Z 2025-12-04T11:42:56.2480630Z [W1204 11:40:58.679258360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2481281Z 2025-12-04T11:42:56.2481809Z [W1204 11:40:58.679700447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2482459Z 2025-12-04T11:42:56.2482971Z [W1204 11:40:58.679878086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2483632Z 2025-12-04T11:42:56.2484147Z [W1204 11:40:58.680514917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2484810Z 2025-12-04T11:42:56.2485319Z [W1204 11:40:58.680704722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2485970Z 2025-12-04T11:42:56.2486495Z [W1204 11:40:58.681592525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2487150Z 2025-12-04T11:42:56.2487676Z [W1204 11:40:58.681775925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2488326Z 2025-12-04T11:42:56.2488837Z [W1204 11:40:58.682209951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2489496Z 2025-12-04T11:42:56.2490073Z [W1204 11:40:58.682388769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2490738Z 2025-12-04T11:42:56.2491258Z [W1204 11:40:58.683003251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2491905Z 2025-12-04T11:42:56.2492428Z [W1204 11:40:58.683184131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2493125Z 2025-12-04T11:42:56.2493614Z W1204 11:40:59.501000 107132 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.2494834Z W1204 11:41:02.162000 107132 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmpdoi89lc7/tmptfkayx7_ 2025-12-04T11:42:56.2495651Z ERROR:common: 2025-12-04T11:42:56.2495929Z Traceback (most recent call last): 2025-12-04T11:42:56.2496636Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy 2025-12-04T11:42:56.2497286Z new_result = self.run_n_iterations( 2025-12-04T11:42:56.2498062Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations 2025-12-04T11:42:56.2498772Z model_iter_fn(mod, inputs, collect_outputs=False) 2025-12-04T11:42:56.2499556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 
2025-12-04T11:42:56.2500445Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:42:56.2501347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T11:42:56.2502192Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:42:56.2503013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T11:42:56.2503809Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:42:56.2504625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:42:56.2505626Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:42:56.2506604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T11:42:56.2507397Z _check_triton_bf16_support(graph) 2025-12-04T11:42:56.2508199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:42:56.2509001Z warn_and_skip(node.get_device()) 2025-12-04T11:42:56.2509731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:42:56.2510499Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:42:56.2511034Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T11:42:56.2511434Z 2025-12-04T11:42:56.2512146Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:42:56.2512994Z 2025-12-04T11:42:56.2512998Z 2025-12-04T11:42:56.2513003Z 2025-12-04T11:42:56.2513241Z To execute this test, run the following from the base repo dir: 2025-12-04T11:42:56.2514492Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 2025-12-04T11:42:56.2515519Z 2025-12-04T11:42:56.2515789Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:42:56.2516434Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.2518074Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpezjp3dnh/saved.pkl 2025-12-04T11:42:56.2520687Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpezjp3dnh/saved.pkl 2025-12-04T11:42:56.2522314Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:42:56.2523933Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmpr80swaia/saved.pkl 2025-12-04T11:42:56.2526530Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmpr80swaia/saved.pkl 2025-12-04T11:42:56.2528168Z ----------------------------- Captured 
stdout call ----------------------------- 2025-12-04T11:42:56.2529741Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --save-model-outputs-to=/tmp/tmp4n6zeiot/saved.pkl 2025-12-04T11:42:56.2532335Z Command /opt/conda/envs/py_3.10/bin/python /var/lib/jenkins/workspace/benchmarks/dynamo/huggingface.py --backend inductor --bfloat16 --accuracy --only DistillGPT2 --inference --disable-cudagraphs --compare-model-outputs-with=/tmp/tmp4n6zeiot/saved.pkl 2025-12-04T11:42:56.2534487Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-b35d65d1a2e42e4e.xml - 2025-12-04T11:42:56.2535576Z =========================== short test summary info ============================ 2025-12-04T11:42:56.2537188Z FAILED [78.1277s] inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 - AssertionError: False is not true : stdout: cuda eval DistillGPT2 2025-12-04T11:42:56.2538754Z TorchDynamo optimized model failed to run because of following error 2025-12-04T11:42:56.2539246Z fail_to_run 2025-12-04T11:42:56.2539492Z , stderr: 2025-12-04T11:42:56.2540135Z loading model: 0it [00:00, ?it/s]`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.2541324Z WARNING:transformers.modeling_utils:`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`. 2025-12-04T11:42:56.2542038Z 2025-12-04T11:42:56.2542158Z loading model: 0it [00:03, ?it/s] 2025-12-04T11:42:56.2542914Z [W1204 11:40:42.245142406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2543569Z 2025-12-04T11:42:56.2544099Z [W1204 11:40:58.664357169 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2544750Z 2025-12-04T11:42:56.2545275Z [W1204 11:40:58.668049808 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2545925Z 2025-12-04T11:42:56.2546437Z [W1204 11:40:58.668307211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2547097Z 2025-12-04T11:42:56.2547612Z [W1204 11:40:58.669145931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2548270Z 2025-12-04T11:42:56.2548829Z [W1204 11:40:58.669359432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2549481Z 2025-12-04T11:42:56.2550005Z [W1204 11:40:58.670139203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2550649Z 2025-12-04T11:42:56.2551178Z [W1204 11:40:58.670333024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2551861Z 2025-12-04T11:42:56.2552370Z [W1204 11:40:58.671387134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2553063Z 2025-12-04T11:42:56.2553579Z [W1204 11:40:58.671584322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2554240Z 2025-12-04T11:42:56.2554754Z [W1204 11:40:58.672047813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2555406Z 2025-12-04T11:42:56.2555929Z [W1204 11:40:58.672230908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2556637Z 2025-12-04T11:42:56.2557163Z [W1204 11:40:58.672907023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2557815Z 2025-12-04T11:42:56.2558326Z [W1204 11:40:58.673093850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2558985Z 2025-12-04T11:42:56.2559501Z [W1204 11:40:58.674023154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2560161Z 2025-12-04T11:42:56.2560673Z [W1204 11:40:58.674207499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2561334Z 2025-12-04T11:42:56.2561847Z [W1204 11:40:58.674664175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2562497Z 2025-12-04T11:42:56.2563018Z [W1204 11:40:58.674845109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2563669Z 2025-12-04T11:42:56.2564192Z [W1204 11:40:58.675475067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2564842Z 2025-12-04T11:42:56.2565355Z [W1204 11:40:58.675657274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2566012Z 2025-12-04T11:42:56.2566523Z [W1204 11:40:58.676592500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2567181Z 2025-12-04T11:42:56.2567696Z [W1204 11:40:58.676774803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2568351Z 2025-12-04T11:42:56.2568873Z [W1204 11:40:58.677222049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2569522Z 2025-12-04T11:42:56.2570044Z [W1204 11:40:58.677400417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2570696Z 2025-12-04T11:42:56.2571384Z [W1204 11:40:58.678021426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2572052Z 2025-12-04T11:42:56.2572563Z [W1204 11:40:58.678200770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2573227Z 2025-12-04T11:42:56.2573817Z [W1204 11:40:58.679078870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2574473Z 2025-12-04T11:42:56.2575001Z [W1204 11:40:58.679258360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2575649Z 2025-12-04T11:42:56.2576179Z [W1204 11:40:58.679700447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2576949Z 2025-12-04T11:42:56.2577508Z [W1204 11:40:58.679878086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2578169Z 2025-12-04T11:42:56.2578682Z [W1204 11:40:58.680514917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2579339Z 2025-12-04T11:42:56.2579853Z [W1204 11:40:58.680704722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2580543Z 2025-12-04T11:42:56.2581069Z [W1204 11:40:58.681592525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2581728Z 2025-12-04T11:42:56.2582249Z [W1204 11:40:58.681775925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2582899Z 2025-12-04T11:42:56.2583410Z [W1204 11:40:58.682209951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2584076Z 2025-12-04T11:42:56.2584589Z [W1204 11:40:58.682388769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2585251Z 2025-12-04T11:42:56.2585764Z [W1204 11:40:58.683003251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:42:56.2586411Z 2025-12-04T11:42:56.2586934Z [W1204 11:40:58.683184131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:42:56.2587588Z 2025-12-04T11:42:56.2588077Z W1204 11:40:59.501000 107132 site-packages/torch/_inductor/utils.py:1703] [1/0_1] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.2589253Z W1204 11:41:02.162000 107132 site-packages/torch/_inductor/utils.py:1361] [1/0_1] on error, temporary cache dir kept at /tmp/tmpdoi89lc7/tmptfkayx7_ 2025-12-04T11:42:56.2590070Z ERROR:common: 2025-12-04T11:42:56.2590352Z Traceback (most recent call last): 2025-12-04T11:42:56.2590989Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2333, in check_accuracy 2025-12-04T11:42:56.2591638Z new_result = self.run_n_iterations( 2025-12-04T11:42:56.2592296Z File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 2043, in run_n_iterations 2025-12-04T11:42:56.2593009Z model_iter_fn(mod, inputs, collect_outputs=False) 2025-12-04T11:42:56.2593791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T11:42:56.2594669Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T11:42:56.2595570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T11:42:56.2596423Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T11:42:56.2597247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T11:42:56.2598180Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T11:42:56.2599002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T11:42:56.2600060Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T11:42:56.2601046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 
2025-12-04T11:42:56.2601841Z _check_triton_bf16_support(graph) 2025-12-04T11:42:56.2602641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T11:42:56.2603495Z warn_and_skip(node.get_device()) 2025-12-04T11:42:56.2604270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T11:42:56.2605045Z raise SkipFrame("BF16 is not supported") 2025-12-04T11:42:56.2605573Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T11:42:56.2605958Z 2025-12-04T11:42:56.2606675Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T11:42:56.2607567Z 2025-12-04T11:42:56.2607572Z 2025-12-04T11:42:56.2607576Z 2025-12-04T11:42:56.2607794Z To execute this test, run the following from the base repo dir: 2025-12-04T11:42:56.2609041Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_deterministic.py DeterministicTest.test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 2025-12-04T11:42:56.2610064Z 2025-12-04T11:42:56.2610345Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:42:56.2610939Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:42:56.2611470Z ============= 1 failed, 2 deselected, 2 rerun in 236.06s (0:03:56) ============= 2025-12-04T11:42:56.2611927Z Got exit code 1 2025-12-04T11:42:56.2612924Z FAILED CONSISTENTLY: test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16 2025-12-04T11:42:56.2614265Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:42:56.2615277Z W1204 11:41:14.982000 107236 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:42:56.2616536Z Test results will be stored in test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-feba5ff46dbc30dd.xml 2025-12-04T11:42:56.2617449Z ============================= test session starts ============================== 2025-12-04T11:42:56.2618112Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:42:56.2618724Z cachedir: .pytest_cache 2025-12-04T11:42:56.2619444Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:42:56.2620234Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:42:56.2620586Z configfile: pytest.ini 2025-12-04T11:42:56.2621375Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:42:56.2622326Z collecting ... collected 32 items / 1 deselected / 31 selected 2025-12-04T11:42:56.2622814Z stepcurrent: skipping 1 already run items. 
2025-12-04T11:42:56.2623210Z Running 2 items in this shard 2025-12-04T11:42:56.2623440Z 2025-12-04T11:42:56.2624205Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_inference_precision_amp PASSED [50.2427s] [ 50%] 2025-12-04T11:42:56.2625864Z inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_inference_precision_float16 PASSED [49.8039s] [100%] 2025-12-04T11:42:56.2626764Z 2025-12-04T11:42:56.2627606Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-feba5ff46dbc30dd.xml - 2025-12-04T11:42:56.2628720Z ================= 2 passed, 1 deselected in 100.07s (0:01:40) ================== 2025-12-04T11:42:56.2629989Z The following tests failed consistently: ['test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_bfloat16'] 2025-12-04T11:42:56.2631070Z 2025-12-04T11:42:56.2631656Z FINISHED PRINTING LOG FILE of inductor/test_deterministic 5/8 (test/test-reports/inductor.test_deterministic_5.8_04041ff7a6ce6208_.log) 2025-12-04T11:42:56.2632357Z 2025-12-04T11:42:56.2632780Z Finished inductor/test_deterministic 5/8 ... 
[2025-12-04 11:42:56.169381][9004.552272517], took 12.91min 2025-12-04T11:42:56.2634141Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccc55353a2e77d8f.xml 2025-12-04T11:42:56.2659912Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-cbc1aeff512c7b0d.xml 2025-12-04T11:42:56.3114942Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-b35d65d1a2e42e4e.xml 2025-12-04T11:42:56.3489784Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-feba5ff46dbc30dd.xml 2025-12-04T11:42:56.6646203Z Uploading logs for 57119749248 to S3 2025-12-04T11:42:56.7516770Z Uploading artifacts took 0.37 seconds 2025-12-04T11:42:56.7517200Z inductor/test_deterministic 5/8 failed! 2025-12-04T11:42:56.7521647Z Running inductor/test_fp8 1/1 ... [2025-12-04 11:42:56.751998][9005.134892596] 2025-12-04T11:42:56.7522176Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:42:56.7527059Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fp8.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:42:56.752479] 2025-12-04T12:15:04.7378265Z 2025-12-04T12:15:04.7379343Z PRINTING LOG FILE of inductor/test_fp8 1/1 (test/test-reports/inductor.test_fp8_1.1_5b24deb545871ee8_.log) 2025-12-04T12:15:04.7380950Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dff864e79f1bf91b.xml 2025-12-04T12:15:04.7382116Z ============================= test session starts ============================== 2025-12-04T12:15:04.7383054Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:04.7384090Z cachedir: .pytest_cache 2025-12-04T12:15:04.7385223Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:04.7386404Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:04.7387022Z configfile: pytest.ini 2025-12-04T12:15:04.7388358Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:04.7389480Z collecting ... 
collected 188 items 2025-12-04T12:15:04.7390144Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T12:15:04.7564729Z Running 188 items in this shard: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_bad_cast_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e5m2_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes0_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fp8_max_autotune_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fusion_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_True_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_True_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_scaled_mm_preserves_strides_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_True_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_True_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_True_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_True_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_input_dims_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_scale_dims_rowwise_scaling_cuda 2025-12-04T12:15:04.7747065Z 2025-12-04T12:15:04.7749352Z 
inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.7753374Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.7756134Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.7758125Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:04.7759792Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:04.7761827Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.7763900Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.7766341Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.7768664Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.7770936Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:04.7772774Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 
2025-12-04T12:15:04.7774585Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.7776257Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.7777731Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.7779359Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.7780865Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.7782677Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.7784749Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.7786484Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.7788009Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.7789706Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.7791595Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.7793690Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.7795557Z E1204 11:43:10.973000 
108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.7797277Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.7799095Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.7800963Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.7802771Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.7804741Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.7806595Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.7808684Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.7811141Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.7813138Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.7816829Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): 
[['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.7821099Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.7823705Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.7826266Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.7828340Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.7831422Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.7834116Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.7836762Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.7838687Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.7840760Z E1204 11:43:10.973000 108452 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.7842935Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.7844965Z E1204 11:43:10.973000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.7846666Z ('RERUN', {'yellow': True}) [3.2912s] [ 0%] 2025-12-04T12:15:04.7848952Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.7852510Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.7854613Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.7855875Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:04.7857624Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:04.7859009Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.7860624Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 
2025-12-04T12:15:04.7862341Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.7863923Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.7865865Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:04.7867838Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.7869444Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.7871132Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.7872734Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.7874407Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.7875969Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.7877668Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.7879605Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.7881215Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.7882818Z E1204 11:43:11.361000 
108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.7884821Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.7886893Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.7888993Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.7890878Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.7892586Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.7894340Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.7896141Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.7897913Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.7899823Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.7901976Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.7904221Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.7906219Z E1204 11:43:11.361000 108452 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.7907901Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.7911527Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.7915455Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.7917576Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.7920473Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.7922629Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.7924964Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 
2025-12-04T12:15:04.7927642Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.7930470Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.7932598Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.7935446Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.7937841Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.7941197Z E1204 11:43:11.361000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.7943253Z ('RERUN', {'yellow': True}) [0.3483s] [ 0%] 2025-12-04T12:15:04.7945355Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.7948782Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.7950321Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.7951463Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:04.7952567Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:04.7953672Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.7954811Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.7956027Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.7957301Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.7958604Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T12:15:04.7959891Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.7961036Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.7962151Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.7963274Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.7964354Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.7965416Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.7966664Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.7968038Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.7969250Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.7970445Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.7971908Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.7973348Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.7974682Z E1204 11:43:11.713000 108452 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.7975970Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.7977163Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.7978279Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.7979429Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.7980583Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.7981745Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.7982990Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.7984355Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.7985900Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.7987113Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.7989712Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 
'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.7992447Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.7994166Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.7995970Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.7997630Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.7999391Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8001105Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8002930Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8004471Z E1204 11:43:11.713000 108452 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.8006183Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8007695Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.8009099Z E1204 11:43:11.713000 108452 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8010238Z FAILED [0.3498s] [ 0%] 2025-12-04T12:15:04.8010420Z 2025-12-04T12:15:04.8010565Z ==================================== RERUNS ==================================== 2025-12-04T12:15:04.8011181Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _ 2025-12-04T12:15:04.8011769Z Traceback (most recent call last): 2025-12-04T12:15:04.8012453Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.8013296Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.8014166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.8015052Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.8015945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.8016876Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.8017725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:04.8018538Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.8019349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.8020354Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.8021352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.8022170Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.8022921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.8023671Z return self._compile_to_module() 2025-12-04T12:15:04.8024406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.8025188Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.8026008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.8026798Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.8027600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.8028463Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.8029430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.8030293Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.8031059Z File "/tmp/tmp7wdl8vg8/ha/chatiivoxdb5gtbqpamfs2lmbuetlnhbvebwzol5gw2sywjeo333.py", line 62, in 2025-12-04T12:15:04.8032219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:04.8032938Z kernel.precompile( 2025-12-04T12:15:04.8033688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.8034491Z self._precompile_worker() 2025-12-04T12:15:04.8035315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.8036264Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.8037171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.8038101Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.8038898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.8039739Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8040572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8041487Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8042196Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.8043080Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8043820Z ^ 2025-12-04T12:15:04.8044410Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8045012Z 2025-12-04T12:15:04.8045723Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.8046574Z 2025-12-04T12:15:04.8046579Z 2025-12-04T12:15:04.8046811Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.8047799Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:04.8048561Z 2025-12-04T12:15:04.8048832Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.8049475Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.8049957Z frames [('total', 1)] 2025-12-04T12:15:04.8050244Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8050699Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8051299Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8051769Z graph_break [] 2025-12-04T12:15:04.8052250Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _ 2025-12-04T12:15:04.8052840Z Traceback (most recent call last): 2025-12-04T12:15:04.8053539Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.8054369Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.8055293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.8056180Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.8057289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.8058202Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.8059051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 
1003, in _compile_fx_inner 2025-12-04T12:15:04.8059919Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.8060770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.8061779Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.8062780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.8063601Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.8064418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.8065173Z return self._compile_to_module() 2025-12-04T12:15:04.8065915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.8066718Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.8067532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.8068335Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.8069102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.8069979Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.8070928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.8071991Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.8072759Z File "/tmp/tmpzyu2a4d5/uh/cuhgwijot7lhtmot4esqh5jijysnud4eeu6s4bqkh2ficdnykgeq.py", line 62, in 2025-12-04T12:15:04.8073871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in 
triton 2025-12-04T12:15:04.8074594Z kernel.precompile( 2025-12-04T12:15:04.8075347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.8076162Z self._precompile_worker() 2025-12-04T12:15:04.8076970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.8077893Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.8078807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.8079751Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.8080532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.8081377Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8082219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8083127Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8083889Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.8084863Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8085623Z ^ 2025-12-04T12:15:04.8086306Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8086912Z 2025-12-04T12:15:04.8087625Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.8088535Z 2025-12-04T12:15:04.8088541Z 2025-12-04T12:15:04.8088761Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.8089791Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:04.8090557Z 2025-12-04T12:15:04.8090838Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.8091468Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.8091947Z frames [('total', 1)] 2025-12-04T12:15:04.8092247Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8092737Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8093341Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8093807Z graph_break [] 2025-12-04T12:15:04.8094176Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.8094650Z frames [('total', 1)] 2025-12-04T12:15:04.8094957Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8095387Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8095980Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8096524Z graph_break [] 2025-12-04T12:15:04.8096832Z =================================== FAILURES =================================== 2025-12-04T12:15:04.8097433Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _ 2025-12-04T12:15:04.8098020Z Traceback (most recent call last): 2025-12-04T12:15:04.8098720Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.8099545Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.8100416Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.8101296Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.8102203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.8103037Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.8103880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:04.8104678Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.8105495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.8106486Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.8107477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.8108293Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.8109060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.8109795Z return self._compile_to_module() 2025-12-04T12:15:04.8110523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.8112014Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.8112921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.8113721Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.8114482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:04.8115357Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.8116350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.8117213Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.8118021Z File "/tmp/tmpym30s4rg/f7/cf73uwgybamxghyudgisrciw3ukevb3eyyij2dnyozakiw2bi4a7.py", line 62, in 2025-12-04T12:15:04.8119138Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:04.8119846Z kernel.precompile( 2025-12-04T12:15:04.8120600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.8121458Z self._precompile_worker() 2025-12-04T12:15:04.8122264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.8219729Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.8220735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.8221671Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.8222455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.8223285Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8224115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8225025Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8225725Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.8226587Z def 
triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8227319Z ^ 2025-12-04T12:15:04.8227898Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8229088Z 2025-12-04T12:15:04.8229847Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.8230684Z 2025-12-04T12:15:04.8230689Z 2025-12-04T12:15:04.8230914Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.8232207Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:04.8233369Z 2025-12-04T12:15:04.8233644Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.8234274Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.8234748Z frames [('total', 1)] 2025-12-04T12:15:04.8235026Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8235479Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8236077Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8236536Z graph_break [] 2025-12-04T12:15:04.8236895Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.8237352Z frames [('total', 1)] 2025-12-04T12:15:04.8237639Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8238280Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8238870Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8239332Z graph_break [] 2025-12-04T12:15:04.8239685Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T12:15:04.8240140Z frames [('total', 1)] 2025-12-04T12:15:04.8240425Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8240936Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8241507Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8242043Z graph_break [] 2025-12-04T12:15:04.8242845Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dff864e79f1bf91b.xml - 2025-12-04T12:15:04.8243782Z =========================== short test summary info ============================ 2025-12-04T12:15:04.8244880Z FAILED [0.3498s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.8246342Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8247085Z ^ 2025-12-04T12:15:04.8247646Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8248244Z 2025-12-04T12:15:04.8248952Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.8249792Z 2025-12-04T12:15:04.8249797Z 2025-12-04T12:15:04.8250008Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.8250974Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:04.8251740Z 2025-12-04T12:15:04.8252013Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.8252581Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:04.8253053Z ========================== 1 failed, 2 rerun in 4.03s ========================== 2025-12-04T12:15:04.8253771Z Got exit code 1 2025-12-04T12:15:04.8254032Z Retrying single test... 2025-12-04T12:15:04.8254683Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-053a0e10a178eff6.xml 2025-12-04T12:15:04.8255639Z ============================= test session starts ============================== 2025-12-04T12:15:04.8256383Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:04.8256972Z cachedir: .pytest_cache 2025-12-04T12:15:04.8257675Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:04.8258467Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:04.8258809Z configfile: pytest.ini 2025-12-04T12:15:04.8259573Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:04.8260518Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:04.8261585Z stepcurrent: skipping 0 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:04.8262526Z Running 1 items in this shard 2025-12-04T12:15:04.8262745Z 2025-12-04T12:15:04.8264043Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.8266372Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8267887Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.8268938Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:04.8270044Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:04.8271412Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.8272765Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.8273974Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.8275315Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.8276618Z E1204 11:43:30.857000 108649 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:04.8278057Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.8279231Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.8280316Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.8281428Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.8282497Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.8283534Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.8284751Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.8286046Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.8287237Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.8288417Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.8289619Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.8290902Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.8292227Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.8293485Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.8294600Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.8295719Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.8296939Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.8298090Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.8299290Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.8300558Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.8301932Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.8303484Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.8304719Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.8307316Z E1204 11:43:30.857000 108649 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.8310071Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.8311781Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.8313583Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.8315249Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.8316962Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8318665Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8320456Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8321975Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.8323689Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8325151Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.8326592Z E1204 11:43:30.857000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8327746Z ('RERUN', {'yellow': True}) [3.3089s] [100%] 2025-12-04T12:15:04.8329261Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.8331630Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8333152Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.8334159Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:04.8335287Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: 
tl.constexpr = 16 2025-12-04T12:15:04.8336473Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.8337620Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.8338818Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.8340083Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.8341401Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:04.8342688Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.8343827Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.8344930Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.8346066Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.8347140Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.8348197Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.8349423Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.8350727Z E1204 11:43:31.244000 
108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.8351937Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.8353121Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.8354326Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.8355682Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.8357024Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.8358292Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.8359429Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.8360515Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.8361674Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.8362823Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.8363981Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.8365244Z E1204 11:43:31.244000 108649 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.8366616Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.8368165Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.8369368Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.8372147Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.8374906Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.8376683Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.8378494Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.8380155Z E1204 11:43:31.244000 108649 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.8381865Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8383558Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8385352Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8386956Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.8388663Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8390166Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.8391597Z E1204 11:43:31.244000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8392762Z ('RERUN', {'yellow': True}) [0.3470s] [100%] 2025-12-04T12:15:04.8394263Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.8396613Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8398140Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.8399145Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:04.8400239Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:04.8401362Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.8402501Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.8403707Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.8404967Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.8406274Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T12:15:04.8407550Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.8408693Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.8409782Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.8410919Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.8411995Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.8413041Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.8414263Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.8415566Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.8416879Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.8418074Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.8419284Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.8420665Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.8422009Z E1204 11:43:31.591000 108649 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.8423279Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.8424389Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.8425477Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.8426630Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.8427781Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.8428922Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.8430173Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.8431557Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.8433105Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.8434315Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.8436901Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 
'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.8439628Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.8441351Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.8443163Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.8444811Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.8446560Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8448245Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8450032Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8451599Z E1204 11:43:31.591000 108649 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.8453295Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8454747Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.8456182Z E1204 11:43:31.591000 108649 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8457407Z FAILED [0.3458s] [100%] 2025-12-04T12:15:04.8457599Z 2025-12-04T12:15:04.8457747Z ==================================== RERUNS ==================================== 2025-12-04T12:15:04.8458358Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _ 2025-12-04T12:15:04.8458933Z Traceback (most recent call last): 2025-12-04T12:15:04.8459635Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.8460478Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.8461353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.8462221Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.8463128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.8463981Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.8464808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:04.8465606Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.8466420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.8467410Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.8468387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.8469197Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.8469960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.8470702Z return self._compile_to_module() 2025-12-04T12:15:04.8471619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.8472406Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.8473222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.8473993Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.8474748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.8475707Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.8476674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.8477515Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.8478261Z File "/tmp/tmpu_c4fj5y/nq/cnqwkjdxubfnokpzrjleqkzb7cjglpvbgvld3zxp2eqfsu3gtp6g.py", line 62, in <module> 2025-12-04T12:15:04.8479393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.8480142Z kernel.precompile( 2025-12-04T12:15:04.8480881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.8481693Z self._precompile_worker() 2025-12-04T12:15:04.8482511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.8483412Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.8484364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.8485306Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.8486094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.8486918Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8487747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8488673Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8489379Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.8490242Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8490988Z ^ 2025-12-04T12:15:04.8491574Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8492163Z 2025-12-04T12:15:04.8492888Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.8493729Z 2025-12-04T12:15:04.8493734Z 2025-12-04T12:15:04.8493951Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.8494937Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:04.8495704Z 2025-12-04T12:15:04.8495971Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.8496685Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.8497152Z frames [('total', 1)] 2025-12-04T12:15:04.8497453Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8497911Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8498497Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8498963Z graph_break [] 2025-12-04T12:15:04.8499437Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _ 2025-12-04T12:15:04.8500019Z Traceback (most recent call last): 2025-12-04T12:15:04.8500697Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.8501530Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.8502403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.8503309Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.8504211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.8505056Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.8505893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 
1003, in _compile_fx_inner 2025-12-04T12:15:04.8506715Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.8507565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.8508555Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.8509540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.8510338Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.8511130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.8511874Z return self._compile_to_module() 2025-12-04T12:15:04.8512589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.8513379Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.8514195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.8514986Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.8515728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.8516597Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.8517556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.8518411Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.8519151Z File "/tmp/tmpv1k5hwll/zf/czfxhseglpjumrbiwe4oh6fpaq24w2prrcngy25ie3d6cjz2k4tw.py", line 62, in <module> 2025-12-04T12:15:04.8520254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in
triton 2025-12-04T12:15:04.8520970Z kernel.precompile( 2025-12-04T12:15:04.8521705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.8522515Z self._precompile_worker() 2025-12-04T12:15:04.8523330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.8524242Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.8525141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.8526081Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.8526870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.8527713Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8528535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8529468Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8530168Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.8531033Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8531788Z ^ 2025-12-04T12:15:04.8532422Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8533022Z 2025-12-04T12:15:04.8533746Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.8534623Z 2025-12-04T12:15:04.8534628Z 2025-12-04T12:15:04.8534854Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.8535853Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:04.8536698Z 2025-12-04T12:15:04.8536968Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.8537606Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.8538079Z frames [('total', 1)] 2025-12-04T12:15:04.8538369Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8538857Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8539456Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8539904Z graph_break [] 2025-12-04T12:15:04.8540278Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.8540747Z frames [('total', 1)] 2025-12-04T12:15:04.8541032Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8541471Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8542070Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8542540Z graph_break [] 2025-12-04T12:15:04.8542840Z =================================== FAILURES =================================== 2025-12-04T12:15:04.8543453Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _ 2025-12-04T12:15:04.8544030Z Traceback (most recent call last): 2025-12-04T12:15:04.8544712Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.8545548Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.8546416Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.8547291Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.8548184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.8549032Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.8549869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:04.8550656Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.8551467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.8552460Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.8553444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.8554250Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.8555019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.8555762Z return self._compile_to_module() 2025-12-04T12:15:04.8556491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.8557271Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.8558118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.8558913Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.8559656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:04.8560524Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.8561483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.8562376Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.8563170Z File "/tmp/tmpohvbi7jj/i2/ci22fz46s6ajnyspd3wh56hubwynourlnmmhtqsrjubrvo46svo5.py", line 62, in <module> 2025-12-04T12:15:04.8564269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:04.8564987Z kernel.precompile( 2025-12-04T12:15:04.8565736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.8566570Z self._precompile_worker() 2025-12-04T12:15:04.8567398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.8568317Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.8569219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.8570152Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.8571174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.8572050Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8572873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8573797Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8574508Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.8575384Z def
triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8576115Z ^ 2025-12-04T12:15:04.8576760Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8577352Z 2025-12-04T12:15:04.8578080Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.8578924Z 2025-12-04T12:15:04.8578929Z 2025-12-04T12:15:04.8579158Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.8580131Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:04.8580904Z 2025-12-04T12:15:04.8581171Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.8581804Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.8582272Z frames [('total', 1)] 2025-12-04T12:15:04.8582561Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8583014Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8583621Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8584070Z graph_break [] 2025-12-04T12:15:04.8584447Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.8584912Z frames [('total', 1)] 2025-12-04T12:15:04.8585193Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8585718Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8586319Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8586799Z graph_break [] 2025-12-04T12:15:04.8587161Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T12:15:04.8587628Z frames [('total', 1)] 2025-12-04T12:15:04.8587924Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8588348Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8588987Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8589461Z graph_break [] 2025-12-04T12:15:04.8590305Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-053a0e10a178eff6.xml - 2025-12-04T12:15:04.8591270Z =========================== short test summary info ============================ 2025-12-04T12:15:04.8592367Z FAILED [0.3458s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.8593834Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8594567Z ^ 2025-12-04T12:15:04.8595159Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8595768Z 2025-12-04T12:15:04.8596481Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.8597318Z 2025-12-04T12:15:04.8597322Z 2025-12-04T12:15:04.8597558Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.8598543Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:04.8599299Z 2025-12-04T12:15:04.8599570Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.8600172Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:04.8600708Z ================== 1 failed, 187 deselected, 2 rerun in 4.05s ================== 2025-12-04T12:15:04.8601165Z Got exit code 1 2025-12-04T12:15:04.8601434Z Retrying single test... 2025-12-04T12:15:04.8602101Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-966288eeb3fe785e.xml 2025-12-04T12:15:04.8602885Z ============================= test session starts ============================== 2025-12-04T12:15:04.8603540Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:04.8604148Z cachedir: .pytest_cache 2025-12-04T12:15:04.8604867Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:04.8605657Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:04.8606005Z configfile: pytest.ini 2025-12-04T12:15:04.8606787Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:04.8607749Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:04.8608803Z stepcurrent: skipping 0 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:04.8609757Z Running 1 items in this shard 2025-12-04T12:15:04.8609987Z 2025-12-04T12:15:04.8611225Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.8613582Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8615106Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.8616140Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:04.8617342Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:04.8618464Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.8619608Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.8620811Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.8622102Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.8623416Z E1204 11:43:50.662000 108846 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:04.8624704Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.8625827Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.8626934Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.8628065Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.8629147Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.8630179Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.8631412Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.8632722Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.8633930Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.8635113Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.8636326Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.8637620Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.8638958Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.8640232Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.8641337Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.8642431Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.8643594Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.8644743Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.8645915Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.8647194Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.8648577Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.8650125Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.8651359Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.8653925Z E1204 11:43:50.662000 108846 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.8656734Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.8658462Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.8660265Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.8661928Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.8663633Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8665316Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8667119Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8668641Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.8670336Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8671985Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.8673458Z E1204 11:43:50.662000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8674627Z ('RERUN', {'yellow': True}) [3.3138s] [100%] 2025-12-04T12:15:04.8676121Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.8678532Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8680053Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.8681057Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:04.8682199Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: 
tl.constexpr = 16 2025-12-04T12:15:04.8683312Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.8684455Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.8685661Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.8686921Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.8688235Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:04.8689523Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.8690650Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.8691753Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.8692887Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.8693967Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.8695005Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.8696948Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.8698262Z E1204 11:43:51.061000 
108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.8699470Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.8700645Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.8701868Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.8703418Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.8704887Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.8706166Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.8707273Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.8708403Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.8709571Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.8710730Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.8711881Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.8713167Z E1204 11:43:51.061000 108846 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.8714626Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.8716298Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.8717806Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.8720543Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.8723418Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.8725288Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.8727216Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.8728955Z E1204 11:43:51.061000 108846 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.8730821Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8732640Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8734527Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8736269Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.8738126Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8739718Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.8741330Z E1204 11:43:51.061000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8742624Z ('RERUN', {'yellow': True}) [0.3597s] [100%] 2025-12-04T12:15:04.8744256Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.8746689Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8748331Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.8749494Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:04.8750736Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:04.8751967Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.8753195Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.8754556Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.8755934Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.8757403Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T12:15:04.8758749Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.8759997Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.8761256Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.8762499Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.8763636Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.8764839Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.8766181Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.8767588Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.8768997Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.8770249Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.8771770Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.8773330Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.8774851Z E1204 11:43:51.426000 108846 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.8776176Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.8777535Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.8778774Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.8780068Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.8781368Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.8782594Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.8783990Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.8785509Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.8787178Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.8788463Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.8791208Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 
'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.8794055Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.8795941Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.8797858Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.8799606Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.8801534Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8803297Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8805182Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8806946Z E1204 11:43:51.426000 108846 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.8808751Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8810354Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.8811867Z E1204 11:43:51.426000 108846 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8813113Z FAILED [0.3631s] [100%] 2025-12-04T12:15:04.8813358Z 2025-12-04T12:15:04.8813604Z ==================================== RERUNS ==================================== 2025-12-04T12:15:04.8814347Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _ 2025-12-04T12:15:04.8815009Z Traceback (most recent call last): 2025-12-04T12:15:04.8815828Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.8816921Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.8817928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.8818900Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.8819938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.8820909Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.8821902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:04.8822780Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.8823706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.8824843Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.8825952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.8826824Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.8827732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.8828596Z return self._compile_to_module() 2025-12-04T12:15:04.8829386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.8830321Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.8831255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.8832137Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.8833019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.8834046Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.8835073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.8836167Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.8836990Z File "/tmp/tmpxkncwm92/wf/cwfemw2uybep42tuf6ibn5fijwno3ejhyf2getnhpigbh6jbhmmr.py", line 62, in 2025-12-04T12:15:04.8838227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:04.8839134Z kernel.precompile( 2025-12-04T12:15:04.8840022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.8840891Z self._precompile_worker() 2025-12-04T12:15:04.8841885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.8851169Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.8852363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.8853391Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.8854307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.8855270Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8856241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8857340Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8858184Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.8859204Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8860062Z ^ 2025-12-04T12:15:04.8860730Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8861441Z 2025-12-04T12:15:04.8862192Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.8863120Z 2025-12-04T12:15:04.8863126Z 2025-12-04T12:15:04.8863381Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.8864510Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:04.8865311Z 2025-12-04T12:15:04.8865667Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.8866370Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.8867002Z frames [('total', 1)] 2025-12-04T12:15:04.8867413Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8867931Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8868685Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8869255Z graph_break [] 2025-12-04T12:15:04.8869820Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _ 2025-12-04T12:15:04.8870543Z Traceback (most recent call last): 2025-12-04T12:15:04.8871543Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.8872478Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.8873494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.8874590Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.8875598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.8876622Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.8877529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 
1003, in _compile_fx_inner 2025-12-04T12:15:04.8878489Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.8879530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.8880649Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.8881682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.8882685Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.8883573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.8884515Z return self._compile_to_module() 2025-12-04T12:15:04.8885327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.8886249Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.8887191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.8888120Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.8888937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.8889933Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.8891035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.8892007Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.8892837Z File "/tmp/tmpwz2g9nlg/hn/chnls7qs2snlkm5mkd36mzn7rstdzdwu3uhl2dj7efuafbxas3am.py", line 62, in 2025-12-04T12:15:04.8894088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in 
triton 2025-12-04T12:15:04.8894920Z kernel.precompile( 2025-12-04T12:15:04.8895742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.8896798Z self._precompile_worker() 2025-12-04T12:15:04.8897726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.8898807Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.8899786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.8900842Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.8901794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.8902743Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8903645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8904738Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8905556Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.8906522Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8907407Z ^ 2025-12-04T12:15:04.8908156Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8908790Z 2025-12-04T12:15:04.8909578Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.8910498Z 2025-12-04T12:15:04.8910502Z 2025-12-04T12:15:04.8910868Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.8911949Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:04.8912812Z 2025-12-04T12:15:04.8913097Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.8913899Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.8914495Z frames [('total', 1)] 2025-12-04T12:15:04.8914832Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8915450Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8916204Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8916790Z graph_break [] 2025-12-04T12:15:04.8917248Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.8917833Z frames [('total', 1)] 2025-12-04T12:15:04.8918266Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8918793Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8919504Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8920112Z graph_break [] 2025-12-04T12:15:04.8920554Z =================================== FAILURES =================================== 2025-12-04T12:15:04.8921230Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _ 2025-12-04T12:15:04.8921956Z Traceback (most recent call last): 2025-12-04T12:15:04.8922793Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.8923694Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.8924693Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.8925716Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.8926729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.8927654Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.8928716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:04.8929625Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.8930627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.8931695Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.8932796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.8933774Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.8934658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.8935470Z return self._compile_to_module() 2025-12-04T12:15:04.8936473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.8937430Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.8938350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.8939328Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.8940271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:04.8941247Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.8942330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.8943352Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.8944236Z File "/tmp/tmpqsmwxf2t/cr/ccrgz6yksh52d4pljsiu454p36bnmquvjmz2guh5wajrhj4ezynv.py", line 62, in 2025-12-04T12:15:04.8945517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:04.8946312Z kernel.precompile( 2025-12-04T12:15:04.8947154Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.8948200Z self._precompile_worker() 2025-12-04T12:15:04.8949139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.8950105Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.8951253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.8952312Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.8953237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.8954179Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.8955129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.8956186Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.8957037Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.8957977Z def 
triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8958852Z ^ 2025-12-04T12:15:04.8959587Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8960220Z 2025-12-04T12:15:04.8961022Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.8961881Z 2025-12-04T12:15:04.8961887Z 2025-12-04T12:15:04.8962273Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.8963394Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:04.8964245Z 2025-12-04T12:15:04.8964554Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.8965363Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.8965900Z frames [('total', 1)] 2025-12-04T12:15:04.8966316Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8966931Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8967646Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8968185Z graph_break [] 2025-12-04T12:15:04.8969390Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.8969980Z frames [('total', 1)] 2025-12-04T12:15:04.8970355Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8971191Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8972037Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8972632Z graph_break [] 2025-12-04T12:15:04.8973141Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T12:15:04.8973729Z frames [('total', 1)] 2025-12-04T12:15:04.8974117Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.8974679Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.8975457Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.8976030Z graph_break [] 2025-12-04T12:15:04.8977105Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-966288eeb3fe785e.xml - 2025-12-04T12:15:04.8978209Z =========================== short test summary info ============================ 2025-12-04T12:15:04.8979470Z FAILED [0.3631s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.8981121Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.8982185Z ^ 2025-12-04T12:15:04.8982872Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.8983692Z 2025-12-04T12:15:04.8984446Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.8985378Z 2025-12-04T12:15:04.8985383Z 2025-12-04T12:15:04.8985645Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.8986818Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:04.8987624Z 2025-12-04T12:15:04.8987939Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.8988696Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:04.8989417Z ================== 1 failed, 187 deselected, 2 rerun in 4.08s ================== 2025-12-04T12:15:04.8989982Z Got exit code 1 2025-12-04T12:15:04.8990825Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:04.8992084Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:04.8993232Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47dd8058babbbd0d.xml 2025-12-04T12:15:04.8994107Z ============================= test session starts ============================== 2025-12-04T12:15:04.8994911Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:04.8995632Z cachedir: .pytest_cache 2025-12-04T12:15:04.8996444Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:04.8997442Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:04.8997865Z configfile: pytest.ini 2025-12-04T12:15:04.8998742Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:04.8999872Z collecting ... collected 188 items / 1 deselected / 187 selected 2025-12-04T12:15:04.9000520Z stepcurrent: skipping 1 already run items. 
2025-12-04T12:15:04.9001021Z Running 187 items in this shard 2025-12-04T12:15:04.9001380Z 2025-12-04T12:15:04.9002746Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.9005236Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9006902Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.9008046Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:04.9008700Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:04.9009202Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.9009780Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.9010604Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9011249Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.9011936Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:04.9012534Z E1204 11:44:10.513000 109043 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.9013068Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.9013605Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.9014239Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.9014826Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.9015314Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.9016061Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.9016731Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9017392Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.9017957Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.9018598Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.9019260Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.9019931Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.9020507Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9021459Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.9022087Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.9022701Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.9023182Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.9023886Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.9024462Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.9025335Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.9026081Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.9026515Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9028698Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 
'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9029316Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9030517Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9031211Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9032193Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9032914Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9033898Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9034688Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9035449Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9036441Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9036880Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9037873Z E1204 11:44:10.513000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9038049Z ('RERUN', {'yellow': True}) [3.3269s] [ 0%] 2025-12-04T12:15:04.9039396Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.9040465Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9041001Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.9041517Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:04.9042079Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:04.9042609Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.9043234Z E1204 11:44:10.907000 109043 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.9043893Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9044521Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.9045267Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:04.9045865Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.9046322Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.9047004Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.9047532Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.9048079Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.9048567Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.9049249Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.9049879Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9050488Z E1204 11:44:10.907000 109043 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.9051079Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.9051701Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.9052355Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.9053047Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.9053640Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9054260Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.9054769Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.9055427Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.9055909Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.9056645Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.9057352Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.9058168Z E1204 11:44:10.907000 109043 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.9058963Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.9059368Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9061539Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9062162Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9063308Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9063975Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9064902Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:04.9065684Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9066577Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9067580Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9068230Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9069280Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9069716Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9070672Z E1204 11:44:10.907000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9071123Z ('RERUN', {'yellow': True}) [0.3543s] [ 0%] 2025-12-04T12:15:04.9072439Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.9073594Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9074069Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.9074606Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:04.9075143Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:04.9075776Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.9076439Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.9077021Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9077701Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.9078323Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T12:15:04.9078993Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.9079506Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.9080064Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.9080625Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.9081129Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.9081646Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.9082390Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.9083037Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9083672Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.9084217Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.9084893Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.9085575Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.9086360Z E1204 11:44:11.267000 109043 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.9086914Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9087418Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.9088062Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.9088686Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.9089242Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.9089876Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.9090451Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.9091251Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.9092005Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.9092439Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9094645Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 
'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9095290Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9096455Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9097200Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9098135Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9098970Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9099916Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9100762Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9101503Z E1204 11:44:11.267000 109043 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9102502Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9102968Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9103946Z E1204 11:44:11.267000 109043 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9104175Z FAILED [0.3581s] [ 0%] 2025-12-04T12:15:04.9104182Z 2025-12-04T12:15:04.9104432Z ==================================== RERUNS ==================================== 2025-12-04T12:15:04.9104792Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _ 2025-12-04T12:15:04.9105475Z Traceback (most recent call last): 2025-12-04T12:15:04.9105960Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9106303Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9106923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9107214Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9107816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9108057Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9108586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:04.9108902Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9109479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9109887Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9110451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9110643Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9111228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9111426Z return self._compile_to_module() 2025-12-04T12:15:04.9112002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9112208Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9112761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9112959Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9113603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.9114010Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9114642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9114809Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9115405Z File "/tmp/tmpynpbzzr4/vs/cvscihiarpeopxwvjhysbzu5j4jpryak7kmli74mev3svog7a2j3.py", line 62, in 2025-12-04T12:15:04.9115964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:04.9116168Z kernel.precompile( 2025-12-04T12:15:04.9117064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9117227Z self._precompile_worker() 2025-12-04T12:15:04.9117938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9118209Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9118820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9119189Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9119734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9120082Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9120565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9120943Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9217240Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9218080Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9218188Z ^ 2025-12-04T12:15:04.9218663Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9218671Z 2025-12-04T12:15:04.9219415Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9219429Z 2025-12-04T12:15:04.9219434Z 2025-12-04T12:15:04.9219666Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9220339Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:04.9220345Z 2025-12-04T12:15:04.9220631Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9220882Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9221000Z frames [('total', 1)] 2025-12-04T12:15:04.9221127Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9221388Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9221618Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9221729Z graph_break [] 2025-12-04T12:15:04.9222073Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _ 2025-12-04T12:15:04.9222210Z Traceback (most recent call last): 2025-12-04T12:15:04.9222679Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9222948Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9223684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9223958Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9224484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9224688Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9225225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 
1003, in _compile_fx_inner 2025-12-04T12:15:04.9225442Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9226060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9226389Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9226923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9227095Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9227720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9227868Z return self._compile_to_module() 2025-12-04T12:15:04.9228364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9228538Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9229082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9229221Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9229721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.9229981Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9230577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9230731Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9231220Z File "/tmp/tmpna5tog_g/33/c33whvclvjiosqjm2uamnjzooutjccj2sxcdh66y6lxcjtjcdm4e.py", line 62, in 2025-12-04T12:15:04.9231686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in 
triton 2025-12-04T12:15:04.9231820Z kernel.precompile( 2025-12-04T12:15:04.9232382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9232525Z self._precompile_worker() 2025-12-04T12:15:04.9233133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9233321Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9233940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9234151Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9234629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9234895Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9235347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9235723Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9235957Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9236508Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9236639Z ^ 2025-12-04T12:15:04.9237107Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9237113Z 2025-12-04T12:15:04.9237882Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9237978Z 2025-12-04T12:15:04.9237984Z 2025-12-04T12:15:04.9238227Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9238894Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:04.9238917Z 2025-12-04T12:15:04.9239192Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9239422Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9239555Z frames [('total', 1)] 2025-12-04T12:15:04.9239707Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9239955Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9240231Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9240344Z graph_break [] 2025-12-04T12:15:04.9240610Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9240816Z frames [('total', 1)] 2025-12-04T12:15:04.9240966Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9241302Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9246121Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9246268Z graph_break [] 2025-12-04T12:15:04.9246440Z =================================== FAILURES =================================== 2025-12-04T12:15:04.9246763Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _ 2025-12-04T12:15:04.9246899Z Traceback (most recent call last): 2025-12-04T12:15:04.9247381Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9247627Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9248115Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9248365Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9248888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9249081Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9249597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:04.9249742Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9250272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9250605Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9251172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9251329Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9251811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9251951Z return self._compile_to_module() 2025-12-04T12:15:04.9252455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9252614Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9253177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9253322Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9253815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:04.9254050Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9254664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9254823Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9255332Z File "/tmp/tmpw4uqtrom/ln/clnj7ur7ftz7hq4rsfhr7vbecbpxqqlzgyyhrmyesywx5pk4ngsu.py", line 62, in 2025-12-04T12:15:04.9255795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:04.9276557Z kernel.precompile( 2025-12-04T12:15:04.9277215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9277462Z self._precompile_worker() 2025-12-04T12:15:04.9278111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9278295Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9278892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9279141Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9279595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9279843Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9280281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9280615Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9280848Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9281358Z def 
triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9281455Z ^ 2025-12-04T12:15:04.9281908Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9281915Z 2025-12-04T12:15:04.9282631Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9282638Z 2025-12-04T12:15:04.9282653Z 2025-12-04T12:15:04.9282870Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9283504Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:04.9283512Z 2025-12-04T12:15:04.9283788Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9284011Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9284120Z frames [('total', 1)] 2025-12-04T12:15:04.9284244Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9284479Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9284710Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9284808Z graph_break [] 2025-12-04T12:15:04.9285025Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9285135Z frames [('total', 1)] 2025-12-04T12:15:04.9285250Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9285596Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9285845Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9285942Z graph_break [] 2025-12-04T12:15:04.9286159Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T12:15:04.9286276Z frames [('total', 1)] 2025-12-04T12:15:04.9286392Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9286674Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9286905Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9287053Z graph_break [] 2025-12-04T12:15:04.9287725Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47dd8058babbbd0d.xml - 2025-12-04T12:15:04.9287898Z =========================== short test summary info ============================ 2025-12-04T12:15:04.9288684Z FAILED [0.3581s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9289246Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9289335Z ^ 2025-12-04T12:15:04.9289808Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9289817Z 2025-12-04T12:15:04.9290527Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9290533Z 2025-12-04T12:15:04.9290538Z 2025-12-04T12:15:04.9290756Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9291396Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:04.9291415Z 2025-12-04T12:15:04.9291688Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9291871Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:04.9292083Z =================== 1 failed, 1 deselected, 2 rerun in 4.08s =================== 2025-12-04T12:15:04.9292188Z Got exit code 1 2025-12-04T12:15:04.9292297Z Retrying single test... 2025-12-04T12:15:04.9292782Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e92e228ccdafe934.xml 2025-12-04T12:15:04.9292950Z ============================= test session starts ============================== 2025-12-04T12:15:04.9293317Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:04.9293430Z cachedir: .pytest_cache 2025-12-04T12:15:04.9293952Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:04.9294092Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:04.9294204Z configfile: pytest.ini 2025-12-04T12:15:04.9294794Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:04.9295026Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:04.9295737Z stepcurrent: skipping 1 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:04.9295870Z Running 1 items in this shard 2025-12-04T12:15:04.9295875Z 2025-12-04T12:15:04.9297221Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.9298200Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9298639Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.9299115Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:04.9299688Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:04.9300151Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.9300701Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.9301238Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9301854Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.9302452Z E1204 11:44:30.230000 109240 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:04.9303007Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.9303463Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.9303983Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.9304455Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.9304931Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.9305380Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.9306556Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.9307082Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9307627Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.9308136Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.9308721Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.9309304Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.9309934Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.9310452Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9310916Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.9311435Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.9312020Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.9312457Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.9313077Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.9313642Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.9314353Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.9315077Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.9315474Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9317572Z E1204 11:44:30.230000 109240 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9318109Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9319162Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9319791Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9320704Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9321386Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9322274Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9323060Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9323668Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9324640Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9325005Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9325955Z E1204 11:44:30.230000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9326096Z ('RERUN', {'yellow': True}) [3.3507s] [100%] 2025-12-04T12:15:04.9327332Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.9328356Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9328792Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.9329248Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:04.9329800Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: 
tl.constexpr = 256 2025-12-04T12:15:04.9330271Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.9330804Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.9331346Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9331941Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.9332530Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:04.9333096Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.9333536Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.9334053Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.9334538Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.9334996Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.9335459Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.9336107Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.9336699Z E1204 11:44:30.629000 
109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9337263Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.9337769Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.9338362Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.9338969Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.9339596Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.9340120Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9340624Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.9341110Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.9341687Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.9342130Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.9342720Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.9343286Z E1204 11:44:30.629000 109240 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.9344001Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.9344708Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.9345085Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9347179Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9347730Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9348776Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9349404Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9350312Z E1204 11:44:30.629000 109240 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9350990Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9351891Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9352690Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9353312Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9354266Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9354676Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9355599Z E1204 11:44:30.629000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9355739Z ('RERUN', {'yellow': True}) [0.3559s] [100%] 2025-12-04T12:15:04.9356990Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.9357985Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9358436Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.9358884Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:04.9359402Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:04.9359883Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.9360419Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.9360973Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9361563Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.9362148Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T12:15:04.9362723Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.9363165Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.9363703Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.9364178Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.9364652Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.9365102Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.9365747Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.9366287Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9366867Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.9367386Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.9367969Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.9368598Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.9369240Z E1204 11:44:31.000000 109240 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.9369746Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9370226Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.9370705Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.9371468Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.9371924Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.9372503Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.9373053Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.9373763Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.9374485Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.9374853Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9377009Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 
'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9377567Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9378609Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9379261Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9380154Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9380934Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9381820Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9382653Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9383311Z E1204 11:44:31.000000 109240 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9384266Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9384689Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9385580Z E1204 11:44:31.000000 109240 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9385704Z FAILED [0.3694s] [100%] 2025-12-04T12:15:04.9385711Z 2025-12-04T12:15:04.9385860Z ==================================== RERUNS ==================================== 2025-12-04T12:15:04.9386186Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _ 2025-12-04T12:15:04.9386326Z Traceback (most recent call last): 2025-12-04T12:15:04.9386782Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9387042Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9387533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9387786Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9388315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9388512Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9389035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:04.9389193Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9389726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9390063Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9390586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9390737Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9391229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9391353Z return self._compile_to_module() 2025-12-04T12:15:04.9391850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9392020Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9392540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9392686Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9393219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.9393473Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9394063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9394191Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9395072Z File "/tmp/tmpd5orxukv/z7/cz7lutm3es2lyz6khdiqs5qmbvwebokimmwds4jh3wrg7aysnpl2.py", line 62, in <module> 2025-12-04T12:15:04.9395588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:04.9395732Z kernel.precompile( 2025-12-04T12:15:04.9396301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9396420Z self._precompile_worker() 2025-12-04T12:15:04.9397030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9397242Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9397835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9398047Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9398500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9398760Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9399203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9399538Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9399777Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9400293Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9400543Z ^ 2025-12-04T12:15:04.9401025Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9401031Z 2025-12-04T12:15:04.9401748Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9401758Z 2025-12-04T12:15:04.9401762Z 2025-12-04T12:15:04.9402001Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9402639Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:04.9402644Z 2025-12-04T12:15:04.9402929Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9403159Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9403269Z frames [('total', 1)] 2025-12-04T12:15:04.9403405Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9403644Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9403867Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9403992Z graph_break [] 2025-12-04T12:15:04.9404315Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _ 2025-12-04T12:15:04.9404453Z Traceback (most recent call last): 2025-12-04T12:15:04.9404912Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9405156Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9405661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9405961Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9406488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9406681Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9407190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 
1003, in _compile_fx_inner 2025-12-04T12:15:04.9407389Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9407983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9408306Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9408838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9408989Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9409514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9409636Z return self._compile_to_module() 2025-12-04T12:15:04.9410117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9410298Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9410819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9410963Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9411464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.9411696Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9412306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9412436Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9412952Z File "/tmp/tmpabpepxim/mc/cmcnqgrmruxpwu2wl7ubmrpoprbrm5zwt5kdzvxqhasamksxzecz.py", line 62, in <module> 2025-12-04T12:15:04.9413425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in
triton 2025-12-04T12:15:04.9413539Z kernel.precompile( 2025-12-04T12:15:04.9414107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9414225Z self._precompile_worker() 2025-12-04T12:15:04.9414820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9415011Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9415608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9415822Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9416274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9416627Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9417082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9417417Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9417645Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9418175Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9418308Z ^ 2025-12-04T12:15:04.9418782Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9418790Z 2025-12-04T12:15:04.9419501Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9419541Z 2025-12-04T12:15:04.9419545Z 2025-12-04T12:15:04.9419775Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9420438Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:04.9420445Z 2025-12-04T12:15:04.9420712Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9420950Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9421057Z frames [('total', 1)] 2025-12-04T12:15:04.9421177Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9421463Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9421685Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9421799Z graph_break [] 2025-12-04T12:15:04.9422022Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9422133Z frames [('total', 1)] 2025-12-04T12:15:04.9422267Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9422485Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9422721Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9422836Z graph_break [] 2025-12-04T12:15:04.9422985Z =================================== FAILURES =================================== 2025-12-04T12:15:04.9423320Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _ 2025-12-04T12:15:04.9423450Z Traceback (most recent call last): 2025-12-04T12:15:04.9423912Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9424172Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9424664Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9424916Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9425445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9425641Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9426166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:04.9426313Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9426853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9427188Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9427708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9427870Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9428347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9428473Z return self._compile_to_module() 2025-12-04T12:15:04.9428970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9429135Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9429689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9429837Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9430334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:04.9430576Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9431194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9431321Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9431854Z File "/tmp/tmpv_nezct4/ru/cruzlhbig75v7zez2elvcwr3kmbw4uczxlz5wfag4aohz4fymkq6.py", line 62, in 2025-12-04T12:15:04.9432319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:04.9432446Z kernel.precompile( 2025-12-04T12:15:04.9433007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9433156Z self._precompile_worker() 2025-12-04T12:15:04.9433767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9433947Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9434546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9434762Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9435217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9435478Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9435923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9436257Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9436503Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9437021Z def 
triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9437132Z ^ 2025-12-04T12:15:04.9437592Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9437598Z 2025-12-04T12:15:04.9438310Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9438317Z 2025-12-04T12:15:04.9438336Z 2025-12-04T12:15:04.9438557Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9439190Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:04.9439198Z 2025-12-04T12:15:04.9439485Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9439709Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9439818Z frames [('total', 1)] 2025-12-04T12:15:04.9439953Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9440194Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9440432Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9440536Z graph_break [] 2025-12-04T12:15:04.9440758Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9440883Z frames [('total', 1)] 2025-12-04T12:15:04.9441002Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9441258Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9441513Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9441616Z graph_break [] 2025-12-04T12:15:04.9441833Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T12:15:04.9441953Z frames [('total', 1)] 2025-12-04T12:15:04.9442070Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9442335Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9442570Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9442671Z graph_break [] 2025-12-04T12:15:04.9443374Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e92e228ccdafe934.xml - 2025-12-04T12:15:04.9443552Z =========================== short test summary info ============================ 2025-12-04T12:15:04.9444331Z FAILED [0.3694s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9444889Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9444981Z ^ 2025-12-04T12:15:04.9445449Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9445458Z 2025-12-04T12:15:04.9446168Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9446174Z 2025-12-04T12:15:04.9446179Z 2025-12-04T12:15:04.9446411Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9447040Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:04.9447048Z 2025-12-04T12:15:04.9447314Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9447511Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:04.9447714Z ================== 1 failed, 187 deselected, 2 rerun in 4.12s ================== 2025-12-04T12:15:04.9447833Z Got exit code 1 2025-12-04T12:15:04.9447943Z Retrying single test... 2025-12-04T12:15:04.9448418Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0328fb4bc2fb022d.xml 2025-12-04T12:15:04.9448594Z ============================= test session starts ============================== 2025-12-04T12:15:04.9448947Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:04.9449056Z cachedir: .pytest_cache 2025-12-04T12:15:04.9449588Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:04.9449714Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:04.9449840Z configfile: pytest.ini 2025-12-04T12:15:04.9450429Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:04.9450651Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:04.9451378Z stepcurrent: skipping 1 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:04.9451497Z Running 1 items in this shard 2025-12-04T12:15:04.9451503Z 2025-12-04T12:15:04.9452750Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.9453749Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9454187Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.9454698Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:04.9455289Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:04.9455769Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.9456397Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.9456943Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9457582Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.9458166Z E1204 11:44:50.142000 109437 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:04.9458737Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.9459182Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.9459709Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.9460186Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.9460644Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.9461106Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.9461756Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.9462288Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9462836Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.9463338Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.9463931Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.9464501Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.9465141Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.9465646Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9466115Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.9466597Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.9467172Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.9467621Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.9468230Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.9468790Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.9469508Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.9470215Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.9470618Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9472901Z E1204 11:44:50.142000 109437 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9473460Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9474507Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9475152Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9476051Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9476747Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9477632Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9478400Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9479020Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9479971Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9480348Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9481324Z E1204 11:44:50.142000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9481463Z ('RERUN', {'yellow': True}) [3.3342s] [100%] 2025-12-04T12:15:04.9482719Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.9483747Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9484194Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.9484643Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:04.9485223Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: 
tl.constexpr = 256 2025-12-04T12:15:04.9485683Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.9486217Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.9486770Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9487349Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.9487947Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:04.9488506Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.9488944Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.9489479Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.9489953Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.9490430Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.9490879Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.9491522Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.9492059Z E1204 11:44:50.529000 
109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9492605Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.9493116Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.9493700Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.9494325Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.9494952Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.9495459Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9495971Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.9496510Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.9497095Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.9497533Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.9498109Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.9498684Z E1204 11:44:50.529000 109437 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.9499388Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.9500103Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.9500467Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9502576Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9503116Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9504167Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9504799Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9505692Z E1204 11:44:50.529000 109437 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9506383Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9507269Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9508052Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9508712Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9509688Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9510084Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9511003Z E1204 11:44:50.529000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9511160Z ('RERUN', {'yellow': True}) [0.3472s] [100%] 2025-12-04T12:15:04.9512393Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:04.9513385Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9513821Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:04.9514282Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:04.9514805Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:04.9515266Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:04.9515813Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:04.9516356Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9516956Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:04.9517542Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T12:15:04.9518097Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:04.9518553Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:04.9519068Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:04.9519552Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:04.9520010Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:04.9520455Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:04.9521113Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:04.9521629Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9522212Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:04.9522716Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:04.9523295Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:04.9523906Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:04.9524562Z E1204 11:44:50.876000 109437 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:04.9525079Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9525549Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:04.9526033Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:04.9526599Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:04.9527040Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:04.9527630Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:04.9528164Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:04.9528884Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp14, r0_mask) 2025-12-04T12:15:04.9529590Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp5, None) 2025-12-04T12:15:04.9529953Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9532047Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 
'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9532590Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9533655Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9534290Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9535197Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9535914Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9536874Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9537646Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9538354Z E1204 11:44:50.876000 109437 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9539311Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9539678Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9540615Z E1204 11:44:50.876000 109437 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9540727Z FAILED [0.3454s] [100%] 2025-12-04T12:15:04.9540734Z 2025-12-04T12:15:04.9540899Z ==================================== RERUNS ==================================== 2025-12-04T12:15:04.9541223Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _ 2025-12-04T12:15:04.9541351Z Traceback (most recent call last): 2025-12-04T12:15:04.9541824Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9542069Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9542575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9542831Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9543345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9543559Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9544069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:04.9544221Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9544773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9545095Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9545634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9545789Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9546275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9546419Z return self._compile_to_module() 2025-12-04T12:15:04.9546910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9547090Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9547610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9547744Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9548250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.9548523Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9549114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9549259Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9549755Z File "/tmp/tmpu3owmxje/b4/cb4w36z67ouxaq5j3vsw7lsoxkldnbjlxjhks5bmuhdl7hhmegzf.py", line 62, in 2025-12-04T12:15:04.9550264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:04.9550406Z kernel.precompile( 2025-12-04T12:15:04.9550966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9551095Z self._precompile_worker() 2025-12-04T12:15:04.9551695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9551892Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9552532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9552731Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9553193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9553440Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9553886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9554231Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9554457Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9554984Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9555076Z ^ 2025-12-04T12:15:04.9555530Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9555536Z 2025-12-04T12:15:04.9556254Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9556263Z 2025-12-04T12:15:04.9556268Z 2025-12-04T12:15:04.9556486Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9557131Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:04.9557137Z 2025-12-04T12:15:04.9557404Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9557647Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9557756Z frames [('total', 1)] 2025-12-04T12:15:04.9557874Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9558125Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9558349Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9558454Z graph_break [] 2025-12-04T12:15:04.9558790Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _ 2025-12-04T12:15:04.9558913Z Traceback (most recent call last): 2025-12-04T12:15:04.9559373Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9559630Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9560124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9560422Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9560940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9561132Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9561658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 
1003, in _compile_fx_inner 2025-12-04T12:15:04.9561838Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9562416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9562744Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9563264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9563429Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9563937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9564072Z return self._compile_to_module() 2025-12-04T12:15:04.9564555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9564721Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9565251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9565385Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9565884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.9566130Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9566713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9566857Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9567358Z File "/tmp/tmp9650aj7v/xq/cxqp5ybp2bbis5lrrx2of4wejt4azewfckjsqz7oyanqdwexzxrv.py", line 62, in 2025-12-04T12:15:04.9567820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in 
triton 2025-12-04T12:15:04.9567951Z kernel.precompile( 2025-12-04T12:15:04.9568506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9568627Z self._precompile_worker() 2025-12-04T12:15:04.9569236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9569417Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9570027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9570227Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9570676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9571124Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9571578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9571930Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9572159Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9572676Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9572782Z ^ 2025-12-04T12:15:04.9573332Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9573341Z 2025-12-04T12:15:04.9574065Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9574112Z 2025-12-04T12:15:04.9574117Z 2025-12-04T12:15:04.9574337Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9575012Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:04.9575034Z 2025-12-04T12:15:04.9575309Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9575533Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9575653Z frames [('total', 1)] 2025-12-04T12:15:04.9575773Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9576055Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9576350Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9576455Z graph_break [] 2025-12-04T12:15:04.9576676Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9576800Z frames [('total', 1)] 2025-12-04T12:15:04.9576917Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9577151Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9577390Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9577490Z graph_break [] 2025-12-04T12:15:04.9577652Z =================================== FAILURES =================================== 2025-12-04T12:15:04.9577977Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _ 2025-12-04T12:15:04.9578103Z Traceback (most recent call last): 2025-12-04T12:15:04.9578572Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9578817Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9579319Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9579571Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9580087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9580290Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9580801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:04.9580950Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9581498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9581822Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9582350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9582501Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9582983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9583121Z return self._compile_to_module() 2025-12-04T12:15:04.9583604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9583780Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9584335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9584471Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9584985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:04.9585214Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9585812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9585969Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9586499Z File "/tmp/tmpzgybdau9/rz/crzmctqfkn4vp54aytyfgk3qgfe4ha3prv52bltqg5w73znfs6gd.py", line 62, in 2025-12-04T12:15:04.9586980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:04.9587092Z kernel.precompile( 2025-12-04T12:15:04.9587651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9587857Z self._precompile_worker() 2025-12-04T12:15:04.9588452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9588647Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9589248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9589446Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9589914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9590160Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9590618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9590954Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9591182Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9592219Z def 
triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9592315Z ^ 2025-12-04T12:15:04.9592775Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9592797Z 2025-12-04T12:15:04.9593509Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9593516Z 2025-12-04T12:15:04.9593521Z 2025-12-04T12:15:04.9593738Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9594383Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:04.9594391Z 2025-12-04T12:15:04.9594660Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9594894Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9595002Z frames [('total', 1)] 2025-12-04T12:15:04.9595123Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9595375Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9595598Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9595699Z graph_break [] 2025-12-04T12:15:04.9595930Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9596049Z frames [('total', 1)] 2025-12-04T12:15:04.9596180Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9596452Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9596691Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9596811Z graph_break [] 2025-12-04T12:15:04.9597027Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T12:15:04.9597133Z frames [('total', 1)] 2025-12-04T12:15:04.9597263Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9597482Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9597748Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:04.9597866Z graph_break [] 2025-12-04T12:15:04.9598565Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0328fb4bc2fb022d.xml - 2025-12-04T12:15:04.9598759Z =========================== short test summary info ============================ 2025-12-04T12:15:04.9599533Z FAILED [0.3454s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9600081Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9600190Z ^ 2025-12-04T12:15:04.9600650Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9600659Z 2025-12-04T12:15:04.9601385Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9601391Z 2025-12-04T12:15:04.9601396Z 2025-12-04T12:15:04.9601617Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9602247Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:04.9602269Z 2025-12-04T12:15:04.9602542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9602727Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:04.9602947Z ================== 1 failed, 187 deselected, 2 rerun in 4.07s ================== 2025-12-04T12:15:04.9603051Z Got exit code 1 2025-12-04T12:15:04.9603615Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:04.9604152Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:04.9604628Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4ecceae3d20d3515.xml 2025-12-04T12:15:04.9604812Z ============================= test session starts ============================== 2025-12-04T12:15:04.9605171Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:04.9605283Z cachedir: .pytest_cache 2025-12-04T12:15:04.9605822Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:04.9605952Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:04.9606063Z configfile: pytest.ini 2025-12-04T12:15:04.9606665Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:04.9606894Z collecting ... collected 188 items / 2 deselected / 186 selected 2025-12-04T12:15:04.9607055Z stepcurrent: skipping 2 already run items. 
2025-12-04T12:15:04.9607172Z Running 186 items in this shard 2025-12-04T12:15:04.9607178Z 2025-12-04T12:15:04.9608384Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:04.9609202Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9609652Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960 2025-12-04T12:15:04.9610243Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9610837Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:04.9611415Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:04.9611853Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:04.9612472Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:04.9613009Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9613560Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:04.9614083Z E1204 11:45:10.496000 109634 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9614550Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:04.9614992Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:04.9615570Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:04.9616006Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:04.9616651Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:04.9617184Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:04.9617725Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:04.9618100Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9620009Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9620562Z E1204 11:45:10.496000 109634 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9621606Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9622298Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9623189Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9623910Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9624829Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9625600Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9626223Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9627057Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9627437Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T12:15:04.9628327Z E1204 11:45:10.496000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9631617Z ('RERUN', {'yellow': True}) [3.7007s] [ 0%] 2025-12-04T12:15:04.9632817Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:04.9633963Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9634433Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960 2025-12-04T12:15:04.9634981Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9635558Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:04.9636164Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:04.9636595Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:04.9637201Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:04.9637720Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 
2025-12-04T12:15:04.9638272Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:04.9638794Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9639260Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:04.9639790Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:04.9640361Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:04.9640798Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:04.9641416Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:04.9641973Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:04.9642531Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:04.9642897Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9644808Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): 
[['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9645353Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9646415Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9647114Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9648008Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9648706Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9649591Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9650384Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9650996Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9651805Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9652176Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9653075Z E1204 11:45:11.084000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9653231Z ('RERUN', {'yellow': True}) [0.5517s] [ 0%] 2025-12-04T12:15:04.9654429Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:04.9655242Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9655717Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960 2025-12-04T12:15:04.9656369Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9656949Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:04.9657519Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:04.9657969Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:04.9658556Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:04.9659096Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9659645Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:04.9660149Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9660687Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:04.9661128Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:04.9661701Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:04.9662138Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:04.9662707Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:04.9663245Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:04.9663789Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:04.9664165Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9666070Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 
'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9666621Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9667778Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9668409Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9669313Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9670051Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9671122Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9671898Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9672524Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9673327Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9673713Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9674608Z E1204 11:45:11.635000 109634 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9674789Z FAILED [0.5510s] [ 0%] 2025-12-04T12:15:04.9674799Z 2025-12-04T12:15:04.9674960Z ==================================== RERUNS ==================================== 2025-12-04T12:15:04.9675284Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _ 2025-12-04T12:15:04.9675411Z Traceback (most recent call last): 2025-12-04T12:15:04.9675883Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9676132Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9676638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9676888Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9677404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9677613Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9678125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:04.9678288Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9678821Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9679143Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9679678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9679828Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9680322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9680449Z return self._compile_to_module() 2025-12-04T12:15:04.9680994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9681178Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9681694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9681824Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9682379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.9682654Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9683255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9683386Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9683900Z File "/tmp/tmp1plybcp3/ky/cky7khmdr5lyfsuub4j6geabdqllkm256nn3louctma6b7lduzvd.py", line 163, in 2025-12-04T12:15:04.9684376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:04.9684488Z kernel.precompile( 2025-12-04T12:15:04.9685059Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9685181Z self._precompile_worker() 2025-12-04T12:15:04.9685779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9685977Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9686571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9686813Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9687274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9687522Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9687977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9688313Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9688540Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9688913Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9689003Z ^ 2025-12-04T12:15:04.9689462Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9689482Z 2025-12-04T12:15:04.9690194Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9690200Z 2025-12-04T12:15:04.9690205Z 2025-12-04T12:15:04.9690422Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9691078Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:04.9691086Z 2025-12-04T12:15:04.9691354Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9691593Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9691701Z frames [('total', 1)] 2025-12-04T12:15:04.9691821Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9692074Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9692299Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9692402Z graph_break [] 2025-12-04T12:15:04.9692773Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _ 2025-12-04T12:15:04.9692902Z Traceback (most recent call last): 2025-12-04T12:15:04.9693372Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9693615Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9694104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9694402Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9694943Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9695152Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9695667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 1003, in _compile_fx_inner 2025-12-04T12:15:04.9695821Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9696440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9696765Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9697287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9697458Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9697939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9698077Z return self._compile_to_module() 2025-12-04T12:15:04.9698566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9698791Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9699328Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9699463Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9699975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.9700211Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9700798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9700945Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9701454Z File "/tmp/tmpod28e3ow/hk/chkrruefcnej6yo2sl3hrqyh426dvgtouxusocl6uumvojl6nidp.py", line 163, in 2025-12-04T12:15:04.9701921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in 
triton 2025-12-04T12:15:04.9702049Z kernel.precompile( 2025-12-04T12:15:04.9702605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9702740Z self._precompile_worker() 2025-12-04T12:15:04.9703332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9703514Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9704129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9704325Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9704787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9705037Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9705515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9705868Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9706098Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9706458Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9706590Z ^ 2025-12-04T12:15:04.9707051Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9707084Z 2025-12-04T12:15:04.9707810Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9707819Z 2025-12-04T12:15:04.9707823Z 2025-12-04T12:15:04.9708043Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9708691Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:04.9708697Z 2025-12-04T12:15:04.9708968Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9709192Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9709314Z frames [('total', 1)] 2025-12-04T12:15:04.9709431Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9709668Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9709905Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9710005Z graph_break [] 2025-12-04T12:15:04.9710237Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9710379Z frames [('total', 1)] 2025-12-04T12:15:04.9710494Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9710730Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9710963Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9711063Z graph_break [] 2025-12-04T12:15:04.9711222Z =================================== FAILURES =================================== 2025-12-04T12:15:04.9711543Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _ 2025-12-04T12:15:04.9711683Z Traceback (most recent call last): 2025-12-04T12:15:04.9712143Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9712385Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9712889Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9713138Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9713656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9713866Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9714376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:04.9714543Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9715074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9715394Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9715927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9716078Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9716603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9716728Z return self._compile_to_module() 2025-12-04T12:15:04.9717213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9717387Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9717934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9718063Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9718604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:04.9718835Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9719437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9719565Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9720075Z File "/tmp/tmp241kvq85/w5/cw56acwuuf44v55zghnxv2hmw7hpmbeclykkqwvt47zyidmyzlxl.py", line 163, in 2025-12-04T12:15:04.9720546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:04.9720661Z kernel.precompile( 2025-12-04T12:15:04.9721227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9721346Z self._precompile_worker() 2025-12-04T12:15:04.9721942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9722171Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9722765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9722969Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9723435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9723679Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9724137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9724471Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9724701Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9725076Z def 
triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9725169Z ^ 2025-12-04T12:15:04.9725643Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9725649Z 2025-12-04T12:15:04.9726357Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9726363Z 2025-12-04T12:15:04.9726368Z 2025-12-04T12:15:04.9726586Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9727238Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:04.9727244Z 2025-12-04T12:15:04.9727519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9727752Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9727862Z frames [('total', 1)] 2025-12-04T12:15:04.9727978Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9728262Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9728486Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9728587Z graph_break [] 2025-12-04T12:15:04.9728821Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9728927Z frames [('total', 1)] 2025-12-04T12:15:04.9729055Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9729303Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9729535Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9729648Z graph_break [] 2025-12-04T12:15:04.9729906Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:04.9730011Z frames [('total', 1)] 2025-12-04T12:15:04.9730142Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9730360Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9730605Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9730706Z graph_break [] 2025-12-04T12:15:04.9731360Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4ecceae3d20d3515.xml - 2025-12-04T12:15:04.9731547Z =========================== short test summary info ============================ 2025-12-04T12:15:04.9732326Z FAILED [0.5510s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9732690Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9732795Z ^ 2025-12-04T12:15:04.9733252Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9733418Z 2025-12-04T12:15:04.9734143Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9734149Z 2025-12-04T12:15:04.9734154Z 2025-12-04T12:15:04.9734371Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9735027Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:04.9735035Z 2025-12-04T12:15:04.9735305Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9735490Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:04.9735704Z =================== 1 failed, 2 deselected, 2 rerun in 4.85s =================== 2025-12-04T12:15:04.9735805Z Got exit code 1 2025-12-04T12:15:04.9735917Z Retrying single test... 2025-12-04T12:15:04.9736490Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-af3f0411f43ffff1.xml 2025-12-04T12:15:04.9736658Z ============================= test session starts ============================== 2025-12-04T12:15:04.9737024Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:04.9737137Z cachedir: .pytest_cache 2025-12-04T12:15:04.9737657Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:04.9737800Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:04.9737910Z configfile: pytest.ini 2025-12-04T12:15:04.9738504Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:04.9738741Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:04.9739460Z stepcurrent: skipping 2 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:04.9739630Z Running 1 items in this shard 2025-12-04T12:15:04.9739637Z 2025-12-04T12:15:04.9740800Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:04.9741643Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9742143Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960 2025-12-04T12:15:04.9742684Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9743258Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:04.9743817Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:04.9744262Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:04.9744857Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:04.9745378Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9745940Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:04.9746484Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9746961Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:04.9747401Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:04.9747959Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:04.9748413Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:04.9748977Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:04.9749515Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:04.9750059Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:04.9750432Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9752342Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 
2025-12-04T12:15:04.9753358Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9754411Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9755043Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9756008Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9756690Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9757589Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9758503Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9759123Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9759926Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9760290Z E1204 11:45:30.102000 109894 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9761248Z E1204 11:45:30.102000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9761387Z ('RERUN', {'yellow': True}) [3.6849s] [100%] 2025-12-04T12:15:04.9762563Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:04.9763362Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9763823Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960 2025-12-04T12:15:04.9764366Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9764931Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:04.9765513Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:04.9765946Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:04.9766558Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:04.9767081Z E1204 11:45:30.682000 109894 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9767630Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:04.9768187Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9768655Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:04.9769107Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:04.9769706Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:04.9770180Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:04.9770761Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:04.9771880Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:04.9772439Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:04.9772804Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9774725Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): 
[['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9775357Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9776457Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9777102Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9778007Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9778702Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9779589Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9780375Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9780984Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9781803Z E1204 11:45:30.682000 109894 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9782176Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9783129Z E1204 11:45:30.682000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9783279Z ('RERUN', {'yellow': True}) [0.5445s] [100%] 2025-12-04T12:15:04.9784436Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:04.9785326Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9785777Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960 2025-12-04T12:15:04.9786323Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9786898Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:04.9787458Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:04.9787910Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:04.9788503Z E1204 11:45:31.230000 109894 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:04.9789041Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9789633Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:04.9790142Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9790622Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:04.9791063Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:04.9791644Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:04.9792079Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:04.9792646Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:04.9793182Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:04.9793722Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:04.9794093Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9796031Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 
'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9796584Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9797625Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9798299Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9799234Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9799920Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9800812Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9801580Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9802204Z E1204 11:45:31.230000 109894 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9802997Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9803400Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9804307Z E1204 11:45:31.230000 109894 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9804414Z FAILED [0.5454s] [100%] 2025-12-04T12:15:04.9804420Z 2025-12-04T12:15:04.9804580Z ==================================== RERUNS ==================================== 2025-12-04T12:15:04.9804908Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _ 2025-12-04T12:15:04.9805035Z Traceback (most recent call last): 2025-12-04T12:15:04.9805508Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9805751Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9806259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9806512Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9807028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9807237Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9807744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:04.9807911Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9808445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9808764Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9809297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9809480Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9809962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9810100Z return self._compile_to_module() 2025-12-04T12:15:04.9810585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9810793Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9811340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9811472Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9811980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.9812214Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9812814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9812940Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9813448Z File "/tmp/tmpzaf0e86o/x7/cx7uih5uzm2uphxtckruuoybmwgam52us2bcsodffye5bwzycvje.py", line 163, in 2025-12-04T12:15:04.9813924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:04.9814041Z kernel.precompile( 
2025-12-04T12:15:04.9814593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9814726Z self._precompile_worker() 2025-12-04T12:15:04.9815323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9815561Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9816157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9816418Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9816892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9817146Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9817605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9817942Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9818172Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9818546Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9818639Z ^ 2025-12-04T12:15:04.9819100Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9819122Z 2025-12-04T12:15:04.9819837Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9819844Z 2025-12-04T12:15:04.9819851Z 2025-12-04T12:15:04.9820070Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9820725Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:04.9820731Z 2025-12-04T12:15:04.9820999Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9821236Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9821342Z frames [('total', 1)] 2025-12-04T12:15:04.9821507Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9821759Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9821980Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9822080Z graph_break [] 2025-12-04T12:15:04.9822414Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _ 2025-12-04T12:15:04.9822569Z Traceback (most recent call last): 2025-12-04T12:15:04.9823037Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9823312Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9823803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9824068Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9824587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9824782Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9825305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 1003, in _compile_fx_inner 2025-12-04T12:15:04.9825452Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9825997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9826318Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9826835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9827038Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9827521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9827657Z return self._compile_to_module() 2025-12-04T12:15:04.9828140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9828302Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9828831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9828965Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9829462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.9829705Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9830294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9830435Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9830915Z File "/tmp/tmpfax_frvz/57/c572mxpmqtrjjs5gfj4qmykfskskgjkosm4aanfpybly7so6743j.py", line 163, in 2025-12-04T12:15:04.9831378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in 
triton 2025-12-04T12:15:04.9831502Z kernel.precompile( 2025-12-04T12:15:04.9832054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9832187Z self._precompile_worker() 2025-12-04T12:15:04.9832780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9832960Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9833565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9833799Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9834249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9834508Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9834952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9835337Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9835599Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9835958Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9836060Z ^ 2025-12-04T12:15:04.9836524Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9836530Z 2025-12-04T12:15:04.9837263Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9837268Z 2025-12-04T12:15:04.9837273Z 2025-12-04T12:15:04.9837490Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9838139Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:04.9838159Z 2025-12-04T12:15:04.9838431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9838654Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9838771Z frames [('total', 1)] 2025-12-04T12:15:04.9838926Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9839162Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9839398Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9839499Z graph_break [] 2025-12-04T12:15:04.9839731Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9839835Z frames [('total', 1)] 2025-12-04T12:15:04.9839949Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9840178Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9840413Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9840511Z graph_break [] 2025-12-04T12:15:04.9840679Z =================================== FAILURES =================================== 2025-12-04T12:15:04.9841001Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _ 2025-12-04T12:15:04.9841125Z Traceback (most recent call last): 2025-12-04T12:15:04.9841596Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9841841Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9842340Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9842590Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9843105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9843314Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9843830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:04.9843992Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9844525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9844902Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9845439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9845592Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9846070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9846237Z return self._compile_to_module() 2025-12-04T12:15:04.9846722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9846934Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9847451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9847586Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9848103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:04.9848336Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9848938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9849071Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9849583Z File "/tmp/tmpw53ectv7/7b/c7b5sfbk6rqmzdkpw5iowyahvgrhvy7avmmk2he2l6xosmewtmzx.py", line 163, in 2025-12-04T12:15:04.9850063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:04.9850178Z kernel.precompile( 2025-12-04T12:15:04.9850732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9850929Z self._precompile_worker() 2025-12-04T12:15:04.9851530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9851724Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9852322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9852521Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9852990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9853239Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9853699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9854039Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9854270Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9854648Z def 
triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9854740Z ^ 2025-12-04T12:15:04.9855196Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9855216Z 2025-12-04T12:15:04.9855936Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9855944Z 2025-12-04T12:15:04.9855948Z 2025-12-04T12:15:04.9856166Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9856889Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:04.9856899Z 2025-12-04T12:15:04.9857214Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9857450Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9857559Z frames [('total', 1)] 2025-12-04T12:15:04.9857678Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9857926Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9858150Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9858282Z graph_break [] 2025-12-04T12:15:04.9858517Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9858653Z frames [('total', 1)] 2025-12-04T12:15:04.9858785Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9859006Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9859242Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9859354Z graph_break [] 2025-12-04T12:15:04.9859575Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:04.9859679Z frames [('total', 1)] 2025-12-04T12:15:04.9859807Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9860026Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9860257Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9860367Z graph_break [] 2025-12-04T12:15:04.9861023Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-af3f0411f43ffff1.xml - 2025-12-04T12:15:04.9861210Z =========================== short test summary info ============================ 2025-12-04T12:15:04.9861986Z FAILED [0.5454s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9862459Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9862570Z ^ 2025-12-04T12:15:04.9863028Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9863033Z 2025-12-04T12:15:04.9863760Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9863768Z 2025-12-04T12:15:04.9863773Z 2025-12-04T12:15:04.9863990Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9864632Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:04.9864652Z 2025-12-04T12:15:04.9864921Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9865106Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:04.9865326Z ================== 1 failed, 187 deselected, 2 rerun in 4.82s ================== 2025-12-04T12:15:04.9865428Z Got exit code 1 2025-12-04T12:15:04.9865537Z Retrying single test... 2025-12-04T12:15:04.9866025Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5f646abfecfc34db.xml 2025-12-04T12:15:04.9866191Z ============================= test session starts ============================== 2025-12-04T12:15:04.9866557Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:04.9866668Z cachedir: .pytest_cache 2025-12-04T12:15:04.9867188Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:04.9867332Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:04.9867444Z configfile: pytest.ini 2025-12-04T12:15:04.9868071Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:04.9868308Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:04.9869028Z stepcurrent: skipping 2 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:04.9869158Z Running 1 items in this shard 2025-12-04T12:15:04.9869193Z 2025-12-04T12:15:04.9870383Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:04.9871397Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9871854Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960 2025-12-04T12:15:04.9872400Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9872978Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:04.9873544Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:04.9873992Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:04.9874583Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:04.9875187Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9875752Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:04.9876256Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9876741Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:04.9877188Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:04.9877755Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:04.9878209Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:04.9878779Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:04.9879320Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:04.9879861Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:04.9880223Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9882205Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 
2025-12-04T12:15:04.9882748Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9883806Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9884528Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9885437Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9886119Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9887012Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9887789Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9888409Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9889254Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9889618Z E1204 11:45:50.128000 110156 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9890521Z E1204 11:45:50.128000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9890658Z ('RERUN', {'yellow': True}) [3.7187s] [100%] 2025-12-04T12:15:04.9891815Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:04.9892607Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9893058Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960 2025-12-04T12:15:04.9893609Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9894164Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:04.9894743Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:04.9895178Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:04.9895781Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:04.9896404Z E1204 11:45:50.734000 110156 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9896958Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:04.9897481Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9897982Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:04.9898474Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:04.9899046Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:04.9899489Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:04.9900067Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:04.9900591Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:04.9901148Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:04.9901509Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9903428Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): 
[['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9904012Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9905051Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9905698Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9906592Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9907285Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9908162Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9908947Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9909552Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9910385Z E1204 11:45:50.734000 110156 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9910767Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9911653Z E1204 11:45:50.734000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9911832Z ('RERUN', {'yellow': True}) [0.5593s] [100%] 2025-12-04T12:15:04.9913011Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:04.9913824Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9914271Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 40960 2025-12-04T12:15:04.9914812Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:04.9915387Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:04.9915950Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:04.9916394Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:04.9917018Z E1204 11:45:51.285000 110156 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:04.9917540Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:04.9918100Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:04.9918611Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:04.9919097Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:04.9919537Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:04.9920104Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:04.9920554Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:04.9921115Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:04.9921654Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:04.9922194Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:04.9922556Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:04.9924519Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 
'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:04.9925073Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:04.9926171Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9926802Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9927709Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9928384Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9929280Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9930058Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9930716Z E1204 11:45:51.285000 110156 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:04.9931511Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9931875Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:04.9932776Z E1204 11:45:51.285000 110156 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9932886Z FAILED [0.5497s] [100%] 2025-12-04T12:15:04.9932895Z 2025-12-04T12:15:04.9933052Z ==================================== RERUNS ==================================== 2025-12-04T12:15:04.9933376Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _ 2025-12-04T12:15:04.9933504Z Traceback (most recent call last): 2025-12-04T12:15:04.9933975Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9934219Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9934736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9934985Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9935499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9935711Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9936223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:04.9936438Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9937036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9937361Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9937897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9938052Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9938563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9938703Z return self._compile_to_module() 2025-12-04T12:15:04.9939249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9939430Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9939951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9940084Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9940598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.9940835Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9941424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9941570Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9942084Z File "/tmp/tmp1exnzehg/if/cifkbiminc6b7mmuphmf7suncm7zmsuhxz37qefngp7frv7ocxzz.py", line 163, in 2025-12-04T12:15:04.9942566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:04.9942718Z kernel.precompile( 
2025-12-04T12:15:04.9943275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9943417Z self._precompile_worker() 2025-12-04T12:15:04.9944017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9944213Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9944810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9945013Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9945484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9945733Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9946177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9946532Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9946763Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9947139Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9947232Z ^ 2025-12-04T12:15:04.9947690Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9947698Z 2025-12-04T12:15:04.9948429Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9948436Z 2025-12-04T12:15:04.9948441Z 2025-12-04T12:15:04.9948659Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9949317Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:04.9949364Z 2025-12-04T12:15:04.9949635Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9949875Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9949979Z frames [('total', 1)] 2025-12-04T12:15:04.9950099Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9950351Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9950603Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9950703Z graph_break [] 2025-12-04T12:15:04.9951071Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _ 2025-12-04T12:15:04.9951198Z Traceback (most recent call last): 2025-12-04T12:15:04.9951650Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9951907Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9952398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9952660Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9953173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9953371Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9953892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 1003, in _compile_fx_inner 2025-12-04T12:15:04.9954039Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9954583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9954934Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9955456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9955616Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9956096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9956233Z return self._compile_to_module() 2025-12-04T12:15:04.9956721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9956883Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9957410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9957543Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9958040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:04.9958281Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9958864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9959004Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9959514Z File "/tmp/tmpwkf37tjm/md/cmdhorstx6yqakfmcbbzckz3wgrv7s3rff4a73j3f7w4g3lyk6ev.py", line 163, in <module> 2025-12-04T12:15:04.9959978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in
triton 2025-12-04T12:15:04.9960103Z kernel.precompile( 2025-12-04T12:15:04.9960658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9960791Z self._precompile_worker() 2025-12-04T12:15:04.9961423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9961605Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9962213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9962410Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9962892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9963151Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9963627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9963975Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9964202Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9964566Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9964669Z ^ 2025-12-04T12:15:04.9965123Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9965128Z 2025-12-04T12:15:04.9965853Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9965861Z 2025-12-04T12:15:04.9965866Z 2025-12-04T12:15:04.9966086Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9966724Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:04.9966786Z 2025-12-04T12:15:04.9967058Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9967285Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9967404Z frames [('total', 1)] 2025-12-04T12:15:04.9967523Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9967759Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9967995Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9968099Z graph_break [] 2025-12-04T12:15:04.9968318Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9968440Z frames [('total', 1)] 2025-12-04T12:15:04.9968556Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9968789Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9969023Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9969125Z graph_break [] 2025-12-04T12:15:04.9975367Z =================================== FAILURES =================================== 2025-12-04T12:15:04.9975775Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda _ 2025-12-04T12:15:04.9975903Z Traceback (most recent call last): 2025-12-04T12:15:04.9976447Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:04.9976696Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:04.9977212Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:04.9977466Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:04.9977990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:04.9978194Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:04.9978849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:04.9979001Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:04.9979544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:04.9979867Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:04.9980396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:04.9980599Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:04.9981126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:04.9981265Z return self._compile_to_module() 2025-12-04T12:15:04.9981754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:04.9981929Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:04.9982450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:04.9982580Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:04.9983092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:04.9983327Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:04.9983913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:04.9984056Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:04.9984565Z File "/tmp/tmp8witzdji/m6/cm64mukunvqnv4ogmdbzdplkr5f2rm4tlozbcmimeunhijmt44ty.py", line 163, in <module> 2025-12-04T12:15:04.9985098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:04.9985218Z kernel.precompile( 2025-12-04T12:15:04.9985769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:04.9985908Z self._precompile_worker() 2025-12-04T12:15:04.9986502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:04.9986696Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:04.9987293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:04.9987493Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:04.9987955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:04.9988205Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:04.9988649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:04.9988996Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:04.9989224Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9989593Z def
triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9989687Z ^ 2025-12-04T12:15:04.9990145Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9990152Z 2025-12-04T12:15:04.9990873Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9990882Z 2025-12-04T12:15:04.9990887Z 2025-12-04T12:15:04.9991137Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9991788Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:04.9991794Z 2025-12-04T12:15:04.9992064Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9992296Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9992455Z frames [('total', 1)] 2025-12-04T12:15:04.9992572Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9992849Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9993073Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9993172Z graph_break [] 2025-12-04T12:15:04.9993405Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:04.9993510Z frames [('total', 1)] 2025-12-04T12:15:04.9993629Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9993859Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9994088Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9994205Z graph_break [] 2025-12-04T12:15:04.9994420Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:04.9994520Z frames [('total', 1)] 2025-12-04T12:15:04.9994649Z stats [('calls_captured', 7)] 2025-12-04T12:15:04.9994866Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:04.9995096Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:04.9995205Z graph_break [] 2025-12-04T12:15:04.9995867Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5f646abfecfc34db.xml - 2025-12-04T12:15:04.9996082Z =========================== short test summary info ============================ 2025-12-04T12:15:04.9996861Z FAILED [0.5497s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:04.9997219Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:04.9997318Z ^ 2025-12-04T12:15:04.9997774Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:04.9997782Z 2025-12-04T12:15:04.9998500Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:04.9998506Z 2025-12-04T12:15:04.9998511Z 2025-12-04T12:15:04.9998728Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:04.9999372Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:04.9999378Z 2025-12-04T12:15:04.9999656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:04.9999835Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.0000051Z ================== 1 failed, 187 deselected, 2 rerun in 4.88s ================== 2025-12-04T12:15:05.0000154Z Got exit code 1 2025-12-04T12:15:05.0000707Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.0001130Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.0001593Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f71fa45f6063b14.xml 2025-12-04T12:15:05.0001770Z ============================= test session starts ============================== 2025-12-04T12:15:05.0002158Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.0002271Z cachedir: .pytest_cache 2025-12-04T12:15:05.0002798Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.0002921Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.0003031Z configfile: pytest.ini 2025-12-04T12:15:05.0003664Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.0003914Z collecting ... collected 188 items / 3 deselected / 185 selected 2025-12-04T12:15:05.0004068Z stepcurrent: skipping 3 already run items. 
2025-12-04T12:15:05.0004183Z Running 185 items in this shard 2025-12-04T12:15:05.0004192Z 2025-12-04T12:15:05.0005435Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:05.0006504Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0006942Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.0007408Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.0007865Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.0008436Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.0008989Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0009572Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.0010179Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.0010733Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 
2025-12-04T12:15:05.0011190Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.0011819Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.0012338Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0012889Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.0013470Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.0014012Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.0014535Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.0015062Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.0015554Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.0016021Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.0016919Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.0017505Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.0018096Z 
E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.0018689Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 2025-12-04T12:15:05.0019245Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.0019806Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0020295Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.0020778Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.0021361Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.0021851Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.0022439Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.0022979Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.0023698Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask) 2025-12-04T12:15:05.0024274Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 2025-12-04T12:15:05.0024982Z E1204 
11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None) 2025-12-04T12:15:05.0025362Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0027699Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0028255Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0029365Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0030008Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0030945Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0031687Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0032577Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0033357Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0033961Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0035028Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0035406Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0036345Z E1204 11:46:09.456000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0036491Z ('RERUN', {'yellow': True}) [3.3187s] [ 0%] 2025-12-04T12:15:05.0037732Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:05.0038802Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0039237Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.0039690Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.0040169Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.0040706Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.0041257Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0041848Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.0042435Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.0043082Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, 
R0_BLOCK)[None, :] 2025-12-04T12:15:05.0043534Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.0044175Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.0044741Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0045330Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.0045913Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.0046447Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.0046989Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.0047477Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.0047978Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.0048454Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.0049218Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.0049793Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 
2025-12-04T12:15:05.0050381Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.0050967Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 2025-12-04T12:15:05.0051522Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.0052052Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0052550Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.0053017Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.0053602Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.0054063Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.0054640Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.0055196Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.0055902Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask) 2025-12-04T12:15:05.0056611Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 
2025-12-04T12:15:05.0057320Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None) 2025-12-04T12:15:05.0057697Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0060104Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0060666Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0061710Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0062363Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0063255Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0063992Z E1204 11:46:09.853000 110417 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0064885Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0065667Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0066290Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0067348Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0067731Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0068625Z E1204 11:46:09.853000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0068764Z ('RERUN', {'yellow': True}) [0.3581s] [ 0%] 2025-12-04T12:15:05.0070013Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:05.0071383Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0071838Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.0072295Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.0072813Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.0073483Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.0074029Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0074634Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.0075219Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.0075789Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, 
R0_BLOCK)[None, :] 2025-12-04T12:15:05.0076244Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.0076876Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.0077412Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0078018Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.0078612Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.0079143Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.0079672Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.0080177Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.0080654Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.0081135Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.0081903Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.0082428Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 
2025-12-04T12:15:05.0083019Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.0083593Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 2025-12-04T12:15:05.0084162Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.0084724Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0085223Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.0085684Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.0086292Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.0086787Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.0087368Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.0087929Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.0088636Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask) 2025-12-04T12:15:05.0089208Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 
2025-12-04T12:15:05.0089926Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None) 2025-12-04T12:15:05.0090292Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0092635Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0093224Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0094278Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0094908Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0095810Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0096553Z E1204 11:46:10.215000 110417 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0097449Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0098218Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0098863Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0099935Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0100337Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0101273Z E1204 11:46:10.215000 110417 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0101379Z FAILED [0.3591s] [ 0%] 2025-12-04T12:15:05.0101388Z 2025-12-04T12:15:05.0101550Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.0101879Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _ 2025-12-04T12:15:05.0102004Z Traceback (most recent call last): 2025-12-04T12:15:05.0102472Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0102717Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.0103208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0103470Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0103990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0104195Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0104737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0104891Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0105446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0105766Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0106295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0106447Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0106929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.0107066Z return self._compile_to_module() 2025-12-04T12:15:05.0107565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0107736Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0108264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0108394Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0108899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.0109131Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0109716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0109855Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0110352Z File "/tmp/tmpotzsm7xh/qu/cqufc5x6lwjjws43ojgjstkx4n6wmjh3epgcpo7hc3eqwk666dtt.py", line 62, in 2025-12-04T12:15:05.0110826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0110971Z kernel.precompile( 2025-12-04T12:15:05.0111525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0111656Z self._precompile_worker() 2025-12-04T12:15:05.0112252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0112465Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0113099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0113299Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0113761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0114012Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0114457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0114808Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0115036Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0115662Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0115756Z ^ 2025-12-04T12:15:05.0116215Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0116221Z 2025-12-04T12:15:05.0116940Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0116978Z 2025-12-04T12:15:05.0116983Z 2025-12-04T12:15:05.0117204Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0117851Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.0117857Z 2025-12-04T12:15:05.0118126Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0118355Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0118473Z frames [('total', 1)] 2025-12-04T12:15:05.0118592Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0118845Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0119064Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0119168Z graph_break [] 2025-12-04T12:15:05.0119500Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _ 2025-12-04T12:15:05.0119626Z Traceback (most recent call last): 2025-12-04T12:15:05.0120076Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0120332Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.0120820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0121085Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0121598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0121791Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0122315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0122464Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0123033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0123368Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0123883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0124077Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0124557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0124710Z return self._compile_to_module() 2025-12-04T12:15:05.0125207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0125375Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0125903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0126032Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0126526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.0126770Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0127357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0127504Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0128005Z File "/tmp/tmpp8qefu7m/dx/cdxobz35is7vmn2nhahqefcnttool33pgq6zbnkhhcb2ssgtygfe.py", line 62, in 2025-12-04T12:15:05.0128466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in 
triton 2025-12-04T12:15:05.0128625Z kernel.precompile( 2025-12-04T12:15:05.0129182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0129298Z self._precompile_worker() 2025-12-04T12:15:05.0129903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0130083Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0130691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0130889Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0131343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0131599Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0132047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0132393Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0132619Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0133228Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0133333Z ^ 2025-12-04T12:15:05.0133789Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0133795Z 2025-12-04T12:15:05.0134521Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0134529Z 2025-12-04T12:15:05.0134534Z 2025-12-04T12:15:05.0134751Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0135426Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.0135433Z 2025-12-04T12:15:05.0135714Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0135935Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0136086Z frames [('total', 1)] 2025-12-04T12:15:05.0136207Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0136523Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0136828Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0136932Z graph_break [] 2025-12-04T12:15:05.0137153Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0137275Z frames [('total', 1)] 2025-12-04T12:15:05.0137391Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0137612Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0137861Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0137960Z graph_break [] 2025-12-04T12:15:05.0138123Z =================================== FAILURES =================================== 2025-12-04T12:15:05.0138447Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _ 2025-12-04T12:15:05.0138575Z Traceback (most recent call last): 2025-12-04T12:15:05.0139040Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0139286Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.0139777Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0140081Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0140597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0140800Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0141308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0141454Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0142003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0142326Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0142858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0143010Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0143493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0143628Z return self._compile_to_module() 2025-12-04T12:15:05.0144132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0144310Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0144827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0144962Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0145482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.0145716Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0146313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0146500Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0146978Z File "/tmp/tmpluiakr_1/oh/coh7xscqvd6toy45gje54ujlzvbfh7fibt64kkk6wcyq4btftkku.py", line 62, in 2025-12-04T12:15:05.0147455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0147568Z kernel.precompile( 2025-12-04T12:15:05.0148123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0148287Z self._precompile_worker() 2025-12-04T12:15:05.0148915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0149114Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0149714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0149916Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0150385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0150635Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0151079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0151433Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0151666Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0152295Z def 
triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0152422Z ^ 2025-12-04T12:15:05.0152884Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0152890Z 2025-12-04T12:15:05.0153618Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0153624Z 2025-12-04T12:15:05.0153629Z 2025-12-04T12:15:05.0153847Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0154503Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.0154509Z 2025-12-04T12:15:05.0154783Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0155019Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0155127Z frames [('total', 1)] 2025-12-04T12:15:05.0155247Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0155504Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0155726Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0155830Z graph_break [] 2025-12-04T12:15:05.0156063Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0156169Z frames [('total', 1)] 2025-12-04T12:15:05.0156287Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0156521Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0156756Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0156871Z graph_break [] 2025-12-04T12:15:05.0157086Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0157190Z frames [('total', 1)] 2025-12-04T12:15:05.0157316Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0157535Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0157809Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0157928Z graph_break [] 2025-12-04T12:15:05.0158583Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f71fa45f6063b14.xml - 2025-12-04T12:15:05.0158769Z =========================== short test summary info ============================ 2025-12-04T12:15:05.0159542Z FAILED [0.3591s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0160213Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0160316Z ^ 2025-12-04T12:15:05.0160771Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0160779Z 2025-12-04T12:15:05.0161500Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0161506Z 2025-12-04T12:15:05.0161511Z 2025-12-04T12:15:05.0161728Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0162362Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.0162383Z 2025-12-04T12:15:05.0162651Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0162834Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.0163046Z =================== 1 failed, 3 deselected, 2 rerun in 4.08s =================== 2025-12-04T12:15:05.0163179Z Got exit code 1 2025-12-04T12:15:05.0163288Z Retrying single test... 2025-12-04T12:15:05.0163773Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3d881319a967678f.xml 2025-12-04T12:15:05.0163937Z ============================= test session starts ============================== 2025-12-04T12:15:05.0164302Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.0164415Z cachedir: .pytest_cache 2025-12-04T12:15:05.0164934Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.0165074Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.0165184Z configfile: pytest.ini 2025-12-04T12:15:05.0165779Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.0166018Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.0166732Z stepcurrent: skipping 3 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.0166864Z Running 1 items in this shard 2025-12-04T12:15:05.0166869Z 2025-12-04T12:15:05.0168110Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:05.0169181Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0169612Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.0170096Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.0170573Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.0171315Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.0171872Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0172581Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.0173168Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.0173742Z E1204 11:46:29.258000 110614 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.0174193Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.0174832Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.0175356Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0175898Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.0176553Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.0177159Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.0177699Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.0178188Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.0178671Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.0179159Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.0179922Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.0180458Z E1204 11:46:29.258000 110614 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.0181048Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.0181634Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 2025-12-04T12:15:05.0182187Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.0182709Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0183206Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.0183709Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.0184302Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.0184758Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.0185335Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.0185962Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.0186671Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask) 2025-12-04T12:15:05.0187262Z E1204 11:46:29.258000 110614 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 2025-12-04T12:15:05.0187971Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None) 2025-12-04T12:15:05.0188353Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0190695Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0191276Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0192318Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0192951Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0193855Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0194539Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0195436Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0196210Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0196829Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0197930Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0198357Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0199298Z E1204 11:46:29.258000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0199507Z ('RERUN', {'yellow': True}) [3.3108s] [100%] 2025-12-04T12:15:05.0200793Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:05.0201885Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0202346Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.0202796Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.0203276Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.0203814Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.0204364Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0204986Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.0205571Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.0206147Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, 
R0_BLOCK)[None, :] 2025-12-04T12:15:05.0206597Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.0207248Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.0207771Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0208339Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.0208920Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.0209453Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.0209998Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.0210489Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.0210981Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.0211451Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.0212249Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.0212778Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 
2025-12-04T12:15:05.0213396Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.0214016Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 2025-12-04T12:15:05.0214574Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.0215106Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0215609Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.0216074Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.0216736Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.0217208Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.0217800Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.0218385Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.0219098Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask) 2025-12-04T12:15:05.0219686Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 
2025-12-04T12:15:05.0220397Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None) 2025-12-04T12:15:05.0220780Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0223155Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0223711Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0224775Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0225460Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0226363Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0227047Z E1204 11:46:29.652000 110614 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0228014Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0228787Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0229426Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0230487Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0230870Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0231768Z E1204 11:46:29.652000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0231903Z ('RERUN', {'yellow': True}) [0.3543s] [100%] 2025-12-04T12:15:05.0233206Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:05.0234262Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0234712Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.0235163Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.0235636Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.0236176Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.0236721Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0237319Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.0237908Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.0238486Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, 
R0_BLOCK)[None, :] 2025-12-04T12:15:05.0238944Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.0239632Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.0240175Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0240717Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.0241342Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.0241903Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.0242437Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.0242948Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.0243431Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.0243916Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.0244683Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.0245215Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 
2025-12-04T12:15:05.0245802Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.0246415Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 2025-12-04T12:15:05.0246985Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.0247510Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0248014Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.0248480Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.0249056Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.0249538Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.0250118Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.0250677Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.0251382Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask) 2025-12-04T12:15:05.0251956Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 
2025-12-04T12:15:05.0252681Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None) 2025-12-04T12:15:05.0253091Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0255492Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0256063Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0257220Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0257859Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0258775Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0259458Z E1204 11:46:30.008000 110614 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0260402Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0261168Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0261785Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0262857Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0263226Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0264139Z E1204 11:46:30.008000 110614 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0264247Z FAILED [0.3555s] [100%] 2025-12-04T12:15:05.0264255Z 2025-12-04T12:15:05.0264415Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.0264744Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _ 2025-12-04T12:15:05.0264872Z Traceback (most recent call last): 2025-12-04T12:15:05.0265344Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0265591Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.0266080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0266345Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0266890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0267100Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0267612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0267766Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0268349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0268709Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0269245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0269402Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0269890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.0270031Z return self._compile_to_module() 2025-12-04T12:15:05.0270521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0270687Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0271541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0271679Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0272198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.0272432Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0273122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0273270Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0273780Z File "/tmp/tmp61d8e04k/ff/cffrfjwasuxxgzokqulnluti23iuaacrarvz4fycblwg3su4uyv3.py", line 62, in <module> 2025-12-04T12:15:05.0274260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0274376Z kernel.precompile( 2025-12-04T12:15:05.0274932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0275072Z self._precompile_worker() 2025-12-04T12:15:05.0275670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0275853Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0276467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0276671Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0277137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0277385Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0277834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0278190Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0278420Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0279049Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0279143Z ^ 2025-12-04T12:15:05.0279658Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0279665Z 2025-12-04T12:15:05.0280398Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0280405Z 2025-12-04T12:15:05.0280409Z 2025-12-04T12:15:05.0280632Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0281402Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.0281408Z 2025-12-04T12:15:05.0281727Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0281956Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0282085Z frames [('total', 1)] 2025-12-04T12:15:05.0282203Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0282461Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0282684Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0282788Z graph_break [] 2025-12-04T12:15:05.0283125Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _ 2025-12-04T12:15:05.0283250Z Traceback (most recent call last): 2025-12-04T12:15:05.0283705Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0283963Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.0284460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0284721Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0285285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0285482Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0286006Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0286156Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0286703Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0287028Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0287548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0287711Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0288190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0288318Z return self._compile_to_module() 2025-12-04T12:15:05.0288817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0288983Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0289511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0289645Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0290139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.0290386Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0290972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0291115Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0291632Z File "/tmp/tmp_di73wtl/4p/c4pjicw4ad5n4qi5pbu5cdwblqv7hbjmlhfzpd6uualp4khnauvg.py", line 62, in <module> 2025-12-04T12:15:05.0292098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in 
triton 2025-12-04T12:15:05.0292225Z kernel.precompile( 2025-12-04T12:15:05.0292783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0292933Z self._precompile_worker() 2025-12-04T12:15:05.0293548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0293765Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0294377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0294578Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0295037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0295300Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0295750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0296101Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0296397Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0297023Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0297130Z ^ 2025-12-04T12:15:05.0297587Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0297633Z 2025-12-04T12:15:05.0298360Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0298366Z 2025-12-04T12:15:05.0298371Z 2025-12-04T12:15:05.0298589Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0299225Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.0299246Z 2025-12-04T12:15:05.0299515Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0299741Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0299860Z frames [('total', 1)] 2025-12-04T12:15:05.0299978Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0300219Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0300454Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0300557Z graph_break [] 2025-12-04T12:15:05.0300776Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0300893Z frames [('total', 1)] 2025-12-04T12:15:05.0301011Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0301242Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0301476Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0301582Z graph_break [] 2025-12-04T12:15:05.0301744Z =================================== FAILURES =================================== 2025-12-04T12:15:05.0302066Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _ 2025-12-04T12:15:05.0302190Z Traceback (most recent call last): 2025-12-04T12:15:05.0302663Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0302948Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.0303491Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0303744Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0304260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0304504Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0305051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0305203Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0305760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0306087Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0306626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0306775Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0307254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0307393Z return self._compile_to_module() 2025-12-04T12:15:05.0307877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0308056Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0308577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0308742Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0309255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.0309488Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0310075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0310218Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0310724Z File "/tmp/tmpph97hlyp/ky/ckyit2fbc7htzmeglpvxlpbe7vwe3hsuabb5q3bvcqlfgssfpniq.py", line 62, in <module> 2025-12-04T12:15:05.0311203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0311321Z kernel.precompile( 2025-12-04T12:15:05.0311877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0312028Z self._precompile_worker() 2025-12-04T12:15:05.0312628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0312821Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0313419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0313621Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0314093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0314342Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0314803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0315140Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0315372Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0316050Z def 
triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0316142Z ^ 2025-12-04T12:15:05.0316602Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0316622Z 2025-12-04T12:15:05.0317366Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0317372Z 2025-12-04T12:15:05.0317405Z 2025-12-04T12:15:05.0317627Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0318275Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.0318283Z 2025-12-04T12:15:05.0318555Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0318795Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0318903Z frames [('total', 1)] 2025-12-04T12:15:05.0319022Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0319277Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0319499Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0319604Z graph_break [] 2025-12-04T12:15:05.0319841Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0319952Z frames [('total', 1)] 2025-12-04T12:15:05.0320088Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0320311Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0320579Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0320692Z graph_break [] 2025-12-04T12:15:05.0320913Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0321016Z frames [('total', 1)] 2025-12-04T12:15:05.0321149Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0321367Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0321602Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0321717Z graph_break [] 2025-12-04T12:15:05.0322380Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3d881319a967678f.xml - 2025-12-04T12:15:05.0322579Z =========================== short test summary info ============================ 2025-12-04T12:15:05.0323362Z FAILED [0.3555s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0323977Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0324084Z ^ 2025-12-04T12:15:05.0324544Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0324550Z 2025-12-04T12:15:05.0325273Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0325281Z 2025-12-04T12:15:05.0325285Z 2025-12-04T12:15:05.0325504Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0326141Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.0326161Z 2025-12-04T12:15:05.0326431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0326649Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.0326868Z ================== 1 failed, 187 deselected, 2 rerun in 4.06s ================== 2025-12-04T12:15:05.0326970Z Got exit code 1 2025-12-04T12:15:05.0327080Z Retrying single test... 2025-12-04T12:15:05.0327567Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7b45d70025cf6016.xml 2025-12-04T12:15:05.0327767Z ============================= test session starts ============================== 2025-12-04T12:15:05.0328133Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.0328294Z cachedir: .pytest_cache 2025-12-04T12:15:05.0328816Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.0328960Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.0329069Z configfile: pytest.ini 2025-12-04T12:15:05.0329666Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.0329903Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.0330618Z stepcurrent: skipping 3 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.0330750Z Running 1 items in this shard 2025-12-04T12:15:05.0330755Z 2025-12-04T12:15:05.0332030Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:05.0333136Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0333570Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.0334022Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.0334499Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.0335041Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.0335598Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0336187Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.0336844Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.0337416Z E1204 11:46:48.948000 110811 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.0337865Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.0338518Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.0339043Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0339632Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.0340228Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.0340760Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.0341303Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.0341857Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.0342351Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.0342823Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.0343593Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.0344122Z E1204 11:46:48.948000 110811 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.0344709Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.0345304Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 2025-12-04T12:15:05.0345861Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.0346427Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0346928Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.0347393Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.0347982Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.0348442Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.0349025Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.0349583Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.0350298Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask) 2025-12-04T12:15:05.0350883Z E1204 11:46:48.948000 110811 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 2025-12-04T12:15:05.0351596Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None) 2025-12-04T12:15:05.0351974Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0354343Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0354931Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0356003Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0356654Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0357558Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0358236Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0359136Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0359908Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0360564Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0361621Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0362004Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0362906Z E1204 11:46:48.948000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0363043Z ('RERUN', {'yellow': True}) [3.3088s] [100%] 2025-12-04T12:15:05.0364306Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:05.0365368Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0365820Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.0366281Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.0366753Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.0367296Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.0367874Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0368480Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.0369066Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.0369713Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, 
R0_BLOCK)[None, :] 2025-12-04T12:15:05.0370164Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.0370800Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.0371540Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0372087Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.0372684Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.0373221Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.0373753Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.0374355Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.0374835Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.0375320Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.0376082Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.0376680Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 
2025-12-04T12:15:05.0377272Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.0377853Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 2025-12-04T12:15:05.0378417Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.0378937Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0379434Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.0379901Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.0380478Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.0380950Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.0381577Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.0382134Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.0382849Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask) 2025-12-04T12:15:05.0383544Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 
2025-12-04T12:15:05.0384255Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None) 2025-12-04T12:15:05.0384626Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0386964Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0387502Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0388603Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0389229Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0390142Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0390832Z E1204 11:46:49.346000 110811 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0391732Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0392530Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0393142Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0394217Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0394597Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0395548Z E1204 11:46:49.346000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0395688Z ('RERUN', {'yellow': True}) [0.3580s] [100%] 2025-12-04T12:15:05.0396976Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0 2025-12-04T12:15:05.0398084Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0398536Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.0398992Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.0399453Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.0400005Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.0400550Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0401157Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.0401745Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.0402349Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, 
R0_BLOCK)[None, :] 2025-12-04T12:15:05.0402815Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.0403446Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.0403984Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0404534Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.0405130Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.0405683Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.0406205Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.0406714Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.0407194Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.0407665Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.0408444Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.0408998Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 
2025-12-04T12:15:05.0409607Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.0410181Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 2025-12-04T12:15:05.0410785Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.0411337Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0411829Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.0412310Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.0412886Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.0413355Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.0413931Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.0414480Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.0415202Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp13, r0_mask) 2025-12-04T12:15:05.0415809Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 
2025-12-04T12:15:05.0416599Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp3, None) 2025-12-04T12:15:05.0416962Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0419320Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0419858Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0420908Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0421543Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0422453Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0423174Z E1204 11:46:49.706000 110811 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0424056Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0424874Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0425516Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0426598Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0426975Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0427885Z E1204 11:46:49.706000 110811 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0427997Z FAILED [0.3581s] [100%] 2025-12-04T12:15:05.0428004Z 2025-12-04T12:15:05.0428149Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.0428492Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _ 2025-12-04T12:15:05.0428625Z Traceback (most recent call last): 2025-12-04T12:15:05.0429079Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0429374Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.0429866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0430128Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0430643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0430842Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0431368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0431517Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0432062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0432394Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0432945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0433109Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0433591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.0433731Z return self._compile_to_module() 2025-12-04T12:15:05.0434219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0434385Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0434912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0435048Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0435592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.0435841Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0436426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0436566Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0437037Z File "/tmp/tmp2jd4mi3_/42/c425kzmprx75eue6yyh3uob3fhnoxdrjot44i67kcgmfh4b6uoj3.py", line 62, in <module> 2025-12-04T12:15:05.0437535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0437695Z kernel.precompile( 2025-12-04T12:15:05.0438254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0438391Z self._precompile_worker() 2025-12-04T12:15:05.0438989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0439168Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0439783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0439982Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0440437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0440697Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0441145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0441502Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0441763Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0442378Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0442482Z ^ 2025-12-04T12:15:05.0442944Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0442950Z 2025-12-04T12:15:05.0443681Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0443688Z 2025-12-04T12:15:05.0443693Z 2025-12-04T12:15:05.0443915Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0444573Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.0444582Z 2025-12-04T12:15:05.0444853Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0445078Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0445197Z frames [('total', 1)] 2025-12-04T12:15:05.0445314Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0445550Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0445790Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0445894Z graph_break [] 2025-12-04T12:15:05.0446230Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _ 2025-12-04T12:15:05.0446358Z Traceback (most recent call last): 2025-12-04T12:15:05.0446817Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0447076Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.0447599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0447850Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0448375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0448569Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0449130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0449280Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0449844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0450178Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0450726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0450889Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0451373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0451495Z return self._compile_to_module() 2025-12-04T12:15:05.0451999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0452168Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0452682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0452828Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0453323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.0453600Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0454197Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0454325Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0454839Z File "/tmp/tmp2l7zvco7/4h/c4hqymdsulobuwkkx7xt2md63hatst6mngltzptiw2dmrehhia7u.py", line 62, in <module> 2025-12-04T12:15:05.0455305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in 
triton 2025-12-04T12:15:05.0455433Z kernel.precompile( 2025-12-04T12:15:05.0455996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0456117Z self._precompile_worker() 2025-12-04T12:15:05.0456802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0456993Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0457591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0457807Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0458257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0458528Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0458979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0459315Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0459562Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0460229Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0460336Z ^ 2025-12-04T12:15:05.0460797Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0460802Z 2025-12-04T12:15:05.0461521Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0461572Z 2025-12-04T12:15:05.0461578Z 2025-12-04T12:15:05.0461800Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0462478Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.0462487Z 2025-12-04T12:15:05.0462776Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0463002Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0463109Z frames [('total', 1)] 2025-12-04T12:15:05.0463240Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0463479Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0463718Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0463819Z graph_break [] 2025-12-04T12:15:05.0464039Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0464158Z frames [('total', 1)] 2025-12-04T12:15:05.0464273Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0464500Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0464748Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0464850Z graph_break [] 2025-12-04T12:15:05.0465049Z =================================== FAILURES =================================== 2025-12-04T12:15:05.0465375Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda _ 2025-12-04T12:15:05.0465502Z Traceback (most recent call last): 2025-12-04T12:15:05.0465974Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0466216Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.0466710Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0466975Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0467505Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0467715Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0468228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0468392Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0468930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0469252Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0469793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0469950Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0470435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0470573Z return self._compile_to_module() 2025-12-04T12:15:05.0471278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0471467Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0472080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0472217Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0472737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.0472973Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0473632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0473813Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0474300Z File "/tmp/tmp1i_pvvjs/xj/cxj2cj7kqa5zrggtpbepqwgwh3wguwq6jerzpkiytiia4z2auv6d.py", line 62, in <module> 2025-12-04T12:15:05.0474791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0474908Z kernel.precompile( 2025-12-04T12:15:05.0475470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0475605Z self._precompile_worker() 2025-12-04T12:15:05.0476205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0476407Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0477006Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0477207Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0477681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0478040Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0478506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0478844Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0479072Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0479704Z def 
triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0479799Z ^ 2025-12-04T12:15:05.0480278Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0480284Z 2025-12-04T12:15:05.0480999Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0481007Z 2025-12-04T12:15:05.0481013Z 2025-12-04T12:15:05.0481234Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0481889Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.0481895Z 2025-12-04T12:15:05.0482167Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0482410Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0482519Z frames [('total', 1)] 2025-12-04T12:15:05.0482633Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0482889Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0483112Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0483226Z graph_break [] 2025-12-04T12:15:05.0483449Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0483551Z frames [('total', 1)] 2025-12-04T12:15:05.0483794Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0484018Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0484253Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0484368Z graph_break [] 2025-12-04T12:15:05.0484582Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0484722Z frames [('total', 1)] 2025-12-04T12:15:05.0484854Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0485072Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0485351Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.0485466Z graph_break [] 2025-12-04T12:15:05.0486126Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7b45d70025cf6016.xml - 2025-12-04T12:15:05.0486320Z =========================== short test summary info ============================ 2025-12-04T12:15:05.0487093Z FAILED [0.3581s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0487721Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_0(in_ptr0, in_ptr1, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.0487815Z ^ 2025-12-04T12:15:05.0488271Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0488277Z 2025-12-04T12:15:05.0489004Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0489041Z 2025-12-04T12:15:05.0489046Z 2025-12-04T12:15:05.0489264Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0489918Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.0489924Z 2025-12-04T12:15:05.0490196Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0490381Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.0490602Z ================== 1 failed, 187 deselected, 2 rerun in 4.07s ================== 2025-12-04T12:15:05.0490705Z Got exit code 1 2025-12-04T12:15:05.0491275Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.0491688Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.0492163Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b4a285d41fdad5fc.xml 2025-12-04T12:15:05.0492347Z ============================= test session starts ============================== 2025-12-04T12:15:05.0492701Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.0492825Z cachedir: .pytest_cache 2025-12-04T12:15:05.0493343Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.0493471Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.0493596Z configfile: pytest.ini 2025-12-04T12:15:05.0494188Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 
2025-12-04T12:15:05.0494414Z collecting ... collected 188 items / 4 deselected / 184 selected 2025-12-04T12:15:05.0494570Z stepcurrent: skipping 4 already run items. 2025-12-04T12:15:05.0494690Z Running 184 items in this shard 2025-12-04T12:15:05.0494694Z 2025-12-04T12:15:05.0495927Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:05.0496819Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0497327Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T12:15:05.0497921Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0498487Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:05.0499076Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:05.0499512Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.0500116Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:05.0500642Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0501199Z E1204 11:47:09.435000 111008 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:05.0501728Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0502238Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:05.0502694Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:05.0503262Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:05.0503701Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:05.0504289Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:05.0504838Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:05.0505403Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:05.0505773Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0507706Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 
'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0508243Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0509342Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0509993Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0510894Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0511655Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0512539Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0513336Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0513944Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0514756Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
2025-12-04T12:15:05.0515128Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0516025Z E1204 11:47:09.435000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0516209Z ('RERUN', {'yellow': True}) [3.7607s] [ 0%] 2025-12-04T12:15:05.0517383Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:05.0518193Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0518662Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T12:15:05.0519217Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0519780Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:05.0520343Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:05.0520792Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.0521384Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:05.0522476Z E1204 11:47:10.063000 
111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0523036Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:05.0523545Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0524082Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:05.0524526Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:05.0525103Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:05.0525581Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:05.0526192Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:05.0526735Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:05.0527291Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:05.0527669Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0529588Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): 
[['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0530165Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0531252Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0531903Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0532800Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0533485Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0534387Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0535255Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0535914Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0536791Z E1204 11:47:10.063000 111008 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0537181Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0538123Z E1204 11:47:10.063000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0538265Z ('RERUN', {'yellow': True}) [0.5906s] [ 0%] 2025-12-04T12:15:05.0539446Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:05.0540274Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0540781Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T12:15:05.0541329Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0541892Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:05.0542469Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:05.0542905Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.0543512Z E1204 11:47:10.638000 111008 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:05.0544037Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0544600Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:05.0545151Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0545619Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:05.0546080Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:05.0546644Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:05.0547100Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:05.0547667Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:05.0548192Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:05.0548754Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:05.0549118Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0551030Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 
'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0551613Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0552675Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0553307Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0554289Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0554974Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0555864Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0556654Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0557262Z E1204 11:47:10.638000 111008 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0558083Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0558451Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0559402Z E1204 11:47:10.638000 111008 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0559509Z FAILED [0.5733s] [ 0%] 2025-12-04T12:15:05.0559516Z 2025-12-04T12:15:05.0559666Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.0560014Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.0560147Z Traceback (most recent call last): 2025-12-04T12:15:05.0560608Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0560874Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.0561367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0561634Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0562160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0562353Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0562877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0563025Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0563575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0563904Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0564421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0564587Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0565111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0565251Z return self._compile_to_module() 2025-12-04T12:15:05.0565740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0565929Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0566457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0566624Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0567158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.0567410Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0568351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0568496Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0568986Z File "/tmp/tmpcz80_m4d/w2/cw2w2nzlrsvlalmiaenkppzrll7ug7oywppgchyrtmwm6jkw3x5w.py", line 168, in 2025-12-04T12:15:05.0569453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0569587Z kernel.precompile( 
2025-12-04T12:15:05.0570144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0570280Z self._precompile_worker() 2025-12-04T12:15:05.0570885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0571246Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0571966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0572171Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0572626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0572889Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0573338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0573698Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0573929Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0574292Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0574404Z ^ 2025-12-04T12:15:05.0574863Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0574869Z 2025-12-04T12:15:05.0575605Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0575610Z 2025-12-04T12:15:05.0575615Z 2025-12-04T12:15:05.0575840Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0576555Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.0576580Z 2025-12-04T12:15:05.0576858Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0577085Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0577209Z frames [('total', 1)] 2025-12-04T12:15:05.0577334Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0577579Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0577884Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0577992Z graph_break [] 2025-12-04T12:15:05.0578324Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.0578463Z Traceback (most recent call last): 2025-12-04T12:15:05.0578916Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0579236Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.0579774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0580027Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0580558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0580761Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0581283Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0581438Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0581977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0582318Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0582845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0582999Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0583496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0583662Z return self._compile_to_module() 2025-12-04T12:15:05.0584161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0584324Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0584845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0584990Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0585492Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.0585743Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0586332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0586462Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0586994Z File "/tmp/tmpvwxewveo/ej/cejkkbtgn6ov6zv4kojr6w2b6wy2tgudjeczdjnlv4bmxk6az4ml.py", line 168, in 2025-12-04T12:15:05.0587455Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0587568Z kernel.precompile( 2025-12-04T12:15:05.0588136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0588256Z self._precompile_worker() 2025-12-04T12:15:05.0588870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0589053Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0589648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0589867Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0590353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0590619Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0591067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0591406Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0591689Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0592054Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0592174Z ^ 2025-12-04T12:15:05.0592652Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0592660Z 2025-12-04T12:15:05.0593376Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0593384Z 2025-12-04T12:15:05.0593388Z 2025-12-04T12:15:05.0593617Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0594265Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.0594271Z 2025-12-04T12:15:05.0594555Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0594778Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0594890Z frames [('total', 1)] 2025-12-04T12:15:05.0595022Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0595259Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0595518Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0595633Z graph_break [] 2025-12-04T12:15:05.0595856Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0595978Z frames [('total', 1)] 2025-12-04T12:15:05.0596095Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0596314Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0596560Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0596661Z graph_break [] 2025-12-04T12:15:05.0596811Z =================================== FAILURES =================================== 2025-12-04T12:15:05.0597158Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.0597286Z Traceback (most recent call last): 2025-12-04T12:15:05.0597753Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0597998Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 
2025-12-04T12:15:05.0598493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0598754Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0599271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0599494Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0600030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0600180Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0600739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0601065Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0601624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0601788Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0602280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0602417Z return self._compile_to_module() 2025-12-04T12:15:05.0602909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0603111Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0603680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0603814Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0604313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in 
load_by_key_path 2025-12-04T12:15:05.0604568Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0605160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0605301Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0605818Z File "/tmp/tmpvzwhzjxd/ki/ckioq6mwobl3fchhfvif7vc7iyaqwzminfkey3j5mnmxsnsqy3bm.py", line 168, in 2025-12-04T12:15:05.0606284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0606412Z kernel.precompile( 2025-12-04T12:15:05.0606973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0607105Z self._precompile_worker() 2025-12-04T12:15:05.0607738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0607919Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0608525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0608746Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0609211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0609462Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0609912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0610263Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0610494Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0610870Z 
def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0610962Z ^ 2025-12-04T12:15:05.0611421Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0611426Z 2025-12-04T12:15:05.0612158Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0612166Z 2025-12-04T12:15:05.0612171Z 2025-12-04T12:15:05.0612393Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0613057Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.0613063Z 2025-12-04T12:15:05.0613333Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0613561Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0613734Z frames [('total', 1)] 2025-12-04T12:15:05.0613855Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0614106Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0614332Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0614437Z graph_break [] 2025-12-04T12:15:05.0614674Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0614820Z frames [('total', 1)] 2025-12-04T12:15:05.0614942Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0615212Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0615452Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0615555Z graph_break [] 2025-12-04T12:15:05.0615796Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:05.0615904Z frames [('total', 1)] 2025-12-04T12:15:05.0616038Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0616262Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0616607Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0616726Z graph_break [] 2025-12-04T12:15:05.0617383Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b4a285d41fdad5fc.xml - 2025-12-04T12:15:05.0617565Z =========================== short test summary info ============================ 2025-12-04T12:15:05.0618380Z FAILED [0.5733s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0618744Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0618902Z ^ 2025-12-04T12:15:05.0619360Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0619368Z 2025-12-04T12:15:05.0620075Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0620094Z 2025-12-04T12:15:05.0620098Z 2025-12-04T12:15:05.0620322Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0620969Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.0620975Z 2025-12-04T12:15:05.0621262Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0621447Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.0621662Z =================== 1 failed, 4 deselected, 2 rerun in 4.97s =================== 2025-12-04T12:15:05.0621762Z Got exit code 1 2025-12-04T12:15:05.0621877Z Retrying single test... 2025-12-04T12:15:05.0622363Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9b24822b6f23300e.xml 2025-12-04T12:15:05.0622527Z ============================= test session starts ============================== 2025-12-04T12:15:05.0622875Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.0623003Z cachedir: .pytest_cache 2025-12-04T12:15:05.0623526Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.0623667Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.0623778Z configfile: pytest.ini 2025-12-04T12:15:05.0624371Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.0624611Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.0625386Z stepcurrent: skipping 4 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.0625505Z Running 1 items in this shard 2025-12-04T12:15:05.0625511Z 2025-12-04T12:15:05.0626693Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:05.0627556Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0628041Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T12:15:05.0628587Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0629166Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:05.0629735Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:05.0630172Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.0630785Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:05.0631310Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0631914Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:05.0632425Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0632893Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:05.0633353Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:05.0633927Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:05.0634373Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:05.0634944Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:05.0635467Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:05.0636021Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:05.0636387Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0638362Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 
75} 2025-12-04T12:15:05.0638905Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0639969Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0640670Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0641589Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0642283Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0643168Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0643952Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0644570Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0645381Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0645784Z E1204 11:47:29.243000 111270 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0646688Z E1204 11:47:29.243000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0646824Z ('RERUN', {'yellow': True}) [3.7463s] [100%] 2025-12-04T12:15:05.0647991Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:05.0648806Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0649274Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T12:15:05.0649826Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0650391Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:05.0650970Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:05.0651407Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.0651997Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:05.0652576Z E1204 11:47:29.868000 111270 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0653130Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:05.0653653Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0654120Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:05.0654596Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:05.0655204Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:05.0655644Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:05.0656227Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:05.0656830Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:05.0657405Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:05.0657786Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0660129Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): 
[['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0660740Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0661783Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0662428Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0663328Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0664033Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0665067Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0665846Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0666479Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0667331Z E1204 11:47:29.868000 111270 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0667716Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0668612Z E1204 11:47:29.868000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0668793Z ('RERUN', {'yellow': True}) [0.5893s] [100%] 2025-12-04T12:15:05.0670002Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:05.0670795Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0671480Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T12:15:05.0672022Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0672602Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:05.0673173Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:05.0673611Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.0674215Z E1204 11:47:30.448000 111270 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:05.0674855Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0675420Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:05.0675929Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0676411Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:05.0676903Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:05.0677473Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:05.0677936Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:05.0678496Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:05.0679029Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:05.0679573Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:05.0679939Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0681920Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 
'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0682463Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0683522Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0684234Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0685147Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0685830Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0686733Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0687508Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0688115Z E1204 11:47:30.448000 111270 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0689048Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0689415Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0690322Z E1204 11:47:30.448000 111270 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0690433Z FAILED [0.5778s] [100%] 2025-12-04T12:15:05.0690440Z 2025-12-04T12:15:05.0690588Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.0690933Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.0691061Z Traceback (most recent call last): 2025-12-04T12:15:05.0691534Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0691783Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.0692274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0692537Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0693050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0693257Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0693770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0693920Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0702083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0702637Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0703194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0703367Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0703860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0704043Z return self._compile_to_module() 2025-12-04T12:15:05.0704533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0704748Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0705284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0705421Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0705925Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.0706174Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0706765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0706909Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0707421Z File "/tmp/tmpog1bh0c0/6n/c6ncjlgssni77o6mwasj6isbalfab5frgpzcdycrnp5j7dka4ylh.py", line 168, in 2025-12-04T12:15:05.0707889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0708022Z kernel.precompile( 
2025-12-04T12:15:05.0708579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0708753Z self._precompile_worker() 2025-12-04T12:15:05.0709355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0709539Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0710153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0710356Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0710812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0711082Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0711527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0711882Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0712114Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0712476Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0712585Z ^ 2025-12-04T12:15:05.0713048Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0713056Z 2025-12-04T12:15:05.0713789Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0713798Z 2025-12-04T12:15:05.0713803Z 2025-12-04T12:15:05.0714028Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0714675Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.0714697Z 2025-12-04T12:15:05.0715000Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0715234Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0715358Z frames [('total', 1)] 2025-12-04T12:15:05.0715478Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0715716Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0715956Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0716088Z graph_break [] 2025-12-04T12:15:05.0716428Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.0716599Z Traceback (most recent call last): 2025-12-04T12:15:05.0717056Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0717313Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.0717812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0718061Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0718587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0718780Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0719304Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0719453Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0719986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0720321Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0720874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0721036Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0721516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0721637Z return self._compile_to_module() 2025-12-04T12:15:05.0722140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0722308Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0722826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0722974Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0723472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.0723717Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0724304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0724431Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0724930Z File "/tmp/tmpf2_jk7i4/ed/cedg4cetw7lgrzulgrg6uysbph33aagyrxijhaajbsbz3dowra6h.py", line 168, in 2025-12-04T12:15:05.0725390Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0725519Z kernel.precompile( 2025-12-04T12:15:05.0726075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0726197Z self._precompile_worker() 2025-12-04T12:15:05.0726806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0727019Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0727616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0727829Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0728279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0728571Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0729064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0729403Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0729643Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0730006Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0730101Z ^ 2025-12-04T12:15:05.0730569Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0730575Z 2025-12-04T12:15:05.0731284Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0731294Z 2025-12-04T12:15:05.0731299Z 2025-12-04T12:15:05.0731529Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0732177Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.0732183Z 2025-12-04T12:15:05.0732464Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0732721Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0732829Z frames [('total', 1)] 2025-12-04T12:15:05.0732963Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0733200Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0733421Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0733537Z graph_break [] 2025-12-04T12:15:05.0733757Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0733880Z frames [('total', 1)] 2025-12-04T12:15:05.0733995Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0734213Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0734463Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0734564Z graph_break [] 2025-12-04T12:15:05.0734711Z =================================== FAILURES =================================== 2025-12-04T12:15:05.0735060Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.0735187Z Traceback (most recent call last): 2025-12-04T12:15:05.0735652Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0735898Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 
2025-12-04T12:15:05.0736475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0736744Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0737259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0737455Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0737982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0738133Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0738719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0739041Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0739562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0739760Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0740243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0740412Z return self._compile_to_module() 2025-12-04T12:15:05.0740903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0741072Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0741605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0741738Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0742238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in 
load_by_key_path 2025-12-04T12:15:05.0742489Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0743076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0743215Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0743732Z File "/tmp/tmp4nx58gwi/zz/czzv6nwerbcir24skctpdetkn3iuakkso7dhks7wsbelyelggatr.py", line 168, in 2025-12-04T12:15:05.0744193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0744351Z kernel.precompile( 2025-12-04T12:15:05.0744909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0745039Z self._precompile_worker() 2025-12-04T12:15:05.0745632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0745812Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0746421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0746622Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0747086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0747333Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0747776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0748126Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0748353Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0748716Z 
def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0748821Z ^ 2025-12-04T12:15:05.0749275Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0749280Z 2025-12-04T12:15:05.0750005Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0750011Z 2025-12-04T12:15:05.0750018Z 2025-12-04T12:15:05.0750238Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0750923Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.0750929Z 2025-12-04T12:15:05.0751198Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0751421Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0751542Z frames [('total', 1)] 2025-12-04T12:15:05.0752291Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0752528Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0752794Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0752898Z graph_break [] 2025-12-04T12:15:05.0753130Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0753238Z frames [('total', 1)] 2025-12-04T12:15:05.0753355Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0753584Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0753818Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0753920Z graph_break [] 2025-12-04T12:15:05.0754148Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:05.0754254Z frames [('total', 1)] 2025-12-04T12:15:05.0754375Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0754606Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0754837Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0754954Z graph_break [] 2025-12-04T12:15:05.0755605Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9b24822b6f23300e.xml - 2025-12-04T12:15:05.0755780Z =========================== short test summary info ============================ 2025-12-04T12:15:05.0756627Z FAILED [0.5778s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0756988Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0757079Z ^ 2025-12-04T12:15:05.0757548Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0757556Z 2025-12-04T12:15:05.0758262Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0758267Z 2025-12-04T12:15:05.0758274Z 2025-12-04T12:15:05.0758503Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0759146Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.0759154Z 2025-12-04T12:15:05.0759437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0759620Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.0759820Z ================== 1 failed, 187 deselected, 2 rerun in 4.96s ================== 2025-12-04T12:15:05.0759935Z Got exit code 1 2025-12-04T12:15:05.0760045Z Retrying single test... 2025-12-04T12:15:05.0760517Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-642548938a706c13.xml 2025-12-04T12:15:05.0760697Z ============================= test session starts ============================== 2025-12-04T12:15:05.0761050Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.0761173Z cachedir: .pytest_cache 2025-12-04T12:15:05.0761698Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.0761857Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.0761980Z configfile: pytest.ini 2025-12-04T12:15:05.0762567Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.0762806Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.0763531Z stepcurrent: skipping 4 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.0763680Z Running 1 items in this shard 2025-12-04T12:15:05.0763685Z 2025-12-04T12:15:05.0764902Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:05.0765745Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0766230Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T12:15:05.0766771Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0767336Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:05.0767917Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:05.0768382Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.0768993Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:05.0769513Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0770072Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:05.0770585Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0771247Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:05.0771705Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:05.0772281Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:05.0772733Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:05.0773300Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:05.0773825Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:05.0774386Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:05.0774752Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0776841Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 
75} 2025-12-04T12:15:05.0777441Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0778540Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0779176Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0780069Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0780762Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0781654Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0782440Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0783096Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0783899Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0784266Z E1204 11:47:49.090000 111532 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0785165Z E1204 11:47:49.090000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0785317Z ('RERUN', {'yellow': True}) [3.7459s] [100%] 2025-12-04T12:15:05.0786484Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:05.0787302Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0787785Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T12:15:05.0788349Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0788916Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:05.0789479Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:05.0789957Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.0790549Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:05.0791082Z E1204 11:47:49.707000 111532 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0791631Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:05.0792196Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0792684Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:05.0793130Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:05.0793707Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:05.0794142Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:05.0794703Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:05.0795242Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:05.0795787Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:05.0796172Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0798113Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): 
[['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0798662Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0799700Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0800349Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0801238Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0801918Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0802816Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0803584Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0804236Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0805034Z E1204 11:47:49.707000 111532 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0805410Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0806366Z E1204 11:47:49.707000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0806504Z ('RERUN', {'yellow': True}) [0.5813s] [100%] 2025-12-04T12:15:05.0807690Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_clamp_mul_2 2025-12-04T12:15:05.0808485Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0808962Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 33554432 2025-12-04T12:15:05.0809505Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0810076Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:05.0810672Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:05.0811107Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.0811713Z E1204 11:47:50.309000 111532 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), None).to(tl.float32) 2025-12-04T12:15:05.0812232Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0812791Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.broadcast_to(tmp2, [XBLOCK]) 2025-12-04T12:15:05.0813298Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.0813765Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp1 * tmp3 2025-12-04T12:15:05.0814220Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = -448.0 2025-12-04T12:15:05.0814782Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = triton_helpers.maximum(tmp4, tmp5) 2025-12-04T12:15:05.0815225Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 448.0 2025-12-04T12:15:05.0815794Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = triton_helpers.minimum(tmp6, tmp7) 2025-12-04T12:15:05.0816373Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp8.to(tl.float8e4nv) 2025-12-04T12:15:05.0816931Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp9, None) 2025-12-04T12:15:05.0817293Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0819269Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 
'in_ptr1': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0819866Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0820932Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0821564Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0822474Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0823151Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0824042Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0824859Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0825471Z E1204 11:47:50.309000 111532 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0826281Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0826651Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0827555Z E1204 11:47:50.309000 111532 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0827663Z FAILED [0.5994s] [100%] 2025-12-04T12:15:05.0827671Z 2025-12-04T12:15:05.0827816Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.0828162Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.0828289Z Traceback (most recent call last): 2025-12-04T12:15:05.0828744Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0829001Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.0829489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0829750Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0830262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0830457Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0831029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0831177Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0831724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0832046Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0832596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0832756Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0833269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0833394Z return self._compile_to_module() 2025-12-04T12:15:05.0833893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0834059Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0834588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0834718Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0835215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.0835459Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0836048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0836189Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0836701Z File "/tmp/tmpcovv30vq/l2/cl2pfyu3yerwqajaunnn6bdchrm5jjmcj5tpr5xrpkyrtxpk3qwp.py", line 168, in 2025-12-04T12:15:05.0837195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0837320Z kernel.precompile( 
2025-12-04T12:15:05.0837876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0837994Z self._precompile_worker() 2025-12-04T12:15:05.0838599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0838781Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0839388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0839590Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0840041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0840301Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0840746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0841090Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0841317Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0841678Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0841781Z ^ 2025-12-04T12:15:05.0842243Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0842249Z 2025-12-04T12:15:05.0842976Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0842984Z 2025-12-04T12:15:05.0842989Z 2025-12-04T12:15:05.0843237Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0843883Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.0843901Z 2025-12-04T12:15:05.0844169Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0844425Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0844549Z frames [('total', 1)] 2025-12-04T12:15:05.0844669Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0844935Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0845170Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0845272Z graph_break [] 2025-12-04T12:15:05.0845606Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.0845743Z Traceback (most recent call last): 2025-12-04T12:15:05.0846201Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0846457Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.0846945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0847197Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0847721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0847918Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0848439Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0848621Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0849710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0850051Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0850580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0850731Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0851228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0851354Z return self._compile_to_module() 2025-12-04T12:15:05.0851851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0852019Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0852536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0852683Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0853177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.0853420Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0854004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0854134Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0854659Z File "/tmp/tmp1rnwc7cl/kt/cktt3qo6gdwctexez2ytmxkqwi3slhgisa7ujqlzt5l7gsfou4mu.py", line 168, in 2025-12-04T12:15:05.0855121Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0855237Z kernel.precompile( 2025-12-04T12:15:05.0855849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0855970Z self._precompile_worker() 2025-12-04T12:15:05.0856668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0856864Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0857461Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0857714Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0858201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0858464Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0858914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0859252Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0859498Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0859859Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0859952Z ^ 2025-12-04T12:15:05.0860427Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0860435Z 2025-12-04T12:15:05.0861150Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0861157Z 2025-12-04T12:15:05.0861162Z 2025-12-04T12:15:05.0861430Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0862219Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.0862226Z 2025-12-04T12:15:05.0862512Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0862742Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0862852Z frames [('total', 1)] 2025-12-04T12:15:05.0862994Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0863234Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0863457Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0863577Z graph_break [] 2025-12-04T12:15:05.0863798Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0863919Z frames [('total', 1)] 2025-12-04T12:15:05.0864043Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0864264Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0864515Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0864619Z graph_break [] 2025-12-04T12:15:05.0864769Z =================================== FAILURES =================================== 2025-12-04T12:15:05.0865122Z _ TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.0865252Z Traceback (most recent call last): 2025-12-04T12:15:05.0865709Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 265, in test_amax_along_with_fp8_quant 2025-12-04T12:15:05.0865973Z y_compiled = compiled_amax_fp8_quant(x, scale, amax_buffer_compiled) 
2025-12-04T12:15:05.0866466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0866732Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0867305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0867505Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0868032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0868182Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0868735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0869107Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0869663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0869830Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0870313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0870454Z return self._compile_to_module() 2025-12-04T12:15:05.0871103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0871271Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0871807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0871941Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0872438Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in 
load_by_key_path 2025-12-04T12:15:05.0872687Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0873273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0873512Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0874003Z File "/tmp/tmpjp_ii3nz/pg/cpgmhtrqxu7r366x6ajfqgjb7chhlm5bjwvw77kf2nla77uk5ley.py", line 168, in 2025-12-04T12:15:05.0874468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0874593Z kernel.precompile( 2025-12-04T12:15:05.0875141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0875276Z self._precompile_worker() 2025-12-04T12:15:05.0875878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0876057Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0876667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0876869Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0877325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0877585Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0878027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0878375Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0878601Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0878966Z 
def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0879071Z ^ 2025-12-04T12:15:05.0879527Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0879536Z 2025-12-04T12:15:05.0880313Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0880321Z 2025-12-04T12:15:05.0880325Z 2025-12-04T12:15:05.0880545Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0881189Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.0881252Z 2025-12-04T12:15:05.0881525Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0881797Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0881918Z frames [('total', 1)] 2025-12-04T12:15:05.0882035Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0882278Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0882512Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0882616Z graph_break [] 2025-12-04T12:15:05.0882833Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.0882952Z frames [('total', 1)] 2025-12-04T12:15:05.0883068Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0883299Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0883535Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0883637Z graph_break [] 2025-12-04T12:15:05.0883872Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:05.0883979Z frames [('total', 1)] 2025-12-04T12:15:05.0884095Z stats [('calls_captured', 7)] 2025-12-04T12:15:05.0884329Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.0884596Z inductor [('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.0884696Z graph_break [] 2025-12-04T12:15:05.0885358Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-642548938a706c13.xml - 2025-12-04T12:15:05.0885535Z =========================== short test summary info ============================ 2025-12-04T12:15:05.0886351Z FAILED [0.5994s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0886715Z def triton_poi_fused__to_copy_clamp_mul_2(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0886808Z ^ 2025-12-04T12:15:05.0887280Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0887285Z 2025-12-04T12:15:05.0887994Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0888002Z 2025-12-04T12:15:05.0888009Z 2025-12-04T12:15:05.0888246Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0888897Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.0888903Z 2025-12-04T12:15:05.0889186Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.0889370Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.0889574Z ================== 1 failed, 187 deselected, 2 rerun in 4.97s ================== 2025-12-04T12:15:05.0889693Z Got exit code 1 2025-12-04T12:15:05.0890262Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.0890695Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.0891205Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3087aa3d89d0a96b.xml 2025-12-04T12:15:05.0891374Z ============================= test session starts ============================== 2025-12-04T12:15:05.0891739Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.0891852Z cachedir: .pytest_cache 2025-12-04T12:15:05.0892412Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.0892553Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.0892769Z configfile: pytest.ini 2025-12-04T12:15:05.0893379Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.0893605Z collecting ... collected 188 items / 5 deselected / 183 selected 2025-12-04T12:15:05.0893750Z stepcurrent: skipping 5 already run items. 
2025-12-04T12:15:05.0893884Z Running 183 items in this shard 2025-12-04T12:15:05.0893889Z 2025-12-04T12:15:05.0894396Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_cuda PASSED [3.3894s] [ 0%] 2025-12-04T12:15:05.0894896Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_cuda PASSED [0.2990s] [ 1%] 2025-12-04T12:15:05.0895419Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_cuda PASSED [0.6567s] [ 1%] 2025-12-04T12:15:05.0895925Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_cuda PASSED [0.3272s] [ 2%] 2025-12-04T12:15:05.0896521Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.7233s] [ 2%] 2025-12-04T12:15:05.0897717Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.0898610Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0899050Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.0899495Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.0900361Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.0900835Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 
2025-12-04T12:15:05.0901385Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.0901924Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0902510Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.0903113Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.0903669Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.0904130Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.0904694Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.0905168Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.0905642Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.0906120Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.0906813Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.0907338Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0907898Z E1204 11:48:10.523000 111793 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.0908399Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.0908979Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.0909563Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.0910189Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.0910706Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.0911207Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.0911653Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.0912233Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.0912675Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.0913261Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.0913795Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:05.0914513Z E1204 11:48:10.523000 111793 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.0914894Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0916876Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0917483Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0918572Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0919223Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0920157Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0920909Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0921837Z E1204 11:48:10.523000 111793 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0922627Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0923303Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0924205Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0924597Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0925544Z E1204 11:48:10.523000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0925692Z ('RERUN', {'yellow': True}) [0.1761s] [ 3%] 2025-12-04T12:15:05.0926846Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.0927745Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0928181Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.0928624Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.0929178Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.0929646Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.0930199Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.0930757Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0931343Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.0931943Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.0932539Z E1204 11:48:10.919000 111793 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.0932996Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.0933517Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.0934038Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.0934538Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.0934992Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.0935666Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.0936194Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0936840Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.0937343Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.0937937Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.0938529Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.0939198Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.0939722Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.0940195Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.0940649Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.0941247Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.0941688Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.0942289Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.0942824Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:05.0943536Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.0943916Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0945892Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': 
True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0946446Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0947491Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0948201Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0949099Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0949798Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0950683Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0951464Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0952083Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0952959Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, 
XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0953379Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0954279Z E1204 11:48:10.919000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0954431Z ('RERUN', {'yellow': True}) [0.3661s] [ 3%] 2025-12-04T12:15:05.0955580Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.0956449Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0956906Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.0957356Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.0957894Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.0958360Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.0958915Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.0959457Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.0960047Z E1204 11:48:11.261000 111793 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.0960682Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.0961239Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.0961696Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.0962277Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.0962751Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.0963237Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.0963689Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.0964351Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.0964881Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.0965430Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.0965944Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.0966527Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.0967146Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.0967773Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.0968280Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.0968768Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.0969215Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.0969805Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.0970254Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.0970829Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.0971566Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:05.0972280Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.0972661Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.0974685Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: 
{'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.0975248Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.0976464Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0977117Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0978037Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0978719Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0979620Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0980404Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0981075Z E1204 
11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.0981959Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0982339Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.0983232Z E1204 11:48:11.261000 111793 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0983349Z FAILED [0.3408s] [ 3%] 2025-12-04T12:15:05.0983374Z 2025-12-04T12:15:05.0983525Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.0983820Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T12:15:05.0983962Z Traceback (most recent call last): 2025-12-04T12:15:05.0984367Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.0984527Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.0985033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.0985285Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.0985814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.0986010Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.0986523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.0986683Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.0987225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.0987587Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.0988121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.0988272Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.0988762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.0988921Z return self._compile_to_module() 2025-12-04T12:15:05.0989436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.0989615Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.0990132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.0990281Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.0990778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.0991013Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.0991620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.0991754Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.0992239Z File "/tmp/tmp03_ztxxq/cx/ccxjqkpnzscpe7jvzeho2zrfnk4hkkr4ok5z7cgtyhamznr5ph5i.py", line 58, in 2025-12-04T12:15:05.0992722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.0992837Z kernel.precompile( 
2025-12-04T12:15:05.0993438Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.0993562Z self._precompile_worker() 2025-12-04T12:15:05.0994160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.0994364Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.0994957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.0995172Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.0995630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.0995877Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.0996337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.0996677Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.0996907Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.0997350Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.0997442Z ^ 2025-12-04T12:15:05.0997916Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.0997924Z 2025-12-04T12:15:05.0998642Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.0998648Z 2025-12-04T12:15:05.0998653Z 2025-12-04T12:15:05.0998883Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.0999460Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:05.0999466Z 2025-12-04T12:15:05.0999765Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1000005Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1000112Z frames [('total', 1)] 2025-12-04T12:15:05.1000230Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1000467Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1000736Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1000848Z graph_break [] 2025-12-04T12:15:05.1001171Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T12:15:05.1001298Z Traceback (most recent call last): 2025-12-04T12:15:05.1001708Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1001867Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1002359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1002642Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1003156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1003364Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1003877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.1004025Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1004574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1004896Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1005462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1005616Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1006100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1006244Z return self._compile_to_module() 2025-12-04T12:15:05.1006730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1006898Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1007428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1007561Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1008068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.1008308Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1008893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1009039Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1009545Z File "/tmp/tmpzu2vv2w2/pq/cpqops675htgquucs5stc3cdhr6yh67ew3vhngbdxmjnohi4f2kc.py", line 58, in 2025-12-04T12:15:05.1010028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.1010141Z kernel.precompile( 2025-12-04T12:15:05.1010700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1010835Z self._precompile_worker() 2025-12-04T12:15:05.1011433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1011644Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1012258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1012460Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1013336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1013633Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1014111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1014465Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1014716Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1015167Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1015259Z ^ 2025-12-04T12:15:05.1015726Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1015732Z 2025-12-04T12:15:05.1016532Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1016542Z 2025-12-04T12:15:05.1016547Z 2025-12-04T12:15:05.1016767Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1017361Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:05.1017367Z 2025-12-04T12:15:05.1017638Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1017904Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1018032Z frames [('total', 1)] 2025-12-04T12:15:05.1018151Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1018391Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1018631Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1018735Z graph_break [] 2025-12-04T12:15:05.1018969Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1019080Z frames [('total', 1)] 2025-12-04T12:15:05.1019198Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1019436Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1019676Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1019779Z graph_break [] 2025-12-04T12:15:05.1019946Z =================================== FAILURES =================================== 2025-12-04T12:15:05.1020238Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T12:15:05.1020380Z Traceback (most recent call last): 2025-12-04T12:15:05.1020783Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1020943Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1021454Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1021759Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1022424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1022624Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1023135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.1023301Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1023913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1024238Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1024773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1024959Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1025453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1025824Z return self._compile_to_module() 2025-12-04T12:15:05.1026319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1026502Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1027024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1027172Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1027673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.1027909Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1028516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1028644Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1029126Z File "/tmp/tmp7_bgkn17/w4/cw4vyp64zvdbwybls4u4tmfkgzxoykbl65jou4flvp457v3ti6uc.py", line 58, in 2025-12-04T12:15:05.1029610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.1029760Z kernel.precompile( 2025-12-04T12:15:05.1030336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1030455Z self._precompile_worker() 2025-12-04T12:15:05.1031051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1031245Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1031840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1032057Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1032508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1032758Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1033218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1033553Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1033785Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1034229Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1034324Z ^ 2025-12-04T12:15:05.1034799Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1034805Z 2025-12-04T12:15:05.1035516Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1035525Z 2025-12-04T12:15:05.1035530Z 2025-12-04T12:15:05.1035758Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1036370Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:05.1036377Z 2025-12-04T12:15:05.1036649Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1036890Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1036997Z frames [('total', 1)] 2025-12-04T12:15:05.1037146Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1037386Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1037651Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1037768Z graph_break [] 2025-12-04T12:15:05.1037989Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1038096Z frames [('total', 1)] 2025-12-04T12:15:05.1038229Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1038447Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1038684Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1038799Z graph_break [] 2025-12-04T12:15:05.1039017Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:05.1039121Z frames [('total', 1)] 2025-12-04T12:15:05.1039250Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1039468Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1039713Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1039814Z graph_break [] 2025-12-04T12:15:05.1040469Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3087aa3d89d0a96b.xml - 2025-12-04T12:15:05.1040660Z =========================== short test summary info ============================ 2025-12-04T12:15:05.1041409Z FAILED [0.3408s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1041853Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1041945Z ^ 2025-12-04T12:15:05.1042407Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1042415Z 2025-12-04T12:15:05.1043135Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1043141Z 2025-12-04T12:15:05.1043148Z 2025-12-04T12:15:05.1043368Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1043955Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:05.1043963Z 2025-12-04T12:15:05.1044235Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1044417Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.1044646Z ============== 1 failed, 5 passed, 5 deselected, 2 rerun in 6.33s ============== 2025-12-04T12:15:05.1044749Z Got exit code 1 2025-12-04T12:15:05.1044871Z Retrying single test... 2025-12-04T12:15:05.1045346Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5fb4c628c04a1cdc.xml 2025-12-04T12:15:05.1045515Z ============================= test session starts ============================== 2025-12-04T12:15:05.1045885Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.1045998Z cachedir: .pytest_cache 2025-12-04T12:15:05.1046523Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.1046663Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.1046806Z configfile: pytest.ini 2025-12-04T12:15:05.1047409Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.1047634Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.1048309Z stepcurrent: skipping 10 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:05.1048476Z Running 1 items in this shard 2025-12-04T12:15:05.1048481Z 2025-12-04T12:15:05.1049654Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.1050545Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1050977Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1051422Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.1051953Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.1052419Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1052972Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1053548Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1054144Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1054726Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, 
None] 2025-12-04T12:15:05.1055285Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1055743Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1056261Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1056842Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1057307Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1057754Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1058420Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.1058945Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1059501Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.1060001Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.1060620Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1061210Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.1061838Z E1204 11:48:27.920000 112096 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1062417Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.1062884Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.1063331Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.1063918Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.1064357Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.1064945Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.1065481Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:05.1066213Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.1066622Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1068557Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): 
[['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1069109Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1070159Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1070807Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1071900Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1072598Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1073489Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1074273Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1074964Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1075840Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1076265Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1077214Z E1204 11:48:27.920000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1077364Z ('RERUN', {'yellow': True}) [3.2740s] [100%] 2025-12-04T12:15:05.1078504Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.1079386Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1079821Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1080270Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.1080797Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.1081304Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1081856Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1082397Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1082983Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1083584Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.1084235Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1084737Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1085257Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1085733Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1086210Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1086659Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1087323Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.1087845Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1088437Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.1088951Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.1089540Z E1204 
11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1090154Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.1090814Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1091336Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.1091815Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.1092264Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.1092848Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.1093294Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.1093885Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.1094417Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:05.1095163Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.1095542Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1097547Z E1204 11:48:28.283000 112096 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1098103Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1099167Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1099815Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1100708Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1101406Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1102333Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1103104Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1103723Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1104732Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1105123Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1106018Z E1204 11:48:28.283000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1106168Z ('RERUN', {'yellow': True}) [0.3258s] [100%] 2025-12-04T12:15:05.1107334Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.1108211Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1108659Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1109163Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.1109695Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.1110159Z E1204 11:48:28.610000 112096 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1110694Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1111255Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1111850Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1112449Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.1113012Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1113457Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1113991Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1114468Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1114948Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1115392Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1116068Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.1116608Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1117152Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.1117670Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.1118328Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1118910Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.1119541Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1120049Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.1120528Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.1120974Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.1121563Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.1122002Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.1122608Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.1123157Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 
2025-12-04T12:15:05.1123867Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.1124247Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1126180Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1126724Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1127764Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1128407Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1129294Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1130007Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 
2025-12-04T12:15:05.1130924Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1131696Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1132377Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1133251Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1133636Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1134529Z E1204 11:48:28.610000 112096 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1134642Z FAILED [0.3241s] [100%] 2025-12-04T12:15:05.1134649Z 2025-12-04T12:15:05.1134816Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.1135107Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T12:15:05.1135256Z Traceback (most recent call last): 2025-12-04T12:15:05.1135661Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1135849Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1136424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1136679Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1137192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1137399Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1137912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.1138077Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1138614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1138941Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1139478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1139635Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1140131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1140259Z 
return self._compile_to_module() 2025-12-04T12:15:05.1140743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1140922Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1141440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1141571Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1142086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.1142321Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1142965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1143096Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1143596Z File "/tmp/tmpkmloqxox/sm/csm67ljxmul4unwsrhd45cdsj7wa7asxlqevhown7jvo2dnz77xa.py", line 58, in 2025-12-04T12:15:05.1144071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.1144222Z kernel.precompile( 2025-12-04T12:15:05.1144821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1144942Z self._precompile_worker() 2025-12-04T12:15:05.1145559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1145759Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1146359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1146561Z binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:05.1147028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1147278Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1147733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1148072Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1148302Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1148779Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1148874Z ^ 2025-12-04T12:15:05.1149346Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1149351Z 2025-12-04T12:15:05.1150070Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1150079Z 2025-12-04T12:15:05.1150084Z 2025-12-04T12:15:05.1150303Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1150897Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:05.1150902Z 2025-12-04T12:15:05.1151176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1151415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1151522Z frames [('total', 1)] 2025-12-04T12:15:05.1151644Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1151896Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1152116Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1152217Z graph_break [] 2025-12-04T12:15:05.1152523Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T12:15:05.1152649Z Traceback (most recent call last): 2025-12-04T12:15:05.1153059Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1153220Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1153711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1153978Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1154524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1154733Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1155241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.1155388Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1155936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1156294Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1156849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1157015Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1157501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1157644Z return self._compile_to_module() 2025-12-04T12:15:05.1158130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1158295Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1158823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1158959Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1159472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.1159705Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1160292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1160470Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1160947Z File "/tmp/tmpm_8a6myn/7s/c7scoabjmz4lq7otdr2zlv55wkha6upjywo2ftq6bgeoee6simbd.py", line 58, in 2025-12-04T12:15:05.1161413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.1161540Z kernel.precompile( 2025-12-04T12:15:05.1162098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1162232Z self._precompile_worker() 2025-12-04T12:15:05.1162828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1163006Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1163617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1163819Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1164281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1164528Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1164970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1165322Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1165549Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1165986Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1166097Z ^ 2025-12-04T12:15:05.1166557Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1166562Z 2025-12-04T12:15:05.1167328Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1167335Z 2025-12-04T12:15:05.1167339Z 2025-12-04T12:15:05.1167558Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1168151Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:05.1168204Z 2025-12-04T12:15:05.1168474Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1168727Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1168854Z frames [('total', 1)] 2025-12-04T12:15:05.1168973Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1169218Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1169464Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1169566Z graph_break [] 2025-12-04T12:15:05.1169804Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1169913Z frames [('total', 1)] 2025-12-04T12:15:05.1170028Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1170262Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1170500Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1170602Z graph_break [] 2025-12-04T12:15:05.1170762Z =================================== FAILURES =================================== 2025-12-04T12:15:05.1171261Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T12:15:05.1171403Z Traceback (most recent call last): 2025-12-04T12:15:05.1171891Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1172050Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1172558Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1172809Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1173322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1173532Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1174046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.1174209Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1174750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1175073Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1175617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1175765Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1176273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1176465Z return self._compile_to_module() 2025-12-04T12:15:05.1176960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1177144Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1177666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1177801Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1178370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.1178605Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1179205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1179335Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1179846Z File "/tmp/tmpuhdzuvz8/hi/chiiftdpysrhhsk6nfs3sptp5gxmdbmcgijuvcbktwfco6sf4d5u.py", line 58, in 2025-12-04T12:15:05.1180371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.1180530Z kernel.precompile( 2025-12-04T12:15:05.1181177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1181344Z self._precompile_worker() 2025-12-04T12:15:05.1182001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1182199Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1182796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1182998Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1183471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1183718Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1184181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1184522Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1184798Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1185249Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1185343Z ^ 2025-12-04T12:15:05.1185818Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1185825Z 2025-12-04T12:15:05.1186544Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1186552Z 2025-12-04T12:15:05.1186557Z 2025-12-04T12:15:05.1186781Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1187372Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:05.1187380Z 2025-12-04T12:15:05.1187649Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1187889Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1187998Z frames [('total', 1)] 2025-12-04T12:15:05.1188116Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1188369Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1188590Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1188694Z graph_break [] 2025-12-04T12:15:05.1188926Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1189029Z frames [('total', 1)] 2025-12-04T12:15:05.1189162Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1189384Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1189619Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1189735Z graph_break [] 2025-12-04T12:15:05.1189949Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:05.1190089Z frames [('total', 1)] 2025-12-04T12:15:05.1190223Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1190437Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1190685Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1190784Z graph_break [] 2025-12-04T12:15:05.1191829Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5fb4c628c04a1cdc.xml - 2025-12-04T12:15:05.1192090Z =========================== short test summary info ============================ 2025-12-04T12:15:05.1192843Z FAILED [0.3241s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1193280Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1193386Z ^ 2025-12-04T12:15:05.1193844Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1193850Z 2025-12-04T12:15:05.1194580Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1194586Z 2025-12-04T12:15:05.1194593Z 2025-12-04T12:15:05.1194815Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1195406Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:05.1195412Z 2025-12-04T12:15:05.1195679Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1195896Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.1196118Z ================== 1 failed, 187 deselected, 2 rerun in 3.97s ================== 2025-12-04T12:15:05.1196220Z Got exit code 1 2025-12-04T12:15:05.1196330Z Retrying single test... 2025-12-04T12:15:05.1196816Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0d752e0bfa5071ea.xml 2025-12-04T12:15:05.1196981Z ============================= test session starts ============================== 2025-12-04T12:15:05.1197348Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.1197461Z cachedir: .pytest_cache 2025-12-04T12:15:05.1197985Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.1198129Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.1198240Z configfile: pytest.ini 2025-12-04T12:15:05.1198834Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.1199072Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.1199724Z stepcurrent: skipping 10 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:05.1199857Z Running 1 items in this shard 2025-12-04T12:15:05.1199862Z 2025-12-04T12:15:05.1201008Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.1201901Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1202337Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1202811Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.1203339Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.1203876Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1204528Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1205123Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1205715Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1206314Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, 
None] 2025-12-04T12:15:05.1206872Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1207327Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1207852Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1208328Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1208804Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1209290Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1209955Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.1210476Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1211029Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.1211534Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.1212116Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1212709Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.1213339Z E1204 11:48:47.720000 112293 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1213860Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.1214333Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.1214782Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.1215365Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.1215872Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.1216563Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.1217100Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:05.1217814Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.1218270Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1220214Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): 
[['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1220766Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1221816Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1222460Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1223387Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1224078Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1224958Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1225733Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1226358Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1227233Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1227617Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1228510Z E1204 11:48:47.720000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1228664Z ('RERUN', {'yellow': True}) [3.2742s] [100%] 2025-12-04T12:15:05.1229800Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.1230708Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1231152Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1231861Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.1232435Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.1232931Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1233465Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1234026Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1234610Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1235205Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.1235762Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1236221Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1236737Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1237245Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1237719Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1238168Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1238826Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.1239356Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1239897Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.1240410Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.1240994Z E1204 
11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1241577Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.1242206Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1242717Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.1243195Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.1243644Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.1244262Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.1244701Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.1245277Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.1245856Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:05.1246598Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.1246985Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1248930Z E1204 11:48:48.090000 112293 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1249487Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1250532Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1251218Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1252110Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1252791Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1253697Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1254474Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1255094Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1255970Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1256417Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1257480Z E1204 11:48:48.090000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1257621Z ('RERUN', {'yellow': True}) [0.3327s] [100%] 2025-12-04T12:15:05.1258819Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.1259715Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1260222Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1260841Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.1261511Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.1261977Z E1204 11:48:48.425000 112293 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1262514Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1263068Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1263651Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1264249Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.1264807Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1265311Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1265846Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1266315Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1266787Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1267236Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1267885Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.1268421Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1268973Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.1269489Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.1270072Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1270647Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.1271475Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1271986Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.1272559Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.1273011Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.1273595Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.1274079Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.1274691Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.1275240Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 
2025-12-04T12:15:05.1275955Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.1276337Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1278274Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1278826Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1279919Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1280554Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1281473Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1282154Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 
2025-12-04T12:15:05.1283060Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1283832Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1284454Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1285330Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1285699Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1286642Z E1204 11:48:48.425000 112293 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1286750Z FAILED [0.3320s] [100%] 2025-12-04T12:15:05.1286756Z 2025-12-04T12:15:05.1286917Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.1287211Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T12:15:05.1287338Z Traceback (most recent call last): 2025-12-04T12:15:05.1287781Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1287940Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1288474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1288727Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1289248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1289461Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1289971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.1290132Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1290667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1290992Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1291530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1291680Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1292293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1292452Z 
return self._compile_to_module() 2025-12-04T12:15:05.1292962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1293145Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1293662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1293795Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1294304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.1294539Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1295140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1295271Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1295779Z File "/tmp/tmpp8o7idow/xn/cxnwjbv4kkjuhrs3bzu23uu66vz7hji6zr45at7jbbnlflwtqy2z.py", line 58, in 2025-12-04T12:15:05.1296257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.1296438Z kernel.precompile( 2025-12-04T12:15:05.1296993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1297128Z self._precompile_worker() 2025-12-04T12:15:05.1297726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1297921Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1298519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1298721Z binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:05.1299232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1299480Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1299933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1300301Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1300536Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1301015Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1301110Z ^ 2025-12-04T12:15:05.1301569Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1301591Z 2025-12-04T12:15:05.1302313Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1302320Z 2025-12-04T12:15:05.1302325Z 2025-12-04T12:15:05.1302544Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1303139Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:05.1303149Z 2025-12-04T12:15:05.1303421Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1303663Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1303771Z frames [('total', 1)] 2025-12-04T12:15:05.1303889Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1304172Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1304397Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1304500Z graph_break [] 2025-12-04T12:15:05.1304805Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T12:15:05.1304930Z Traceback (most recent call last): 2025-12-04T12:15:05.1305342Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1305502Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1305992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1306254Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1306765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1306962Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1307493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.1307642Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1308191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1308515Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1309039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1309203Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1309687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1309824Z return self._compile_to_module() 2025-12-04T12:15:05.1310313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1310590Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1311129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1311261Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1311761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.1312041Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1312676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1313145Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1313660Z File "/tmp/tmpgee3xitr/ei/ceixziejlt5nrat6w2xlvumtvbh4hkg5o4amqsjju4anlnub3a3z.py", line 58, in 2025-12-04T12:15:05.1314205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.1314362Z kernel.precompile( 2025-12-04T12:15:05.1314921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1315052Z self._precompile_worker() 2025-12-04T12:15:05.1315650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1315833Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1316445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1316646Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1317098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1317591Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1318095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1318499Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1318900Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1319339Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1319463Z ^ 2025-12-04T12:15:05.1320043Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1320050Z 2025-12-04T12:15:05.1320773Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1320782Z 2025-12-04T12:15:05.1320787Z 2025-12-04T12:15:05.1321007Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1321583Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:05.1321602Z 2025-12-04T12:15:05.1321871Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1322095Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1322217Z frames [('total', 1)] 2025-12-04T12:15:05.1322339Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1322581Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1322816Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1322918Z graph_break [] 2025-12-04T12:15:05.1323140Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1323256Z frames [('total', 1)] 2025-12-04T12:15:05.1323428Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1323664Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1323897Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1323997Z graph_break [] 2025-12-04T12:15:05.1324159Z =================================== FAILURES =================================== 2025-12-04T12:15:05.1324483Z _____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda _____ 2025-12-04T12:15:05.1324608Z Traceback (most recent call last): 2025-12-04T12:15:05.1325059Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1325217Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1325722Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1325976Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1326492Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1326699Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1327209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.1327372Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1327905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1328230Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1328766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1328946Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1329429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1329569Z return self._compile_to_module() 2025-12-04T12:15:05.1330056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1330239Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1330761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1330892Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1331407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.1331642Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1332250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1332379Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1332854Z File "/tmp/tmpm_wfro_n/ug/cugkxgnqbwfx3x2whxlt522gco5jleyjoobigwqugfoemjvcekri.py", line 58, in 2025-12-04T12:15:05.1333329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.1333443Z kernel.precompile( 2025-12-04T12:15:05.1334001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1334132Z self._precompile_worker() 2025-12-04T12:15:05.1334730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1334923Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1335550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1335752Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1336217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1336555Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1337052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1337388Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1337647Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1338098Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1338193Z ^ 2025-12-04T12:15:05.1338653Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1338659Z 2025-12-04T12:15:05.1339389Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1339395Z 2025-12-04T12:15:05.1339400Z 2025-12-04T12:15:05.1339619Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1340217Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:05.1340222Z 2025-12-04T12:15:05.1340495Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1340734Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1340873Z frames [('total', 1)] 2025-12-04T12:15:05.1340990Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1341247Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1341472Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1341576Z graph_break [] 2025-12-04T12:15:05.1341815Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1341922Z frames [('total', 1)] 2025-12-04T12:15:05.1342041Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1342279Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1342514Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1342628Z graph_break [] 2025-12-04T12:15:05.1342848Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:05.1342956Z frames [('total', 1)] 2025-12-04T12:15:05.1343091Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1343312Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1343549Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1343666Z graph_break [] 2025-12-04T12:15:05.1344327Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0d752e0bfa5071ea.xml - 2025-12-04T12:15:05.1344518Z =========================== short test summary info ============================ 2025-12-04T12:15:05.1345242Z FAILED [0.3320s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1345684Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1345791Z ^ 2025-12-04T12:15:05.1346252Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1346260Z 2025-12-04T12:15:05.1347029Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1347036Z 2025-12-04T12:15:05.1347040Z 2025-12-04T12:15:05.1347260Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1347840Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:05.1347890Z 2025-12-04T12:15:05.1348160Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1348373Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.1348594Z ================== 1 failed, 187 deselected, 2 rerun in 3.98s ================== 2025-12-04T12:15:05.1348697Z Got exit code 1 2025-12-04T12:15:05.1349191Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda 2025-12-04T12:15:05.1349621Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.1350099Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fedbc7df4b1c2869.xml 2025-12-04T12:15:05.1350281Z ============================= test session starts ============================== 2025-12-04T12:15:05.1350632Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.1350747Z cachedir: .pytest_cache 2025-12-04T12:15:05.1351281Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.1351407Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.1351516Z configfile: pytest.ini 2025-12-04T12:15:05.1352124Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.1352387Z collecting ... collected 188 items / 11 deselected / 177 selected 2025-12-04T12:15:05.1352544Z stepcurrent: skipping 11 already run items. 
2025-12-04T12:15:05.1352662Z Running 177 items in this shard 2025-12-04T12:15:05.1352667Z 2025-12-04T12:15:05.1353812Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.1354698Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1355127Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1355592Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.1356113Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.1356578Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1357125Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1357667Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1358272Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1358856Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.1359456Z E1204 11:49:07.277000 112490 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1359900Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1360417Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1360952Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1361439Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1361903Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1362561Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.1363082Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1363634Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.1364137Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.1364737Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1365308Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.1365967Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1366490Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.1366957Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.1367414Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.1367983Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.1368421Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.1369016Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.1369549Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:05.1370277Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.1370701Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1372987Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': 
True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1373536Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1374723Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1375462Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1376688Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1377669Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1378839Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1379631Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1380243Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1381134Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, 
XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1381598Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1382492Z E1204 11:49:07.277000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1382649Z ('RERUN', {'yellow': True}) [3.2672s] [ 0%] 2025-12-04T12:15:05.1383798Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.1384677Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1385112Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1385572Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.1386091Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.1386554Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1387104Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1387645Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1388280Z E1204 11:49:07.639000 112490 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1388867Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.1389425Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1389919Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1390465Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1390953Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1391418Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1391861Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1392522Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.1393041Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1393599Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.1394097Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.1394708Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1395291Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.1395921Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1396442Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.1396911Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.1397372Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.1397943Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.1398384Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.1398970Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.1399502Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:05.1400232Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.1400592Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1402556Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: 
{'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1403136Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1404222Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1404870Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1405769Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1406461Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1407346Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1408124Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1408762Z E1204 
11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1409627Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1410010Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1410904Z E1204 11:49:07.639000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1411051Z ('RERUN', {'yellow': True}) [0.3237s] [ 0%] 2025-12-04T12:15:05.1412187Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.1413066Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1413497Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1413948Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.1414482Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.1414942Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1415495Z E1204 
11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1416086Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1416747Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1417341Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.1417964Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1418425Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1418950Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1419427Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1419905Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1420356Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1421017Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.1421541Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1422095Z E1204 11:49:07.962000 112490 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.1422630Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.1423211Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1423796Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.1424425Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1424940Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.1425409Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.1432629Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.1433508Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.1433962Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.1434903Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.1435558Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:05.1436327Z E1204 11:49:07.962000 112490 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.1436850Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1438874Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1439472Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1440523Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1441165Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1442061Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1442758Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1443643Z E1204 11:49:07.962000 112490 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1444457Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1445075Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1445948Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1446332Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1447230Z E1204 11:49:07.962000 112490 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1447338Z FAILED [0.3204s] [ 0%] 2025-12-04T12:15:05.1447359Z 2025-12-04T12:15:05.1447510Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.1447803Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T12:15:05.1447945Z Traceback (most recent call last): 2025-12-04T12:15:05.1448345Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1448505Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1449011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1449267Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1449798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1449996Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1450540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.1450703Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1451238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1451558Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1452128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1452306Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1452799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1452927Z 
return self._compile_to_module() 2025-12-04T12:15:05.1453416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1453590Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1454108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1454253Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1454747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.1454984Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1455587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1455713Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1456241Z File "/tmp/tmpx1q1l5d9/jp/cjp7bmi7v7ypzibnpt4wclabccxrx6672ohcappxd44gst6cvgcr.py", line 58, in 2025-12-04T12:15:05.1456816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.1456930Z kernel.precompile( 2025-12-04T12:15:05.1457496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1457613Z self._precompile_worker() 2025-12-04T12:15:05.1458211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1458407Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1459003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1459220Z binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:05.1459672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1459922Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1460380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1460715Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1460943Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1461391Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1461486Z ^ 2025-12-04T12:15:05.1461954Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1461961Z 2025-12-04T12:15:05.1462674Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1462720Z 2025-12-04T12:15:05.1462726Z 2025-12-04T12:15:05.1462961Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1463544Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:05.1463550Z 2025-12-04T12:15:05.1463817Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1464089Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1464198Z frames [('total', 1)] 2025-12-04T12:15:05.1464349Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1464602Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1464823Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1464940Z graph_break [] 2025-12-04T12:15:05.1465232Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T12:15:05.1465356Z Traceback (most recent call last): 2025-12-04T12:15:05.1465772Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1465928Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1466416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1466678Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1467192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1467393Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1467899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.1468091Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1468634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1468952Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1469484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1469634Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1470112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1470248Z return self._compile_to_module() 2025-12-04T12:15:05.1470730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1470897Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1471615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1471748Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1472261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.1472496Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1473088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1473233Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1473735Z File "/tmp/tmpfqj4hjli/ze/cze3y35l3qkwon2z22jt6gic5gihofiimccweugivfjhak65tw7a.py", line 58, in 2025-12-04T12:15:05.1474212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.1474331Z kernel.precompile( 2025-12-04T12:15:05.1474974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1475108Z self._precompile_worker() 2025-12-04T12:15:05.1475706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1475887Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1476541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1476788Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1477255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1477502Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1477945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1478297Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1478526Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1478973Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1479067Z ^ 2025-12-04T12:15:05.1479519Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1479525Z 2025-12-04T12:15:05.1480254Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1480261Z 2025-12-04T12:15:05.1480312Z 2025-12-04T12:15:05.1480531Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1481126Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:05.1481132Z 2025-12-04T12:15:05.1481401Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1481622Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1481741Z frames [('total', 1)] 2025-12-04T12:15:05.1481903Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1482153Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1482372Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1482471Z graph_break [] 2025-12-04T12:15:05.1482704Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1482813Z frames [('total', 1)] 2025-12-04T12:15:05.1482927Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1483160Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1483399Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1483509Z graph_break [] 2025-12-04T12:15:05.1483656Z =================================== FAILURES =================================== 2025-12-04T12:15:05.1484034Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T12:15:05.1484213Z Traceback (most recent call last): 2025-12-04T12:15:05.1484645Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1484801Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1485300Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1485549Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1486119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1486317Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1486829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.1486986Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1487521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1487873Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1488432Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1488582Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1489074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1489203Z return self._compile_to_module() 2025-12-04T12:15:05.1489727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1489968Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1490538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1490683Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1491722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.1491998Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1492714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1493001Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1493834Z File "/tmp/tmpbalprfne/ll/cllr2iscaoldjg5hjwmklegpzmgueaoxwb5heq27w6iuupzc5wju.py", line 58, in 2025-12-04T12:15:05.1494345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.1494478Z kernel.precompile( 2025-12-04T12:15:05.1495070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1495195Z self._precompile_worker() 2025-12-04T12:15:05.1495790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1496039Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1497104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1497457Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1498139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1498396Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1498852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1499192Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1499511Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1499974Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1500084Z ^ 2025-12-04T12:15:05.1500545Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1500555Z 2025-12-04T12:15:05.1501346Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1501353Z 2025-12-04T12:15:05.1501358Z 2025-12-04T12:15:05.1501588Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1502169Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:05.1502213Z 2025-12-04T12:15:05.1502482Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1502755Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1502863Z frames [('total', 1)] 2025-12-04T12:15:05.1502980Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1503294Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1503571Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1503692Z graph_break [] 2025-12-04T12:15:05.1503916Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1504021Z frames [('total', 1)] 2025-12-04T12:15:05.1504152Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1504374Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1504611Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1504731Z graph_break [] 2025-12-04T12:15:05.1505001Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:05.1505122Z frames [('total', 1)] 2025-12-04T12:15:05.1505238Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1505455Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1505750Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1505852Z graph_break [] 2025-12-04T12:15:05.1506515Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fedbc7df4b1c2869.xml - 2025-12-04T12:15:05.1506705Z =========================== short test summary info ============================ 2025-12-04T12:15:05.1507426Z FAILED [0.3204s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1507879Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1507970Z ^ 2025-12-04T12:15:05.1508436Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1508441Z 2025-12-04T12:15:05.1509169Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1509178Z 2025-12-04T12:15:05.1509185Z 2025-12-04T12:15:05.1509403Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1510000Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:05.1510005Z 2025-12-04T12:15:05.1510273Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1510460Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.1510678Z ================== 1 failed, 11 deselected, 2 rerun in 3.95s =================== 2025-12-04T12:15:05.1510785Z Got exit code 1 2025-12-04T12:15:05.1510906Z Retrying single test... 2025-12-04T12:15:05.1511385Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3cf6d62b643bfad7.xml 2025-12-04T12:15:05.1511555Z ============================= test session starts ============================== 2025-12-04T12:15:05.1511957Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.1512100Z cachedir: .pytest_cache 2025-12-04T12:15:05.1512622Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.1512759Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.1512871Z configfile: pytest.ini 2025-12-04T12:15:05.1513507Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.1513781Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.1514441Z stepcurrent: skipping 11 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:05.1514574Z Running 1 items in this shard 2025-12-04T12:15:05.1514579Z 2025-12-04T12:15:05.1515722Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.1516613Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1517052Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1517512Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.1518033Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.1518605Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1519154Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1519697Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1520298Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1520890Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, 
tl.int1)[:, None] 2025-12-04T12:15:05.1521444Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1521907Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1522425Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1522909Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1523367Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1523815Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1524472Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.1525025Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1525581Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.1526076Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.1526657Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1527295Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.1527920Z E1204 
11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1528442Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.1528907Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.1529349Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.1529937Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.1530378Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.1530968Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.1531823Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:05.1532561Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.1532922Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1534853Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], 
(1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1535412Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1536526Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1537170Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1538062Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1538751Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1539688Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1540521Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1541186Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1542135Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1542516Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1543417Z E1204 11:49:26.903000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1543565Z ('RERUN', {'yellow': True}) [3.2888s] [100%] 2025-12-04T12:15:05.1544699Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.1545580Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1546011Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1546492Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.1547027Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.1547489Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1548038Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1548581Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1549320Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1550355Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.1551241Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1551998Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1552631Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1553158Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1553635Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1554083Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1554821Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.1555349Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1555894Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.1556449Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.1557108Z E1204 
11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1557700Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.1558330Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1558859Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.1559330Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.1559777Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.1560406Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.1560886Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.1561563Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.1562102Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:05.1563223Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.1563867Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1567191Z E1204 11:49:27.276000 112687 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1568127Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1569205Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1569855Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1570746Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1571741Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1572630Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1573477Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1574154Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1575032Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1575406Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1576351Z E1204 11:49:27.276000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1576508Z ('RERUN', {'yellow': True}) [0.3350s] [100%] 2025-12-04T12:15:05.1577657Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.1578523Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1579023Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1579472Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.1580000Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.1580464Z E1204 11:49:27.614000 112687 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1581003Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1581561Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1582150Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1582745Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.1583305Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1583765Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1584282Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1584754Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1585259Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1585709Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1586365Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.1586920Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1587490Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.1588001Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.1588587Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1589169Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.1589794Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1590306Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.1590786Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.1591230Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.1591854Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.1592297Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.1592869Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.1593413Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 
2025-12-04T12:15:05.1594132Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.1594515Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1596468Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1597016Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1598060Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1598798Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1599731Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1600412Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 
2025-12-04T12:15:05.1601370Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1602145Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1602770Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1603641Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1604021Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1604925Z E1204 11:49:27.614000 112687 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1605031Z FAILED [0.3360s] [100%] 2025-12-04T12:15:05.1605038Z 2025-12-04T12:15:05.1605198Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.1605523Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T12:15:05.1605665Z Traceback (most recent call last): 2025-12-04T12:15:05.1606066Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1606224Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1606729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1606981Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1607578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1607895Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1608412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.1608587Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1609124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1609524Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1610430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1610677Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1611218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1611348Z 
return self._compile_to_module() 2025-12-04T12:15:05.1611878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1612098Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1613021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1613232Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1613929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.1614205Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1614951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1615129Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1615709Z File "/tmp/tmp3yt0jipj/ly/clyw34eolqozc3qd2w5eu6cjthldi2jqg5jtgupxlfwg2wb4jphf.py", line 58, in 2025-12-04T12:15:05.1616230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.1616425Z kernel.precompile( 2025-12-04T12:15:05.1617003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1617127Z self._precompile_worker() 2025-12-04T12:15:05.1617726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1617971Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1618608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1618817Z binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:05.1619325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1619576Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1620079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1620421Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1620650Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1621100Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1621191Z ^ 2025-12-04T12:15:05.1621666Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1621672Z 2025-12-04T12:15:05.1622481Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1622488Z 2025-12-04T12:15:05.1622493Z 2025-12-04T12:15:05.1622714Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1623321Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:05.1623327Z 2025-12-04T12:15:05.1623673Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1623956Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1624067Z frames [('total', 1)] 2025-12-04T12:15:05.1624187Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1624447Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1624672Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1624790Z graph_break [] 2025-12-04T12:15:05.1625085Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T12:15:05.1625210Z Traceback (most recent call last): 2025-12-04T12:15:05.1625623Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1625844Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1626334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1626957Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1627524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1627827Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1628418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.1628568Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1629159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1629522Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1630100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1630250Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1630731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1630869Z return self._compile_to_module() 2025-12-04T12:15:05.1631358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1631529Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1632149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1632322Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1632878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.1633146Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1633773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1633915Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1634454Z File "/tmp/tmp5d1dhucf/w3/cw3g23zyjiveu3xexvruyv4swrhhon324rpzi3cas6hxw53qffpi.py", line 58, in 2025-12-04T12:15:05.1635101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.1635216Z kernel.precompile( 2025-12-04T12:15:05.1635772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1635907Z self._precompile_worker() 2025-12-04T12:15:05.1636511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1636756Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1637405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1637643Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1638151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1638399Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1638844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1639190Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1639421Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1639911Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1640004Z ^ 2025-12-04T12:15:05.1640461Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1640468Z 2025-12-04T12:15:05.1641195Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1641233Z 2025-12-04T12:15:05.1641238Z 2025-12-04T12:15:05.1641489Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1642085Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:05.1642093Z 2025-12-04T12:15:05.1642361Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1642589Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1642712Z frames [('total', 1)] 2025-12-04T12:15:05.1642830Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1643081Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1643307Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1643412Z graph_break [] 2025-12-04T12:15:05.1643647Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1643755Z frames [('total', 1)] 2025-12-04T12:15:05.1643876Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1644113Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1644351Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1644489Z graph_break [] 2025-12-04T12:15:05.1644649Z =================================== FAILURES =================================== 2025-12-04T12:15:05.1644945Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T12:15:05.1645084Z Traceback (most recent call last): 2025-12-04T12:15:05.1645486Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1645642Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1646150Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1646398Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1646915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1647120Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1647633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.1647795Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1648330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1648651Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1649186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1649337Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1649829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1649951Z return self._compile_to_module() 2025-12-04T12:15:05.1650434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1650651Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1651170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1651300Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1651811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.1652077Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1652707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1652837Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1653345Z File "/tmp/tmp3khqosgp/7z/c7zlrawrzs2ffa3mrvgu5rfsrgbpcxqqoafrhdaq5t7pkuovwzzf.py", line 58, in 2025-12-04T12:15:05.1653824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.1653942Z kernel.precompile( 2025-12-04T12:15:05.1654512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1654635Z self._precompile_worker() 2025-12-04T12:15:05.1655287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1655484Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1656078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1656278Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1656815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1657107Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1657566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1657903Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1658131Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1658573Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1658666Z ^ 2025-12-04T12:15:05.1659140Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1659145Z 2025-12-04T12:15:05.1659857Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1659865Z 2025-12-04T12:15:05.1659870Z 2025-12-04T12:15:05.1660092Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1660684Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:05.1660690Z 2025-12-04T12:15:05.1660957Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1661193Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1661302Z frames [('total', 1)] 2025-12-04T12:15:05.1661419Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1661674Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1661895Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1662009Z graph_break [] 2025-12-04T12:15:05.1662233Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1662338Z frames [('total', 1)] 2025-12-04T12:15:05.1662466Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1662720Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1662957Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1663075Z graph_break [] 2025-12-04T12:15:05.1663294Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:05.1663403Z frames [('total', 1)] 2025-12-04T12:15:05.1663568Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1663783Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1664075Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1664216Z graph_break [] 2025-12-04T12:15:05.1664913Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3cf6d62b643bfad7.xml - 2025-12-04T12:15:05.1665148Z =========================== short test summary info ============================ 2025-12-04T12:15:05.1665918Z FAILED [0.3360s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1666391Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1666502Z ^ 2025-12-04T12:15:05.1666966Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1666975Z 2025-12-04T12:15:05.1667746Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1667752Z 2025-12-04T12:15:05.1667756Z 2025-12-04T12:15:05.1667976Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1668642Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:05.1668648Z 2025-12-04T12:15:05.1668918Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1669100Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.1669341Z ================== 1 failed, 187 deselected, 2 rerun in 4.00s ================== 2025-12-04T12:15:05.1669469Z Got exit code 1 2025-12-04T12:15:05.1669579Z Retrying single test... 2025-12-04T12:15:05.1670070Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e2e744a24cd2751e.xml 2025-12-04T12:15:05.1670236Z ============================= test session starts ============================== 2025-12-04T12:15:05.1670644Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.1670763Z cachedir: .pytest_cache 2025-12-04T12:15:05.1671590Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.1671739Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.1671888Z configfile: pytest.ini 2025-12-04T12:15:05.1672542Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.1672802Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.1673474Z stepcurrent: skipping 11 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:05.1673607Z Running 1 items in this shard 2025-12-04T12:15:05.1673613Z 2025-12-04T12:15:05.1674815Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.1675893Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1676367Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1676876Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.1677468Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.1677938Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1678491Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1679036Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1679619Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1680219Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, 
tl.int1)[:, None] 2025-12-04T12:15:05.1680782Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1681246Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1681817Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1682310Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1682774Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1683226Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1683892Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.1684417Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1684979Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.1685486Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.1686069Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1686654Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.1687282Z E1204 
11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1687809Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.1688281Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.1688772Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.1689366Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.1689807Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.1690427Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.1690990Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:05.1691706Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.1692091Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1694029Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], 
(1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1694584Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1695634Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1696375Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1697270Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1697973Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1698860Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1699632Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1700251Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1701129Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1701512Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1702400Z E1204 11:49:46.838000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1702552Z ('RERUN', {'yellow': True}) [3.2738s] [100%] 2025-12-04T12:15:05.1703733Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.1704601Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1705105Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1705551Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.1706085Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.1706556Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1707110Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1707652Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1708235Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1708837Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.1709394Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1709898Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1710415Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1710889Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1711362Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1711811Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1712472Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.1712996Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1713543Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.1714057Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.1714639Z E1204 
11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1715220Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.1715845Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1716404Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.1716885Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.1717331Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.1717914Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.1718433Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.1719004Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.1719551Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 2025-12-04T12:15:05.1720264Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.1720643Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1722578Z E1204 11:49:47.206000 112884 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1723169Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1724208Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1724847Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1725740Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1726416Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1727314Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1728090Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1728715Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1729590Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1729975Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1730911Z E1204 11:49:47.206000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1731049Z ('RERUN', {'yellow': True}) [0.3303s] [100%] 2025-12-04T12:15:05.1732194Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.1733203Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1733653Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1734105Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.1734641Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.1735105Z E1204 11:49:47.534000 112884 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1735642Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1736204Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1736855Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1737502Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.1738061Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1738505Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1739040Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1739515Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1739989Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1740438Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1741087Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.1741621Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp7 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1742164Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.broadcast_to(tmp7, [1, 1]) 2025-12-04T12:15:05.1742678Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.1743258Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1743843Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, float("-inf")) 2025-12-04T12:15:05.1744507Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = triton_helpers.max2(tmp4, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1745012Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:05.1745495Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tmp6 * tmp8 2025-12-04T12:15:05.1745969Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = -448.0 2025-12-04T12:15:05.1746582Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = triton_helpers.maximum(tmp9, tmp10) 2025-12-04T12:15:05.1747022Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 448.0 2025-12-04T12:15:05.1747599Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.minimum(tmp11, tmp12) 2025-12-04T12:15:05.1748144Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tmp13.to(tl.float8e4nv) 
2025-12-04T12:15:05.1748856Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp14, None) 2025-12-04T12:15:05.1749237Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1751164Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1751749Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1752794Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1753426Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1754327Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1755010Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 
2025-12-04T12:15:05.1755921Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1756693Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1757314Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1758225Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1758608Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1759501Z E1204 11:49:47.534000 112884 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1759639Z FAILED [0.3260s] [100%] 2025-12-04T12:15:05.1759645Z 2025-12-04T12:15:05.1759805Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.1760130Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T12:15:05.1760258Z Traceback (most recent call last): 2025-12-04T12:15:05.1760674Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1760835Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1761343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1761592Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1762108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1762317Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1762832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.1762998Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1763534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1763890Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1764427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1764577Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1765053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1765194Z 
return self._compile_to_module() 2025-12-04T12:15:05.1765685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1765867Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1766388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1766518Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1767031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.1767267Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1767868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1767998Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1768474Z File "/tmp/tmpiq_xyrgc/qw/cqwfwgqwf53dh6to6cphxvynpeqqb2g6xdq27nvoojjd6baluz5v.py", line 58, in 2025-12-04T12:15:05.1768953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.1769069Z kernel.precompile( 2025-12-04T12:15:05.1769625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1769759Z self._precompile_worker() 2025-12-04T12:15:05.1770400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1770596Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1771386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1771586Z binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:05.1772156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1772404Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1772903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1773240Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1773470Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1773922Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1774016Z ^ 2025-12-04T12:15:05.1774474Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1774504Z 2025-12-04T12:15:05.1775221Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1775230Z 2025-12-04T12:15:05.1775235Z 2025-12-04T12:15:05.1775457Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1776055Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:05.1776860Z 2025-12-04T12:15:05.1777148Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1777394Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1777503Z frames [('total', 1)] 2025-12-04T12:15:05.1777623Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1777882Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1778110Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1778213Z graph_break [] 2025-12-04T12:15:05.1778526Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T12:15:05.1778653Z Traceback (most recent call last): 2025-12-04T12:15:05.1779071Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1779230Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1779720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1779991Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1780506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1780715Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1781231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.1781384Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1781936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1782262Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1782783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1782952Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1783494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1783634Z return self._compile_to_module() 2025-12-04T12:15:05.1784122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1784288Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1784863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1785028Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1785541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.1785775Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1786368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1786516Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1786994Z File "/tmp/tmpxx_20eu0/en/cenlg47guevxwfvbzco5q43q2pmysmyoe44t3zdvbfokblfu7w4q.py", line 58, in 2025-12-04T12:15:05.1787457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.1787591Z kernel.precompile( 2025-12-04T12:15:05.1788147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1788285Z self._precompile_worker() 2025-12-04T12:15:05.1788883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1789102Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1789721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1789924Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1790387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1790632Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1791079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1791431Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1791657Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1792091Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1792197Z ^ 2025-12-04T12:15:05.1792654Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1792660Z 2025-12-04T12:15:05.1793399Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1793406Z 2025-12-04T12:15:05.1793410Z 2025-12-04T12:15:05.1793627Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1794225Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:05.1794233Z 2025-12-04T12:15:05.1794500Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1794722Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1794841Z frames [('total', 1)] 2025-12-04T12:15:05.1794959Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1795231Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1795468Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1795569Z graph_break [] 2025-12-04T12:15:05.1795803Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1795908Z frames [('total', 1)] 2025-12-04T12:15:05.1796025Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1796292Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1796530Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1796665Z graph_break [] 2025-12-04T12:15:05.1796830Z =================================== FAILURES =================================== 2025-12-04T12:15:05.1797121Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda _____ 2025-12-04T12:15:05.1797250Z Traceback (most recent call last): 2025-12-04T12:15:05.1797666Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1797826Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1798326Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1798576Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1799089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1799296Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1799808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.1799968Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1800538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1800859Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1801388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1801537Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1802017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1802157Z return self._compile_to_module() 2025-12-04T12:15:05.1802642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1802819Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1803334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1803469Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1803983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.1804213Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1804810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1804942Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1805444Z File "/tmp/tmpiwa3423i/up/cupqm6wvqnoow64ss6ujrfpfoqsauevwfpwscdjoflk47zwsmo52.py", line 58, in 2025-12-04T12:15:05.1805923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.1806034Z kernel.precompile( 2025-12-04T12:15:05.1806594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1806764Z self._precompile_worker() 2025-12-04T12:15:05.1807359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1807556Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1808152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1808383Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1808876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1809124Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1809580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1809921Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1810148Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1810592Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1810681Z ^ 2025-12-04T12:15:05.1811141Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1811161Z 2025-12-04T12:15:05.1811874Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1811880Z 2025-12-04T12:15:05.1811885Z 2025-12-04T12:15:05.1812101Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1812724Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:05.1812730Z 2025-12-04T12:15:05.1813000Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1813233Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1813339Z frames [('total', 1)] 2025-12-04T12:15:05.1813456Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1813709Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1813935Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1814035Z graph_break [] 2025-12-04T12:15:05.1814269Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1814375Z frames [('total', 1)] 2025-12-04T12:15:05.1814504Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1814722Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1814959Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1815072Z graph_break [] 2025-12-04T12:15:05.1815288Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:05.1815395Z frames [('total', 1)] 2025-12-04T12:15:05.1815523Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1815739Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1815973Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.1816088Z graph_break [] 2025-12-04T12:15:05.1816845Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e2e744a24cd2751e.xml - 2025-12-04T12:15:05.1817040Z =========================== short test summary info ============================ 2025-12-04T12:15:05.1817759Z FAILED [0.3260s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1818252Z def triton_per_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1818363Z ^ 2025-12-04T12:15:05.1818821Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1818827Z 2025-12-04T12:15:05.1819548Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1819586Z 2025-12-04T12:15:05.1819590Z 2025-12-04T12:15:05.1819810Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1820441Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:05.1820448Z 2025-12-04T12:15:05.1820718Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1820902Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.1821120Z ================== 1 failed, 187 deselected, 2 rerun in 3.97s ================== 2025-12-04T12:15:05.1821223Z Got exit code 1 2025-12-04T12:15:05.1821720Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda 2025-12-04T12:15:05.1822145Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.1822614Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6de5a411a3f65f82.xml 2025-12-04T12:15:05.1822793Z ============================= test session starts ============================== 2025-12-04T12:15:05.1823144Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.1823289Z cachedir: .pytest_cache 2025-12-04T12:15:05.1823827Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.1823954Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.1824063Z configfile: pytest.ini 2025-12-04T12:15:05.1824666Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.1824894Z collecting ... collected 188 items / 12 deselected / 176 selected 2025-12-04T12:15:05.1825056Z stepcurrent: skipping 12 already run items. 
2025-12-04T12:15:05.1825173Z Running 176 items in this shard 2025-12-04T12:15:05.1825178Z 2025-12-04T12:15:05.1826331Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.1827233Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1827665Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1828120Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5 2025-12-04T12:15:05.1828635Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8 2025-12-04T12:15:05.1829114Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1829648Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1830224Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1830826Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1831413Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.1832017Z E1204 11:50:06.734000 113081 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1832491Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1833012Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1833496Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1833958Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1834417Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1835012Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.1835535Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1836093Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.1836677Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1837298Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T12:15:05.1837922Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1838440Z E1204 11:50:06.734000 113081 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.1838910Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.1839356Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.1839931Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.1840373Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.1840959Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.1841492Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.1842208Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.1842590Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1844541Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1845096Z E1204 
11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1846203Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1846846Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1847743Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1848439Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1849334Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1850111Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1850773Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1851652Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1852036Z E1204 11:50:06.734000 113081 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1852929Z E1204 11:50:06.734000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1853069Z ('RERUN', {'yellow': True}) [3.5894s] [ 0%] 2025-12-04T12:15:05.1854227Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.1855105Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1855554Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1855995Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5 2025-12-04T12:15:05.1856585Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8 2025-12-04T12:15:05.1857051Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1857584Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1858179Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1858767Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, 
XBLOCK)[:, None] 2025-12-04T12:15:05.1859370Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.1859957Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1860432Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1860966Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1861446Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1861927Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1862377Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1862972Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.1863512Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1864052Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.1864694Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1865264Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T12:15:05.1865903Z E1204 
11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1866418Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.1866891Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.1867352Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.1867925Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.1868372Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.1868948Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.1869481Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.1870203Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.1870564Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1872751Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], 
(1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1873328Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1874434Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1875071Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1875985Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1876664Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1877548Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1878329Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1878993Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1879875Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1880239Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1881145Z E1204 11:50:07.233000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1881280Z ('RERUN', {'yellow': True}) [0.4605s] [ 0%] 2025-12-04T12:15:05.1882435Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.1883326Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1883755Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1884207Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5 2025-12-04T12:15:05.1884719Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8 2025-12-04T12:15:05.1885179Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1885764Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1886311Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1886909Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1887527Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.1888128Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1888576Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1889101Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1889584Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1890045Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1890505Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1891104Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.1891624Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1892211Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.1892797Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1893379Z 
E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T12:15:05.1894004Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1894514Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.1894993Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.1895436Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.1896016Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.1896513Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.1897085Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.1897642Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.1898359Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.1898741Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1900716Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 
'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.1901329Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1902375Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1903022Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1903908Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1904589Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1905498Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1906267Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1906923Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.1907791Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1908170Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.1909061Z E1204 11:50:07.694000 113081 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1909166Z FAILED [0.4598s] [ 0%] 2025-12-04T12:15:05.1909175Z 2025-12-04T12:15:05.1909333Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.1909631Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____ 2025-12-04T12:15:05.1909770Z Traceback (most recent call last): 2025-12-04T12:15:05.1910169Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1910325Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1910825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1911077Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1911587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1911791Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1912303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.1912497Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1913031Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1913351Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1913881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1914063Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1914599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1914724Z return self._compile_to_module() 2025-12-04T12:15:05.1915208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1915390Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1915910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1916041Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1916554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.1916786Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1917387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1917518Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1918010Z File "/tmp/tmpn0ip3q_o/pf/cpfhhofptfqdvu2nzxhygme3zjx2l52n2mjges6etyvlnownz6iv.py", line 113, in 2025-12-04T12:15:05.1918517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.1918632Z kernel.precompile( 2025-12-04T12:15:05.1919195Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1919315Z self._precompile_worker() 2025-12-04T12:15:05.1919912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1920110Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1920705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1920907Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1921370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1921619Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1922074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1922410Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1922636Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1923078Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1923172Z ^ 2025-12-04T12:15:05.1923640Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1923648Z 2025-12-04T12:15:05.1924368Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1924377Z 2025-12-04T12:15:05.1924382Z 2025-12-04T12:15:05.1924635Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1925239Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.1925245Z 2025-12-04T12:15:05.1925513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1925751Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1925892Z frames [('total', 1)] 2025-12-04T12:15:05.1926011Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1926263Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.1926517Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1926636Z graph_break [] 2025-12-04T12:15:05.1926931Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____ 2025-12-04T12:15:05.1927060Z Traceback (most recent call last): 2025-12-04T12:15:05.1927473Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1927629Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1928122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1928386Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1928903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1929109Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1929618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.1929767Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1930348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1930672Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1931207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1931357Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1931838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1931979Z return self._compile_to_module() 2025-12-04T12:15:05.1932469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1932632Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1933163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1933298Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1933805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.1934036Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1934619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1934773Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1935284Z File "/tmp/tmp4xvsuzkt/xr/cxrldus6vp6ouqkiqrbuchtbgu7kt3szztotmi4e7s3w5iwdga2z.py", line 113, in 2025-12-04T12:15:05.1935762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.1935876Z kernel.precompile( 2025-12-04T12:15:05.1936508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1936687Z self._precompile_worker() 2025-12-04T12:15:05.1937287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1937469Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1938081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1938384Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1938884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1939135Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1939582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1939937Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1940166Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1940617Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1940709Z ^ 2025-12-04T12:15:05.1941169Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1941178Z 2025-12-04T12:15:05.1941911Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1941917Z 2025-12-04T12:15:05.1941921Z 2025-12-04T12:15:05.1942143Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1942783Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.1942789Z 2025-12-04T12:15:05.1943061Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1943286Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1943410Z frames [('total', 1)] 2025-12-04T12:15:05.1943531Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1943769Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.1944011Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1944113Z graph_break [] 2025-12-04T12:15:05.1944351Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1944458Z frames [('total', 1)] 2025-12-04T12:15:05.1944577Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1944813Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1945053Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.1945158Z graph_break [] 2025-12-04T12:15:05.1945323Z =================================== FAILURES =================================== 2025-12-04T12:15:05.1945617Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____ 2025-12-04T12:15:05.1945756Z Traceback (most recent call last): 2025-12-04T12:15:05.1946154Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.1946313Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.1946824Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.1947078Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.1947595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.1947806Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.1948353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.1948516Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.1949054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.1949380Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.1949945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.1950123Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.1950620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.1950744Z return self._compile_to_module() 2025-12-04T12:15:05.1951229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.1951405Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.1951922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.1952052Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.1952561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.1952795Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.1953391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.1953517Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.1954048Z File "/tmp/tmphoo1dnbk/cm/ccmb5pmh2v4krc2fx4we57durl5bvualdjg4u2iphyynhog5lbjw.py", line 113, in 2025-12-04T12:15:05.1954525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.1954636Z kernel.precompile( 2025-12-04T12:15:05.1955198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.1955316Z self._precompile_worker() 2025-12-04T12:15:05.1955912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.1956102Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.1956698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1956896Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1957362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1957607Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1958063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1958398Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1958629Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1959075Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1959165Z ^ 2025-12-04T12:15:05.1959631Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1959636Z 2025-12-04T12:15:05.1960383Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1960390Z 2025-12-04T12:15:05.1960394Z 2025-12-04T12:15:05.1960614Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1961219Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.1961225Z 2025-12-04T12:15:05.1961494Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1961779Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1961885Z frames [('total', 1)] 2025-12-04T12:15:05.1962031Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1962281Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.1962503Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1962619Z graph_break [] 2025-12-04T12:15:05.1962843Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.1962948Z frames [('total', 1)] 2025-12-04T12:15:05.1963077Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1963295Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1963528Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.1963642Z graph_break [] 2025-12-04T12:15:05.1963857Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:05.1963964Z frames [('total', 1)] 2025-12-04T12:15:05.1964095Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.1964314Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.1964558Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.1964659Z graph_break [] 2025-12-04T12:15:05.1965344Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6de5a411a3f65f82.xml - 2025-12-04T12:15:05.1965536Z =========================== short test summary info ============================ 2025-12-04T12:15:05.1966265Z FAILED [0.4598s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.1966696Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1966803Z ^ 2025-12-04T12:15:05.1967259Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.1967264Z 2025-12-04T12:15:05.1967987Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.1967995Z 2025-12-04T12:15:05.1967999Z 2025-12-04T12:15:05.1968217Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.1968819Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.1968824Z 2025-12-04T12:15:05.1969090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.1969270Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.1969487Z ================== 1 failed, 12 deselected, 2 rerun in 4.55s =================== 2025-12-04T12:15:05.1969590Z Got exit code 1 2025-12-04T12:15:05.1969700Z Retrying single test... 2025-12-04T12:15:05.1970189Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f3621583fff098.xml 2025-12-04T12:15:05.1970353Z ============================= test session starts ============================== 2025-12-04T12:15:05.1970719Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.1970862Z cachedir: .pytest_cache 2025-12-04T12:15:05.1971574Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.1971720Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.1971832Z configfile: pytest.ini 2025-12-04T12:15:05.1972433Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.1972729Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.1973442Z stepcurrent: skipping 12 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.1973576Z Running 1 items in this shard 2025-12-04T12:15:05.1973584Z 2025-12-04T12:15:05.1974743Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.1975635Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.1976069Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.1976585Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5 2025-12-04T12:15:05.1977111Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8 2025-12-04T12:15:05.1977632Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.1978181Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.1978720Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.1979307Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.1979910Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, 
tl.int1)[:, None] 2025-12-04T12:15:05.1980466Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.1980925Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.1981448Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.1981934Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.1982393Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.1982841Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.1983447Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.1983964Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.1984563Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.1985145Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.1985713Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T12:15:05.1986384Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.1986920Z 
E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.1987402Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.1987847Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.1988410Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.1988862Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.1989432Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.1989983Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.1990695Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.1991100Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.1993048Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 
2025-12-04T12:15:05.1993595Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.1994650Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.1995284Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.1996190Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.1996872Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.1997769Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.1998576Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.1999199Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.2000069Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2000468Z E1204 11:50:26.332000 113312 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.2001404Z E1204 11:50:26.332000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2001546Z ('RERUN', {'yellow': True}) [3.6178s] [100%] 2025-12-04T12:15:05.2002709Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.2003578Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2004006Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.2004458Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5 2025-12-04T12:15:05.2004970Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8 2025-12-04T12:15:05.2005481Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.2006013Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.2006552Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.2007150Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, 
XBLOCK)[:, None] 2025-12-04T12:15:05.2007742Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.2008311Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.2008759Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.2009286Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.2009759Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.2010220Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.2010681Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.2011277Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.2011810Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.2012408Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.2012987Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.2013570Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T12:15:05.2014259Z E1204 
11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.2014781Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.2015249Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.2015693Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.2016274Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.2016780Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.2017375Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.2017909Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.2018625Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.2019047Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.2020983Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], 
(1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.2021535Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.2022587Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2023235Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2024131Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2024835Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2025718Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2026538Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2027148Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.2028019Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2028464Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.2029926Z E1204 11:50:26.853000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2031100Z ('RERUN', {'yellow': True}) [0.4823s] [100%] 2025-12-04T12:15:05.2032522Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.2034704Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2036166Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.2037194Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5 2025-12-04T12:15:05.2038314Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8 2025-12-04T12:15:05.2039471Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.2040617Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.2041836Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.2043095Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.2044406Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.2045691Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.2046838Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.2047937Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.2049084Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.2050170Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.2051231Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.2052403Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.2053709Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.2054927Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.2056191Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.2057604Z 
E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T12:15:05.2058994Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.2060279Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.2061405Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.2062462Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.2063607Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.2064768Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.2065934Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.2067188Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.2068606Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.2069826Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.2072467Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 
'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.2075074Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.2076814Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2078631Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2080303Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2082028Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2083816Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2085615Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2087135Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.2088881Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2090260Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.2091685Z E1204 11:50:27.324000 113312 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2092830Z FAILED [0.4675s] [100%] 2025-12-04T12:15:05.2093016Z 2025-12-04T12:15:05.2093163Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.2093760Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____ 2025-12-04T12:15:05.2094325Z Traceback (most recent call last): 2025-12-04T12:15:05.2094973Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.2095668Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.2096525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.2097428Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.2098411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.2099276Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.2100131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.2100949Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.2101779Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.2102795Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.2103789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.2104613Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.2105396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.2106156Z return self._compile_to_module() 2025-12-04T12:15:05.2106884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.2107696Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.2108527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.2109340Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.2110098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.2110981Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.2111958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.2112817Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.2113659Z File "/tmp/tmpuge66jqu/j5/cj5eczgvcwmdynhzqwwuhshosypcq5osnr2lopm66l3olxca4x73.py", line 113, in <module> 2025-12-04T12:15:05.2114790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.2115528Z kernel.precompile( 2025-12-04T12:15:05.2116272Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.2117141Z self._precompile_worker() 2025-12-04T12:15:05.2118001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.2118928Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.2119847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2120806Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2121605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2122439Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2123280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2124213Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2124927Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2125714Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2126379Z ^ 2025-12-04T12:15:05.2126975Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2127614Z 2025-12-04T12:15:05.2128350Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2129200Z 2025-12-04T12:15:05.2129204Z 2025-12-04T12:15:05.2129425Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2130377Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.2131117Z 2025-12-04T12:15:05.2131389Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2132036Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2132501Z frames [('total', 1)] 2025-12-04T12:15:05.2132806Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2133271Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.2133861Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2134335Z graph_break [] 2025-12-04T12:15:05.2134794Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____ 2025-12-04T12:15:05.2135363Z Traceback (most recent call last): 2025-12-04T12:15:05.2135993Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.2136763Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.2137561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.2138431Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.2139353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.2140207Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.2141134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.2141922Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.2142746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.2143748Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.2144741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.2145582Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.2146388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.2147145Z return self._compile_to_module() 2025-12-04T12:15:05.2147876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.2148675Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.2149499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.2150294Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.2151044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.2151925Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.2152893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.2153754Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.2154506Z File "/tmp/tmpsh441ywf/2l/c2leueaeuuq4sigwf2uransvivrn3z2dtdo55t6sxaal2qj3uz5d.py", line 113, in <module> 2025-12-04T12:15:05.2155669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton
2025-12-04T12:15:05.2156395Z kernel.precompile( 2025-12-04T12:15:05.2157135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.2157955Z self._precompile_worker() 2025-12-04T12:15:05.2158775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.2159701Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.2160609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2161546Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2162340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2163186Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2164014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2164937Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2165646Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2166428Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2167110Z ^ 2025-12-04T12:15:05.2167705Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2168304Z 2025-12-04T12:15:05.2169026Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2169873Z 2025-12-04T12:15:05.2169878Z 2025-12-04T12:15:05.2170155Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2171282Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.2172021Z 2025-12-04T12:15:05.2172293Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2172932Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2173467Z frames [('total', 1)] 2025-12-04T12:15:05.2180495Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2181134Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.2181737Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2182213Z graph_break [] 2025-12-04T12:15:05.2182614Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2183093Z frames [('total', 1)] 2025-12-04T12:15:05.2183387Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2183834Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2184435Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.2184904Z graph_break [] 2025-12-04T12:15:05.2185216Z =================================== FAILURES =================================== 2025-12-04T12:15:05.2185815Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____ 2025-12-04T12:15:05.2186364Z Traceback (most recent call last): 2025-12-04T12:15:05.2187020Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.2187720Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.2188502Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.2189443Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.2190355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.2191211Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.2192058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.2192852Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.2193675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.2194676Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.2195654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.2196468Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.2197234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.2197975Z return self._compile_to_module() 2025-12-04T12:15:05.2198688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.2199485Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.2200305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.2201101Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.2201847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.2202724Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.2203746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.2204586Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.2205360Z File "/tmp/tmpjeys9fpk/qt/cqtngethmeezmzzpbyza2yhzfoppgoh6nj6yoahjjox5zrv76dl5.py", line 113, in 2025-12-04T12:15:05.2206481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.2207200Z kernel.precompile( 2025-12-04T12:15:05.2207972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.2208787Z self._precompile_worker() 2025-12-04T12:15:05.2209660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.2210576Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.2211480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2212420Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2213210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2214038Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2214876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2215801Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2216622Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2217413Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2218134Z ^ 2025-12-04T12:15:05.2218724Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2219320Z 2025-12-04T12:15:05.2220051Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2220899Z 2025-12-04T12:15:05.2220904Z 2025-12-04T12:15:05.2221126Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2222079Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.2222806Z 2025-12-04T12:15:05.2223079Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2223711Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2224176Z frames [('total', 1)] 2025-12-04T12:15:05.2224476Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2224931Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.2225510Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2225975Z graph_break [] 2025-12-04T12:15:05.2226354Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2226820Z frames [('total', 1)] 2025-12-04T12:15:05.2227106Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2227544Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2228136Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.2228602Z graph_break [] 2025-12-04T12:15:05.2228974Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:05.2229435Z frames [('total', 1)] 2025-12-04T12:15:05.2229718Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2230152Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2230786Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.2231259Z graph_break [] 2025-12-04T12:15:05.2232058Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f3621583fff098.xml - 2025-12-04T12:15:05.2233021Z =========================== short test summary info ============================ 2025-12-04T12:15:05.2234081Z FAILED [0.4675s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2235448Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2236103Z ^ 2025-12-04T12:15:05.2236694Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2237291Z 2025-12-04T12:15:05.2238019Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2238861Z 2025-12-04T12:15:05.2238866Z 2025-12-04T12:15:05.2239095Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2240023Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.2240757Z 2025-12-04T12:15:05.2241024Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2241613Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.2242146Z ================== 1 failed, 187 deselected, 2 rerun in 4.61s ================== 2025-12-04T12:15:05.2242582Z Got exit code 1 2025-12-04T12:15:05.2242892Z Retrying single test... 2025-12-04T12:15:05.2243554Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-319cee3df6121e1a.xml 2025-12-04T12:15:05.2244323Z ============================= test session starts ============================== 2025-12-04T12:15:05.2244983Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.2245586Z cachedir: .pytest_cache 2025-12-04T12:15:05.2246300Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.2247082Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.2247435Z configfile: pytest.ini 2025-12-04T12:15:05.2248219Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.2249165Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.2250199Z stepcurrent: skipping 12 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.2251120Z Running 1 items in this shard 2025-12-04T12:15:05.2251336Z 2025-12-04T12:15:05.2252507Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.2254691Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2256132Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.2257220Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5 2025-12-04T12:15:05.2258376Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8 2025-12-04T12:15:05.2259502Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.2260634Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.2261844Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.2263182Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.2264494Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, 
tl.int1)[:, None] 2025-12-04T12:15:05.2265770Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.2266918Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.2268025Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.2269170Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.2270248Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.2271523Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.2272804Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.2274059Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.2275247Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.2276505Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.2277793Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T12:15:05.2279126Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.2280405Z 
E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.2281511Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.2282562Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.2283718Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.2284868Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.2286006Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.2287252Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.2288785Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.2290001Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.2292495Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 
2025-12-04T12:15:05.2295119Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.2296894Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2298703Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2300379Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2302083Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2303816Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2305608Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2307123Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.2309032Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2310420Z E1204 11:50:45.985000 113543 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.2311828Z E1204 11:50:45.985000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2313009Z ('RERUN', {'yellow': True}) [3.5974s] [100%] 2025-12-04T12:15:05.2314432Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.2316655Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2318094Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.2319099Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5 2025-12-04T12:15:05.2320243Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8 2025-12-04T12:15:05.2321367Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.2322509Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.2323795Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.2325095Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, 
XBLOCK)[:, None] 2025-12-04T12:15:05.2326421Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.2327722Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.2328878Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.2329980Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.2331128Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.2332222Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.2333300Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.2334581Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.2335875Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.2337179Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.2338484Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.2339786Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T12:15:05.2341138Z E1204 
11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.2342439Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.2343567Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.2344621Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.2345782Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.2346953Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.2348123Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.2349481Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.2350873Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.2352092Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.2354616Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], 
(1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.2357252Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.2358988Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2360792Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2362453Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2364176Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2365932Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2367731Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2369270Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.2370888Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2372508Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.2373933Z E1204 11:50:46.486000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2375110Z ('RERUN', {'yellow': True}) [0.4623s] [100%] 2025-12-04T12:15:05.2376587Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.2378758Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2380224Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.2381390Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5 2025-12-04T12:15:05.2382494Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 8 2025-12-04T12:15:05.2383601Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.2384794Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.2386062Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.2387337Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.2388642Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.2389926Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.2391081Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.2392191Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.2393312Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.2394393Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.2395510Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.2396703Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.2397953Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.2399168Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.2400446Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.2401738Z 
E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T12:15:05.2403067Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.2404336Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.2405452Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.2406499Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.2407645Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.2408789Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.2409939Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.2411259Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.2412654Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.2413863Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.2416494Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 
'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.2419101Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.2420822Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2422633Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2424289Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2426014Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2427715Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2429509Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2431029Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.2432649Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2434019Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.2435439Z E1204 11:50:46.949000 113543 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2436573Z FAILED [0.4611s] [100%] 2025-12-04T12:15:05.2436755Z 2025-12-04T12:15:05.2436918Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.2437497Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____ 2025-12-04T12:15:05.2438064Z Traceback (most recent call last): 2025-12-04T12:15:05.2438710Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.2439409Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.2440182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.2441133Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.2442043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.2442888Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.2443732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.2444578Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.2445431Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.2446424Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.2447422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.2448252Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.2449022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.2449758Z return self._compile_to_module() 2025-12-04T12:15:05.2450496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.2451291Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.2452098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.2452890Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.2453649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.2454567Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.2455516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.2456437Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.2457215Z File "/tmp/tmp4yizaawy/hq/chqaaow35caxdduunmhgnbhq3jxqvs2ryjclfayhdkcyja4wlspi.py", line 113, in 2025-12-04T12:15:05.2458341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.2459053Z kernel.precompile( 2025-12-04T12:15:05.2459810Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.2460632Z self._precompile_worker() 2025-12-04T12:15:05.2461442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.2462367Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.2463284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2464233Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2465016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2465866Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2466701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2467633Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2468330Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2469129Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2469798Z ^ 2025-12-04T12:15:05.2470433Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2471240Z 2025-12-04T12:15:05.2471958Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2472890Z 2025-12-04T12:15:05.2472895Z 2025-12-04T12:15:05.2473116Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2474117Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.2474837Z 2025-12-04T12:15:05.2475123Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2475756Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2476235Z frames [('total', 1)] 2025-12-04T12:15:05.2476544Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2476996Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.2477605Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2478076Z graph_break [] 2025-12-04T12:15:05.2478525Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____ 2025-12-04T12:15:05.2479096Z Traceback (most recent call last): 2025-12-04T12:15:05.2479744Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.2480450Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.2481228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.2482173Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.2483090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.2483947Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.2484780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.2485595Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.2486427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.2487421Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.2488422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.2489243Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.2490021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.2490765Z return self._compile_to_module() 2025-12-04T12:15:05.2491506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.2492311Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.2493131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.2493911Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.2494678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.2495555Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.2496560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.2497426Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.2498249Z File "/tmp/tmpgaidouh7/67/c67ognwug75odwxkol235o5w234fejsuwsmki46wz2slnxhn2r57.py", line 113, in 2025-12-04T12:15:05.2499357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.2500071Z kernel.precompile( 2025-12-04T12:15:05.2500821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.2501674Z self._precompile_worker() 2025-12-04T12:15:05.2502532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.2503442Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.2504362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2505308Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2506089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2506935Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2507766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2508697Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2509394Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2510197Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2510860Z ^ 2025-12-04T12:15:05.2511490Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2512082Z 2025-12-04T12:15:05.2512799Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2513657Z 2025-12-04T12:15:05.2513662Z 2025-12-04T12:15:05.2513884Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2514835Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.2515566Z 2025-12-04T12:15:05.2515851Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2516482Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2516958Z frames [('total', 1)] 2025-12-04T12:15:05.2517257Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2517718Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.2518307Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2518774Z graph_break [] 2025-12-04T12:15:05.2519154Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2519610Z frames [('total', 1)] 2025-12-04T12:15:05.2519909Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2520348Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2520937Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.2521410Z graph_break [] 2025-12-04T12:15:05.2521718Z =================================== FAILURES =================================== 2025-12-04T12:15:05.2522311Z ___ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda ____ 2025-12-04T12:15:05.2522858Z Traceback (most recent call last): 2025-12-04T12:15:05.2523501Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.2524201Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.2525010Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.2525889Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.2526790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.2527678Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.2528511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.2529366Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.2530187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.2531191Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.2532175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.2532994Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.2533766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.2533891Z return self._compile_to_module() 2025-12-04T12:15:05.2534383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.2534566Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.2535084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.2535299Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.2535795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.2536030Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.2536709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.2536840Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.2537358Z File "/tmp/tmp1hojn4gp/lj/cljnn3iatte42t6knrtce34dmmsy5s4wq326ajh4ngohnbd535vb.py", line 113, in 2025-12-04T12:15:05.2537827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.2537940Z kernel.precompile( 2025-12-04T12:15:05.2538509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.2538628Z self._precompile_worker() 2025-12-04T12:15:05.2539229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.2539423Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.2540017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2540234Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2540687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2540932Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2541391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2541727Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2541972Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2542457Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2542550Z ^ 2025-12-04T12:15:05.2543018Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2543026Z 2025-12-04T12:15:05.2543744Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2543782Z 2025-12-04T12:15:05.2543786Z 2025-12-04T12:15:05.2544048Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2544637Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.2544645Z 2025-12-04T12:15:05.2544928Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2545169Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2545277Z frames [('total', 1)] 2025-12-04T12:15:05.2545410Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2545651Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.2545878Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2545997Z graph_break [] 2025-12-04T12:15:05.2546219Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2546325Z frames [('total', 1)] 2025-12-04T12:15:05.2546456Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2546679Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2546911Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.2547058Z graph_break [] 2025-12-04T12:15:05.2547279Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:05.2547404Z frames [('total', 1)] 2025-12-04T12:15:05.2547529Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2547747Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2547994Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.2548097Z graph_break [] 2025-12-04T12:15:05.2548751Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-319cee3df6121e1a.xml - 2025-12-04T12:15:05.2548947Z =========================== short test summary info ============================ 2025-12-04T12:15:05.2549688Z FAILED [0.4611s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2550137Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.2550232Z ^ 2025-12-04T12:15:05.2550694Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2550699Z 2025-12-04T12:15:05.2551426Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2551432Z 2025-12-04T12:15:05.2551439Z 2025-12-04T12:15:05.2551659Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2552265Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.2552270Z 2025-12-04T12:15:05.2552541Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2552742Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.2552944Z ================== 1 failed, 187 deselected, 2 rerun in 4.56s ================== 2025-12-04T12:15:05.2553081Z Got exit code 1 2025-12-04T12:15:05.2553606Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda 2025-12-04T12:15:05.2554017Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.2554487Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-452be63c68b4eb35.xml 2025-12-04T12:15:05.2554697Z ============================= test session starts ============================== 2025-12-04T12:15:05.2555082Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.2555212Z cachedir: .pytest_cache 2025-12-04T12:15:05.2555732Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.2555867Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.2555997Z configfile: pytest.ini 2025-12-04T12:15:05.2556588Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.2556819Z collecting ... collected 188 items / 13 deselected / 175 selected 2025-12-04T12:15:05.2556982Z stepcurrent: skipping 13 already run items. 
2025-12-04T12:15:05.2557099Z Running 175 items in this shard 2025-12-04T12:15:05.2557107Z 2025-12-04T12:15:05.2558276Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.2559252Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2559727Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.2560179Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.2560637Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.2561186Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.2561731Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.2562324Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.2562913Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.2563467Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.2563926Z E1204 11:51:05.535000 113773 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.2564560Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.2565153Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.2565688Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.2566241Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.2566749Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.2567227Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.2567741Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.2568532Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.2569065Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.2569655Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.2570225Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 2025-12-04T12:15:05.2570792Z E1204 11:51:05.535000 113773 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.2571645Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 2025-12-04T12:15:05.2572184Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.2572811Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.2573323Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32) 2025-12-04T12:15:05.2573805Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.2574251Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.2574835Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.2575278Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.2575853Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.2576469Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.2577185Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.2577562Z E1204 11:51:05.535000 113773 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.2579799Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.2580352Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.2581399Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2582142Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2583044Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2583730Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2584626Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", 
line 80, in make_ir 2025-12-04T12:15:05.2585398Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2586026Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.2587007Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2587443Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.2588340Z E1204 11:51:05.535000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2588478Z ('RERUN', {'yellow': True}) [3.2675s] [ 0%] 2025-12-04T12:15:05.2589643Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.2590614Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2591064Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.2591516Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.2591989Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.2592528Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.2593071Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.2593671Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.2594303Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.2594877Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 
2025-12-04T12:15:05.2595326Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.2595986Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.2596609Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.2597141Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.2597687Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.2598174Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.2598670Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.2599136Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.2599900Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.2600464Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.2601058Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.2601642Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 
2025-12-04T12:15:05.2602194Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.2602765Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 2025-12-04T12:15:05.2603298Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.2603840Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.2604365Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32) 2025-12-04T12:15:05.2604830Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.2605270Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.2605854Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.2606292Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.2606875Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.2607442Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.2608155Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.2608533Z 
E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.2610775Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.2611326Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.2612374Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2613014Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2613903Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2614630Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2615510Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2616350Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2616969Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.2617939Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2618327Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.2619222Z E1204 11:51:05.904000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2619371Z ('RERUN', {'yellow': True}) [0.3303s] [ 0%] 2025-12-04T12:15:05.2620523Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.2621507Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2621989Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.2622442Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.2622916Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.2623482Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.2624066Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.2624654Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.2625240Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.2625807Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 
2025-12-04T12:15:05.2626255Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.2626901Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.2627485Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.2628016Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.2628675Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.2629164Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.2629662Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.2630137Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.2630914Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.2631431Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.2632026Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.2633230Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 
2025-12-04T12:15:05.2633785Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.2634375Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 2025-12-04T12:15:05.2634900Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.2635442Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.2636020Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32) 2025-12-04T12:15:05.2636486Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.2636940Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.2637557Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.2638022Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.2638614Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.2639151Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.2639867Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.2640232Z 
E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.2642415Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.2642991Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.2644041Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2644672Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2645707Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2646407Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2647291Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2648084Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2648701Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.2649685Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2650097Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.2650988Z E1204 11:51:06.236000 113773 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2651107Z FAILED [0.3306s] [ 0%] 2025-12-04T12:15:05.2651143Z 2025-12-04T12:15:05.2651294Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.2651600Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____ 2025-12-04T12:15:05.2651758Z Traceback (most recent call last): 2025-12-04T12:15:05.2652162Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.2652340Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.2652839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.2653106Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.2653622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.2653818Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.2654346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.2654499Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.2655039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.2655376Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.2655937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.2656101Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.2656666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.2656794Z 
return self._compile_to_module() 2025-12-04T12:15:05.2657297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.2657467Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.2657999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.2658131Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.2658629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.2658879Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.2659470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.2659612Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.2660119Z File "/tmp/tmpvldyvac4/7d/c7dmblsqrrhzhkzyttxoa6fsfxh7miolefeq4fdp2uvexshr7vii.py", line 58, in 2025-12-04T12:15:05.2660582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.2660714Z kernel.precompile( 2025-12-04T12:15:05.2661274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.2661394Z self._precompile_worker() 2025-12-04T12:15:05.2662002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.2662236Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.2662848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2663048Z binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:05.2663500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2663792Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2664263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2664608Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2664835Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2665375Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2665484Z ^ 2025-12-04T12:15:05.2665943Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2665949Z 2025-12-04T12:15:05.2666674Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2666683Z 2025-12-04T12:15:05.2666688Z 2025-12-04T12:15:05.2666906Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2667499Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.2667505Z 2025-12-04T12:15:05.2667789Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2668048Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2668170Z frames [('total', 1)] 2025-12-04T12:15:05.2668294Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2668533Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.2668769Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2668875Z graph_break [] 2025-12-04T12:15:05.2669167Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____ 2025-12-04T12:15:05.2669309Z Traceback (most recent call last): 2025-12-04T12:15:05.2669709Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.2669882Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.2670372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.2670622Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.2671325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.2671521Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.2672031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.2672195Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.2672730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.2673069Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.2673590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.2673743Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.2674318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.2674443Z return self._compile_to_module() 2025-12-04T12:15:05.2675251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.2675422Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.2675938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.2676153Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.2676691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.2676925Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.2677525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.2677657Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.2678179Z File "/tmp/tmpxzdjushk/sb/csbslejr7dl6ckros4dlobemrdlaidbvcnpnityrtogll3ghyblm.py", line 58, in 2025-12-04T12:15:05.2678646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.2678759Z kernel.precompile( 2025-12-04T12:15:05.2679334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.2679456Z self._precompile_worker() 2025-12-04T12:15:05.2680069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.2680252Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.2680896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2681116Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2681568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2681813Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2682270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2682611Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2682858Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2683397Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2683492Z ^ 2025-12-04T12:15:05.2683973Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2683979Z 2025-12-04T12:15:05.2684687Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2684693Z 2025-12-04T12:15:05.2684699Z 2025-12-04T12:15:05.2684931Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2685517Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.2685523Z 2025-12-04T12:15:05.2685807Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2686032Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2686140Z frames [('total', 1)] 2025-12-04T12:15:05.2686268Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2686509Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.2686783Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2686899Z graph_break [] 2025-12-04T12:15:05.2687119Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2687223Z frames [('total', 1)] 2025-12-04T12:15:05.2687353Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2687570Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2687846Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.2687948Z graph_break [] 2025-12-04T12:15:05.2688127Z =================================== FAILURES =================================== 2025-12-04T12:15:05.2688436Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____ 2025-12-04T12:15:05.2688567Z Traceback (most recent call last): 2025-12-04T12:15:05.2688969Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.2689142Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.2689633Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.2689898Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.2690412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.2690605Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.2691130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.2691279Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.2691824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.2692181Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.2692701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.2692863Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.2693341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.2693465Z return self._compile_to_module() 2025-12-04T12:15:05.2693964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.2694132Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.2694658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.2694792Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.2695291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.2695542Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.2696127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.2696266Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.2696843Z File "/tmp/tmpn7g629zu/4t/c4tpztc2cf4nouw5kqxpychvhy47k5gxlv56f5ydrankt6acdhtg.py", line 58, in 2025-12-04T12:15:05.2697308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.2697435Z kernel.precompile( 2025-12-04T12:15:05.2697990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.2698110Z self._precompile_worker() 2025-12-04T12:15:05.2698762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.2698944Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.2699553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2699753Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2700237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2700525Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2700972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2701324Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2701556Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2702094Z def 
triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2702200Z ^ 2025-12-04T12:15:05.2702658Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2702666Z 2025-12-04T12:15:05.2703388Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2703397Z 2025-12-04T12:15:05.2703402Z 2025-12-04T12:15:05.2703622Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2704207Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.2704251Z 2025-12-04T12:15:05.2704539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2704765Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2704884Z frames [('total', 1)] 2025-12-04T12:15:05.2705001Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2705242Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.2705478Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2705579Z graph_break [] 2025-12-04T12:15:05.2705797Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2705916Z frames [('total', 1)] 2025-12-04T12:15:05.2706031Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2706249Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2706500Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.2706599Z graph_break [] 2025-12-04T12:15:05.2706833Z ----------------------------- Captured 
stdout call ----------------------------- 2025-12-04T12:15:05.2706938Z frames [('total', 1)] 2025-12-04T12:15:05.2707052Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2707282Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2707517Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.2707619Z graph_break [] 2025-12-04T12:15:05.2708285Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-452be63c68b4eb35.xml - 2025-12-04T12:15:05.2708461Z =========================== short test summary info ============================ 2025-12-04T12:15:05.2709220Z FAILED [0.3306s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2709793Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2709885Z ^ 2025-12-04T12:15:05.2710360Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2710365Z 2025-12-04T12:15:05.2711073Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2711111Z 2025-12-04T12:15:05.2711116Z 2025-12-04T12:15:05.2711352Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2711964Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.2711970Z 2025-12-04T12:15:05.2712255Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2712442Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.2712645Z ================== 1 failed, 13 deselected, 2 rerun in 3.97s =================== 2025-12-04T12:15:05.2712770Z Got exit code 1 2025-12-04T12:15:05.2712881Z Retrying single test... 2025-12-04T12:15:05.2713355Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5a49841d6a2b730b.xml 2025-12-04T12:15:05.2713537Z ============================= test session starts ============================== 2025-12-04T12:15:05.2713898Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.2714027Z cachedir: .pytest_cache 2025-12-04T12:15:05.2714549Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.2714711Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.2714834Z configfile: pytest.ini 2025-12-04T12:15:05.2715429Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.2715654Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.2716332Z stepcurrent: skipping 13 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.2716454Z Running 1 items in this shard 2025-12-04T12:15:05.2716459Z 2025-12-04T12:15:05.2717634Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.2718618Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2719072Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.2719528Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.2719995Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.2720551Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.2721099Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.2721696Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.2722325Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.2722890Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.2723356Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.2724017Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.2724643Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.2725181Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.2725709Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.2726213Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.2726695Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.2727177Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.2727951Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.2728532Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.2729122Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.2729699Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 2025-12-04T12:15:05.2730262Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.2730836Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 2025-12-04T12:15:05.2731370Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.2731911Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.2732416Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32) 2025-12-04T12:15:05.2732896Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.2733342Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.2733923Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.2734363Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.2734938Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.2735547Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.2736269Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, 
tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.2736719Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.2738983Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.2739541Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.2740590Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2741248Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2742156Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2742893Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2743789Z E1204 11:51:25.191000 113970 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2744556Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2745181Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.2746152Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2746544Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.2747441Z E1204 11:51:25.191000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2747576Z ('RERUN', {'yellow': True}) [3.3292s] [100%] 2025-12-04T12:15:05.2748738Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.2749710Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2750196Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.2750649Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.2751122Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.2751692Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.2752267Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.2752871Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.2753461Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.2754030Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 
2025-12-04T12:15:05.2754480Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.2755110Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.2755708Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.2756245Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.2756824Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.2757314Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.2757809Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.2758279Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.2759041Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.2759575Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.2760164Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.2760750Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 
2025-12-04T12:15:05.2761304Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.2761879Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 2025-12-04T12:15:05.2762418Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.2762959Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.2763525Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32) 2025-12-04T12:15:05.2763995Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.2764437Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.2765047Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.2765518Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.2766110Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.2766651Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.2767366Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.2767748Z 
E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.2769929Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.2770513Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.2771844Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2772502Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2773395Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2774092Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2774973Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2775756Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2776422Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.2777398Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2777878Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.2778775Z E1204 11:51:25.567000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2778927Z ('RERUN', {'yellow': True}) [0.3379s] [100%] 2025-12-04T12:15:05.2780180Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.2781169Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2781606Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.2782057Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.2782539Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.2783081Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.2783638Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.2784221Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.2784862Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.2785429Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 
2025-12-04T12:15:05.2785878Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.2786525Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.2787107Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.2787638Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.2788182Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.2788667Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.2789161Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.2789630Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.2790410Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.2790924Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.2791546Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.2792133Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 
2025-12-04T12:15:05.2792684Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.2793300Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 2025-12-04T12:15:05.2793853Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.2794396Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.2794915Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32) 2025-12-04T12:15:05.2795380Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.2795834Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.2796404Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.2796844Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.2797431Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.2798001Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.2798726Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.2799089Z 
E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.2801280Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.2801819Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.2802879Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2803517Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2804410Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2805142Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2806037Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2806825Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2807496Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.2808486Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2808859Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.2809747Z E1204 11:51:25.906000 113970 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2809866Z FAILED [0.3367s] [100%] 2025-12-04T12:15:05.2809874Z 2025-12-04T12:15:05.2810020Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.2810325Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____ 2025-12-04T12:15:05.2810453Z Traceback (most recent call last): 2025-12-04T12:15:05.2810851Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.2811061Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.2811554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.2811819Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.2812331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.2812525Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.2813049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.2813200Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.2813736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.2814071Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.2814595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.2814759Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.2815239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.2815363Z 
return self._compile_to_module() 2025-12-04T12:15:05.2815861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.2816030Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.2816627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.2816762Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.2817259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.2817513Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.2818142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.2818287Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.2818793Z File "/tmp/tmpjh3s7agi/2g/c2gosmniklmk24r4ta44yvlqiqotibdvbnl4trn6ppqt2jvhd2xc.py", line 58, in 2025-12-04T12:15:05.2819253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.2819413Z kernel.precompile( 2025-12-04T12:15:05.2820002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.2820124Z self._precompile_worker() 2025-12-04T12:15:05.2820740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.2820928Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.2821536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2821738Z binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:05.2822191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2822459Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2822907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2823259Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2823488Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2824058Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2824167Z ^ 2025-12-04T12:15:05.2824629Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2824635Z 2025-12-04T12:15:05.2825363Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2825372Z 2025-12-04T12:15:05.2825377Z 2025-12-04T12:15:05.2825596Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2826187Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.2826193Z 2025-12-04T12:15:05.2826478Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2826707Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2826833Z frames [('total', 1)] 2025-12-04T12:15:05.2826951Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2827189Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.2827426Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2827530Z graph_break [] 2025-12-04T12:15:05.2827826Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____ 2025-12-04T12:15:05.2827966Z Traceback (most recent call last): 2025-12-04T12:15:05.2828363Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.2828531Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.2829023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.2829275Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.2829842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.2830038Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.2830547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.2830706Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.2831287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.2831735Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.2832255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.2832406Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.2832902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.2833027Z return self._compile_to_module() 2025-12-04T12:15:05.2833532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.2833697Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.2834212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.2834362Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.2834864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.2835096Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.2835738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.2835867Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.2836382Z File "/tmp/tmptyhyzyxe/ru/crutiqfrgl6bg6v6h4bwcclkmn75z6da5uuud2wvt4dqyfn2fu2u.py", line 58, in 2025-12-04T12:15:05.2836845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.2836959Z kernel.precompile( 2025-12-04T12:15:05.2837527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.2837645Z self._precompile_worker() 2025-12-04T12:15:05.2838256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.2838437Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.2839036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2839255Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2839704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2839950Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2840407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2840745Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2840987Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2841518Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2841612Z ^ 2025-12-04T12:15:05.2842114Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2842121Z 2025-12-04T12:15:05.2842836Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2842843Z 2025-12-04T12:15:05.2842847Z 2025-12-04T12:15:05.2843079Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2843696Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.2843702Z 2025-12-04T12:15:05.2844012Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2844237Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2844345Z frames [('total', 1)] 2025-12-04T12:15:05.2844475Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2844713Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.2844937Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2845050Z graph_break [] 2025-12-04T12:15:05.2845267Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2845371Z frames [('total', 1)] 2025-12-04T12:15:05.2845499Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2845717Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2845968Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.2846071Z graph_break [] 2025-12-04T12:15:05.2846220Z =================================== FAILURES =================================== 2025-12-04T12:15:05.2846522Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____ 2025-12-04T12:15:05.2846680Z Traceback (most recent call last): 2025-12-04T12:15:05.2847080Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.2847252Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.2847742Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.2848003Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.2848518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.2848713Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.2849239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.2849388Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.2849936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.2850264Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.2850783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.2850943Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.2851424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.2851548Z return self._compile_to_module() 2025-12-04T12:15:05.2852049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.2852216Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.2852747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.2852879Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.2853417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.2853663Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.2854245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.2854384Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.2854919Z File "/tmp/tmp925yvxqb/sb/csbnfjpd6wqmkeluniuygzgi6oasdo3qgjnbe6bsaweodo7s66dy.py", line 58, in 2025-12-04T12:15:05.2855410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.2855536Z kernel.precompile( 2025-12-04T12:15:05.2856089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.2856210Z self._precompile_worker() 2025-12-04T12:15:05.2856890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.2857079Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.2857689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2857893Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2858342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2858604Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2859045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2859438Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2859672Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2860208Z def 
triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2860313Z ^ 2025-12-04T12:15:05.2860770Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2860778Z 2025-12-04T12:15:05.2861502Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2861511Z 2025-12-04T12:15:05.2861516Z 2025-12-04T12:15:05.2861733Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2862318Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.2862326Z 2025-12-04T12:15:05.2862609Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2862829Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2862948Z frames [('total', 1)] 2025-12-04T12:15:05.2863064Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2863302Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.2863539Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2863641Z graph_break [] 2025-12-04T12:15:05.2863859Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2863977Z frames [('total', 1)] 2025-12-04T12:15:05.2864092Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2864315Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2864565Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.2864665Z graph_break [] 2025-12-04T12:15:05.2864929Z ----------------------------- Captured 
stdout call ----------------------------- 2025-12-04T12:15:05.2865038Z frames [('total', 1)] 2025-12-04T12:15:05.2865151Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2865383Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2865616Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.2865750Z graph_break [] 2025-12-04T12:15:05.2866415Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5a49841d6a2b730b.xml - 2025-12-04T12:15:05.2866620Z =========================== short test summary info ============================ 2025-12-04T12:15:05.2867366Z FAILED [0.3367s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2867900Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2867990Z ^ 2025-12-04T12:15:05.2868461Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2868467Z 2025-12-04T12:15:05.2869175Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2869184Z 2025-12-04T12:15:05.2869189Z 2025-12-04T12:15:05.2869427Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2870012Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.2870018Z 2025-12-04T12:15:05.2870334Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2870520Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.2870723Z ================== 1 failed, 187 deselected, 2 rerun in 4.05s ================== 2025-12-04T12:15:05.2870838Z Got exit code 1 2025-12-04T12:15:05.2871167Z Retrying single test... 2025-12-04T12:15:05.2871642Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f1313a025d30dc09.xml 2025-12-04T12:15:05.2871820Z ============================= test session starts ============================== 2025-12-04T12:15:05.2872176Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.2872303Z cachedir: .pytest_cache 2025-12-04T12:15:05.2872823Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.2872952Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.2873076Z configfile: pytest.ini 2025-12-04T12:15:05.2873673Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.2873897Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.2874578Z stepcurrent: skipping 13 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.2874710Z Running 1 items in this shard 2025-12-04T12:15:05.2874715Z 2025-12-04T12:15:05.2875868Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.2876961Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2877417Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.2877872Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.2878335Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.2878947Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.2879535Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.2880134Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.2880726Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.2881278Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.2881743Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.2882376Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.2882977Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.2883557Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.2884087Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.2884596Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.2885079Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.2885565Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.2886327Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.2886866Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.2887462Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.2888587Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 2025-12-04T12:15:05.2889161Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.2889744Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 2025-12-04T12:15:05.2890282Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.2890828Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.2891384Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32) 2025-12-04T12:15:05.2891868Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.2892314Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.2892933Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.2893403Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.2893981Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.2894533Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.2895250Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, 
tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.2895632Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.2898050Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.2898659Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.2899711Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2900361Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2901256Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2901944Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2902844Z E1204 11:51:44.720000 114167 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2903613Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2904276Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.2905245Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2905664Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.2906553Z E1204 11:51:44.720000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2906689Z ('RERUN', {'yellow': True}) [3.2936s] [100%] 2025-12-04T12:15:05.2907918Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.2908901Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2909346Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.2909796Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.2910267Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.2910802Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.2911344Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.2911941Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.2912561Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.2913127Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 
2025-12-04T12:15:05.2913577Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.2914207Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.2914801Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.2915338Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.2915876Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.2916368Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.2916858Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.2917328Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.2918096Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.2918629Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.2919262Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.2919850Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 
2025-12-04T12:15:05.2920402Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.2921039Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 2025-12-04T12:15:05.2921571Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.2922114Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.2922638Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32) 2025-12-04T12:15:05.2923104Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.2923543Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.2924124Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.2924566Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.2925152Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.2925736Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.2926449Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.2926822Z 
E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.2928995Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.2929552Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.2930594Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2931238Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2932129Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2932860Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2933749Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2934543Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2935218Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.2936191Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2936641Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.2937534Z E1204 11:51:45.096000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2937688Z ('RERUN', {'yellow': True}) [0.3379s] [100%] 2025-12-04T12:15:05.2938834Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_0 2025-12-04T12:15:05.2939813Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2940296Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.2940749Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.2941226Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.2941765Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.2942318Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.2942900Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.2943483Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.2944047Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 
2025-12-04T12:15:05.2944495Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.2945139Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.2945718Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.2946246Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.2946819Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.2947309Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.2947804Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.2948302Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.2949109Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.2949621Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl_math.abs(tmp0) 2025-12-04T12:15:05.2950211Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.2950797Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.maximum(_tmp3, tmp2) 
2025-12-04T12:15:05.2951350Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp3 = tl.where(r0_mask, tmp4, _tmp3) 2025-12-04T12:15:05.2951936Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = triton_helpers.max2(_tmp3, 1)[:, None] 2025-12-04T12:15:05.2952457Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.2953033Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.2953553Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp3.to(tl.float32) 2025-12-04T12:15:05.2954016Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.2954470Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.2955040Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.2955480Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.2956064Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.2956601Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.2957325Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.2957691Z 
E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.2959918Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.2960461Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.2961511Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2962207Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.2963099Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2963793Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2964671Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2965456Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2966066Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.2967052Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2967452Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.2968345Z E1204 11:51:45.436000 114167 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2968465Z FAILED [0.3385s] [100%] 2025-12-04T12:15:05.2968473Z 2025-12-04T12:15:05.2968621Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.2968925Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____ 2025-12-04T12:15:05.2969055Z Traceback (most recent call last): 2025-12-04T12:15:05.2969452Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.2969626Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.2970116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.2970381Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.2970892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.2971294Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.2971825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.2971974Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.2972507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.2972845Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.2973463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.2973627Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.2974109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.2974233Z 
return self._compile_to_module() 2025-12-04T12:15:05.2974735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.2974952Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.2975540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.2975674Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.2976171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.2976480Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.2977068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.2984026Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.2984621Z File "/tmp/tmpqlyuc1go/wa/cwa3dm4i26la4q3qwl4q7xvx252yxnfo3tasybuwmubg533wb23j.py", line 58, in 2025-12-04T12:15:05.2985103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.2985244Z kernel.precompile( 2025-12-04T12:15:05.2985811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.2985936Z self._precompile_worker() 2025-12-04T12:15:05.2986548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.2986888Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.2987489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.2987711Z binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:05.2988165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.2988429Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.2988879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.2989220Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.2989467Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.2990009Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.2990118Z ^ 2025-12-04T12:15:05.2990579Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.2990586Z 2025-12-04T12:15:05.2991302Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.2991313Z 2025-12-04T12:15:05.2991332Z 2025-12-04T12:15:05.2991555Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.2992142Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.2992147Z 2025-12-04T12:15:05.2992431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.2992664Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.2992816Z frames [('total', 1)] 2025-12-04T12:15:05.2992950Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.2993193Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.2993431Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.2993535Z graph_break [] 2025-12-04T12:15:05.2993829Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____ 2025-12-04T12:15:05.2994022Z Traceback (most recent call last): 2025-12-04T12:15:05.2994424Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.2994631Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.2995139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.2995395Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.2995924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.2996118Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.2996628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.2996794Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.2997334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.2997670Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.2998195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.2998380Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.2998875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.2998999Z return self._compile_to_module() 2025-12-04T12:15:05.2999486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.2999665Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.3000181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.3000325Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.3000824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.3001055Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.3001656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.3001785Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.3002294Z File "/tmp/tmpg82tku6q/wa/cwasncznhdh27jaye6cm62ypurp26i5ly6adgfhkyfckfxttkofh.py", line 58, in 2025-12-04T12:15:05.3002759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.3002872Z kernel.precompile( 2025-12-04T12:15:05.3003439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.3003558Z self._precompile_worker() 2025-12-04T12:15:05.3004160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.3004351Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.3004945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3005193Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3005645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3005890Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3006348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3006714Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3006986Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.3007521Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.3007618Z ^ 2025-12-04T12:15:05.3008091Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3008097Z 2025-12-04T12:15:05.3008808Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.3008814Z 2025-12-04T12:15:05.3008819Z 2025-12-04T12:15:05.3009051Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3009637Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.3009643Z 2025-12-04T12:15:05.3009911Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3010148Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3010289Z frames [('total', 1)] 2025-12-04T12:15:05.3010421Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3010663Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.3010887Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3011001Z graph_break [] 2025-12-04T12:15:05.3011222Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3011327Z frames [('total', 1)] 2025-12-04T12:15:05.3011455Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3011674Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3011912Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.3012026Z graph_break [] 2025-12-04T12:15:05.3012174Z =================================== FAILURES =================================== 2025-12-04T12:15:05.3012476Z ____ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda ____ 2025-12-04T12:15:05.3012603Z Traceback (most recent call last): 2025-12-04T12:15:05.3013000Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.3013172Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.3013665Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.3013915Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.3014441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.3014640Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.3015166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.3015314Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.3015850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.3016228Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.3016898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.3017071Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.3017556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.3017736Z return self._compile_to_module() 2025-12-04T12:15:05.3018268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.3018435Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.3018964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.3019098Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.3019597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.3019844Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.3020434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.3020561Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.3021085Z File "/tmp/tmp8lyshc2k/dk/cdkrpphdgfcpc657znenumilapgmbhcho7h5ybgzvvpgztpo4ori.py", line 58, in 2025-12-04T12:15:05.3021550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.3021676Z kernel.precompile( 2025-12-04T12:15:05.3022229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.3022388Z self._precompile_worker() 2025-12-04T12:15:05.3023001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.3023181Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.3023789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3023990Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3024442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3024702Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3025145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3025481Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3025724Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.3026257Z def 
triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.3026362Z ^ 2025-12-04T12:15:05.3026820Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3026828Z 2025-12-04T12:15:05.3027543Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.3027566Z 2025-12-04T12:15:05.3027570Z 2025-12-04T12:15:05.3027788Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3028373Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.3028381Z 2025-12-04T12:15:05.3028698Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3028922Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3029029Z frames [('total', 1)] 2025-12-04T12:15:05.3029159Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3029396Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.3029688Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3029791Z graph_break [] 2025-12-04T12:15:05.3030011Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3030162Z frames [('total', 1)] 2025-12-04T12:15:05.3030277Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3030496Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3030746Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.3030846Z graph_break [] 2025-12-04T12:15:05.3031083Z ----------------------------- Captured 
stdout call ----------------------------- 2025-12-04T12:15:05.3031187Z frames [('total', 1)] 2025-12-04T12:15:05.3031302Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3031533Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3031764Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.3031866Z graph_break [] 2025-12-04T12:15:05.3032528Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f1313a025d30dc09.xml - 2025-12-04T12:15:05.3032707Z =========================== short test summary info ============================ 2025-12-04T12:15:05.3033452Z FAILED [0.3385s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.3034018Z def triton_red_fused__to_copy_abs_amax_clamp_mul_0(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.3034108Z ^ 2025-12-04T12:15:05.3034581Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3034587Z 2025-12-04T12:15:05.3035293Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.3035301Z 2025-12-04T12:15:05.3035305Z 2025-12-04T12:15:05.3035537Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3036124Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.3036132Z 2025-12-04T12:15:05.3036403Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3036600Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.3036803Z ================== 1 failed, 187 deselected, 2 rerun in 4.01s ================== 2025-12-04T12:15:05.3036922Z Got exit code 1 2025-12-04T12:15:05.3037429Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda 2025-12-04T12:15:05.3037838Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.3038325Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-03aedafc0832726c.xml 2025-12-04T12:15:05.3038493Z ============================= test session starts ============================== 2025-12-04T12:15:05.3038858Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.3038972Z cachedir: .pytest_cache 2025-12-04T12:15:05.3039525Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.3039666Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.3039774Z configfile: pytest.ini 2025-12-04T12:15:05.3040363Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.3040605Z collecting ... collected 188 items / 14 deselected / 174 selected 2025-12-04T12:15:05.3040855Z stepcurrent: skipping 14 already run items. 
2025-12-04T12:15:05.3040985Z Running 174 items in this shard 2025-12-04T12:15:05.3040991Z 2025-12-04T12:15:05.3042191Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.3043075Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3043520Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.3043965Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 160 2025-12-04T12:15:05.3044503Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.3044969Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.3045505Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.3046098Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.3046684Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.3047282Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.3047841Z E1204 11:52:04.662000 114364 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.3048298Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.3048817Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.3049296Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.3049771Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.3050218Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.3050828Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.3051352Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.3051899Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.3052498Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.3053101Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T12:15:05.3053744Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.3054252Z E1204 11:52:04.662000 114364 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.3054753Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.3055246Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.3055812Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.3056271Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.3056922Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.3057459Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.3058197Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.3058560Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.3060625Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 
2025-12-04T12:15:05.3061205Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.3062261Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3062893Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3063803Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3064487Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3065370Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3066161Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3066805Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.3067700Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3068067Z E1204 11:52:04.662000 114364 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.3069615Z E1204 11:52:04.662000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3069755Z ('RERUN', {'yellow': True}) [3.6567s] [ 0%] 2025-12-04T12:15:05.3070922Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.3072156Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3072584Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.3073049Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 160 2025-12-04T12:15:05.3073570Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.3074046Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.3074669Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.3075210Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.3075809Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + 
tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.3076394Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.3076968Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.3077406Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.3077922Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.3078407Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.3078865Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.3079324Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.3079919Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.3080442Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.3080995Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.3081646Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.3082234Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 
2025-12-04T12:15:05.3082857Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.3083411Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.3083934Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.3084380Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.3084962Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.3085401Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.3085985Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.3086517Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.3087227Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.3087608Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.3089676Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': 
[{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.3090237Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.3091283Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3091931Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3092825Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3093514Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3094403Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3095173Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3095824Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.3096764Z E1204 11:52:05.180000 114364 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3097186Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.3098105Z E1204 11:52:05.180000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3098257Z ('RERUN', {'yellow': True}) [0.4800s] [ 0%] 2025-12-04T12:15:05.3099420Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.3100283Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3100726Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.3101173Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 160 2025-12-04T12:15:05.3101704Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.3102198Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.3102733Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.3103288Z E1204 11:52:05.658000 114364 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.3103869Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.3104465Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.3105025Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.3105466Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.3106006Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.3106478Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.3106952Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.3107402Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.3108000Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.3108540Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.3109121Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.3109716Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.3110287Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T12:15:05.3110959Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.3111497Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.3111966Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.3112425Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.3112992Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.3113442Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.3114013Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.3114546Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.3115275Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.3115680Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.3117714Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: 
{'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.3118253Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.3119309Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3119936Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3120842Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3121526Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3122406Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3123219Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 
2025-12-04T12:15:05.3123828Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.3124706Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3125135Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.3126047Z E1204 11:52:05.658000 114364 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3126156Z FAILED [0.4755s] [ 0%] 2025-12-04T12:15:05.3126163Z 2025-12-04T12:15:05.3126311Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.3126623Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___ 2025-12-04T12:15:05.3126749Z Traceback (most recent call last): 2025-12-04T12:15:05.3127161Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.3127321Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.3127812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.3128078Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.3128591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.3128834Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.3129359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.3129507Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.3130052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.3130374Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.3130896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.3131060Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.3131540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.3131678Z return self._compile_to_module() 2025-12-04T12:15:05.3132166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.3132331Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.3132859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.3132991Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.3133484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.3133730Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.3134318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.3134458Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.3134979Z File "/tmp/tmp9ew304tp/px/cpxpbyg6h77wdwmfjopa4olavns7dy7budjyv3xi56ypvqtxsi7x.py", line 118, in 2025-12-04T12:15:05.3135480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.3135608Z kernel.precompile( 2025-12-04T12:15:05.3136162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.3136370Z self._precompile_worker() 2025-12-04T12:15:05.3136977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.3137201Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.3137845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3138045Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3138500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3138764Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3139208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3139558Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3139788Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.3140222Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3140328Z ^ 2025-12-04T12:15:05.3140785Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3140791Z 2025-12-04T12:15:05.3141516Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.3141558Z 2025-12-04T12:15:05.3141563Z 2025-12-04T12:15:05.3141781Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3142378Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.3142396Z 2025-12-04T12:15:05.3142669Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3142897Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3143016Z frames [('total', 1)] 2025-12-04T12:15:05.3143134Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3143372Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3143608Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3143709Z graph_break [] 2025-12-04T12:15:05.3144004Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___ 2025-12-04T12:15:05.3144143Z Traceback (most recent call last): 2025-12-04T12:15:05.3144540Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.3144709Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.3145196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.3145446Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.3145994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.3146188Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.3146713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.3146865Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.3147439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.3147780Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.3148300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.3148480Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.3148975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.3149133Z return self._compile_to_module() 2025-12-04T12:15:05.3149635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.3149804Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.3150326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.3150474Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.3150974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.3151224Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.3151812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.3151943Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.3152457Z File "/tmp/tmpi22i3h0l/ow/cowqt4322c7owwcibq4p5y2imfc6k52zopnev5y5666tvgnjkyzz.py", line 118, in 2025-12-04T12:15:05.3152920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.3153073Z kernel.precompile( 2025-12-04T12:15:05.3153643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.3153762Z self._precompile_worker() 2025-12-04T12:15:05.3154373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.3154554Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.3155153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3155373Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3155826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3156086Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3156538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3156872Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3157115Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.3157549Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3157644Z ^ 2025-12-04T12:15:05.3158120Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3158126Z 2025-12-04T12:15:05.3158840Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.3158846Z 2025-12-04T12:15:05.3158853Z 2025-12-04T12:15:05.3159088Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3159721Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.3159730Z 2025-12-04T12:15:05.3160018Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3160246Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3160354Z frames [('total', 1)] 2025-12-04T12:15:05.3160518Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3160756Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3161009Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3161124Z graph_break [] 2025-12-04T12:15:05.3161351Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3161468Z frames [('total', 1)] 2025-12-04T12:15:05.3161583Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3161799Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3162050Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3162149Z graph_break [] 2025-12-04T12:15:05.3162297Z =================================== FAILURES =================================== 2025-12-04T12:15:05.3162601Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___ 2025-12-04T12:15:05.3162726Z Traceback (most recent call last): 2025-12-04T12:15:05.3163128Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.3163295Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.3163784Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.3164044Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.3164588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.3164782Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.3165303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.3165450Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.3165997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.3166322Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.3166842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.3167006Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.3167489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.3167626Z return self._compile_to_module() 2025-12-04T12:15:05.3168113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.3168278Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.3168811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.3168944Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.3169444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.3169690Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.3170277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.3170419Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.3171160Z File "/tmp/tmpljfnr47f/fd/cfdvrn6jloli4lt23dbt7sa6q4q3a6x3p4dfntgshsjvgzhlvbzs.py", line 118, in 2025-12-04T12:15:05.3171631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.3171757Z kernel.precompile( 2025-12-04T12:15:05.3172310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.3172580Z self._precompile_worker() 2025-12-04T12:15:05.3173242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.3173425Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.3174032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3174855Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3175314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3175573Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3176020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3176438Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3176674Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.3177109Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3177219Z ^ 2025-12-04T12:15:05.3177678Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3177765Z 2025-12-04T12:15:05.3178496Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.3178502Z 2025-12-04T12:15:05.3178507Z 2025-12-04T12:15:05.3178727Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3179322Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.3179345Z 2025-12-04T12:15:05.3179618Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3179849Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3179974Z frames [('total', 1)] 2025-12-04T12:15:05.3180092Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3180332Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3180570Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3180675Z graph_break [] 2025-12-04T12:15:05.3180897Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3181013Z frames [('total', 1)] 2025-12-04T12:15:05.3181131Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3181362Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3181599Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3181701Z graph_break [] 2025-12-04T12:15:05.3181930Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:05.3182035Z frames [('total', 1)] 2025-12-04T12:15:05.3182150Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3182379Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3182614Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3182712Z graph_break [] 2025-12-04T12:15:05.3183430Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-03aedafc0832726c.xml - 2025-12-04T12:15:05.3183608Z =========================== short test summary info ============================ 2025-12-04T12:15:05.3184367Z FAILED [0.4755s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.3184835Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3184926Z ^ 2025-12-04T12:15:05.3185431Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3185437Z 2025-12-04T12:15:05.3186151Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.3186160Z 2025-12-04T12:15:05.3186167Z 2025-12-04T12:15:05.3186400Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3187031Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.3187040Z 2025-12-04T12:15:05.3187418Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3187606Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.3187808Z ================== 1 failed, 14 deselected, 2 rerun in 4.66s =================== 2025-12-04T12:15:05.3187929Z Got exit code 1 2025-12-04T12:15:05.3188038Z Retrying single test... 2025-12-04T12:15:05.3188507Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-89171bcc48f05a69.xml 2025-12-04T12:15:05.3188723Z ============================= test session starts ============================== 2025-12-04T12:15:05.3189080Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.3189203Z cachedir: .pytest_cache 2025-12-04T12:15:05.3189723Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.3189848Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.3189973Z configfile: pytest.ini 2025-12-04T12:15:05.3190564Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.3190790Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.3191484Z stepcurrent: skipping 14 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.3191604Z Running 1 items in this shard 2025-12-04T12:15:05.3191609Z 2025-12-04T12:15:05.3192785Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.3193660Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3194106Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.3194553Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 160 2025-12-04T12:15:05.3195071Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.3195590Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.3196129Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.3196681Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.3197294Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.3197915Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, 
tl.int1)[:, None] 2025-12-04T12:15:05.3198485Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.3198936Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.3199467Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.3199938Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.3200398Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.3200866Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.3201459Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.3202031Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.3202576Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.3203169Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.3203738Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T12:15:05.3204368Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.3204895Z 
E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.3205368Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.3205826Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.3206395Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.3206835Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.3207424Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.3207953Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.3208726Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.3209092Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.3211162Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 
'debug': True, 'cc': 75} 2025-12-04T12:15:05.3211735Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.3212804Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3213434Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3214321Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3215018Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3215900Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3216796Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3217411Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.3218299Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3218669Z E1204 
11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.3219896Z E1204 11:52:24.411000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3220055Z ('RERUN', {'yellow': True}) [3.6558s] [100%] 2025-12-04T12:15:05.3221219Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.3222106Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3222537Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.3222986Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 160 2025-12-04T12:15:05.3223580Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.3224048Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.3224593Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.3225183Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.3225817Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.3226400Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.3226966Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.3227426Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.3227942Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.3228428Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.3228895Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.3229342Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.3229994Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.3230521Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.3231082Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.3231664Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.3232237Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 
2025-12-04T12:15:05.3232883Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.3233399Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.3233882Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.3234324Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.3234894Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.3235355Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.3235933Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.3236481Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.3237228Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.3237613Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.3239682Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': 
[{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.3240264Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.3241306Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3241937Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3242843Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3243523Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3244462Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3245232Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3245855Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.3246734Z E1204 11:52:24.940000 114595 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3247103Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.3248003Z E1204 11:52:24.940000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3248141Z ('RERUN', {'yellow': True}) [0.4912s] [100%] 2025-12-04T12:15:05.3249312Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.3250187Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3250626Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.3251206Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 160 2025-12-04T12:15:05.3251731Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.3252205Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.3252739Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.3253371Z E1204 11:52:25.434000 114595 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.3253953Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.3254543Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.3255115Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.3255559Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.3256089Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.3256652Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.3257110Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.3257617Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.3258212Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.3258743Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.3259286Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.3259867Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.3260452Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T12:15:05.3261077Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.3261600Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.3262065Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.3262517Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.3263087Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.3263531Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.3264115Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.3264687Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.3265427Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.3265790Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.3267882Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: 
{'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.3268438Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.3269480Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3270124Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3271208Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3271997Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3272880Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3273661Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 
2025-12-04T12:15:05.3274275Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.3275142Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3275526Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.3276421Z E1204 11:52:25.434000 114595 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3276544Z FAILED [0.4915s] [100%] 2025-12-04T12:15:05.3276550Z 2025-12-04T12:15:05.3276700Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.3277015Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___ 2025-12-04T12:15:05.3277143Z Traceback (most recent call last): 2025-12-04T12:15:05.3277548Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.3277719Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.3278216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.3278531Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.3279061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.3279258Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.3279783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.3279973Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.3280553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.3280891Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.3281415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.3281581Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.3282061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.3282185Z return self._compile_to_module() 2025-12-04T12:15:05.3282680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.3282847Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.3283363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.3283508Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.3284005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.3284307Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.3284897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.3285030Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.3285555Z File "/tmp/tmp01ubla4e/kp/ckpzfcagqn7xfv6oamrbmiph5ypibzgec26gkdhczk653eo546h4.py", line 118, in 2025-12-04T12:15:05.3286024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.3286153Z kernel.precompile( 2025-12-04T12:15:05.3286710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.3286829Z self._precompile_worker() 2025-12-04T12:15:05.3287442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.3287624Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.3288220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3288436Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3288888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3289149Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3289597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3289932Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3290174Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.3290608Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3290713Z ^ 2025-12-04T12:15:05.3291211Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3291217Z 2025-12-04T12:15:05.3291933Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.3291940Z 2025-12-04T12:15:05.3291976Z 2025-12-04T12:15:05.3292207Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3292838Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.3292845Z 2025-12-04T12:15:05.3293125Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3293353Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3293459Z frames [('total', 1)] 2025-12-04T12:15:05.3293591Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3293828Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3294050Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3294163Z graph_break [] 2025-12-04T12:15:05.3294456Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___ 2025-12-04T12:15:05.3294596Z Traceback (most recent call last): 2025-12-04T12:15:05.3294992Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.3295147Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.3295652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.3295902Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.3296568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.3296767Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.3297275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.3297439Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.3297977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.3298302Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.3298842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.3298991Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.3299494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.3299618Z return self._compile_to_module() 2025-12-04T12:15:05.3300104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.3300283Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.3300798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.3300947Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.3301448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.3301680Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.3302279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.3302412Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.3302971Z File "/tmp/tmpkysq29mq/fh/cfh37imijtyduvrv3zgukdrayd3uijhppizkbtkqlz5rozpxioov.py", line 118, in 2025-12-04T12:15:05.3303448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.3303560Z kernel.precompile( 2025-12-04T12:15:05.3304129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.3304285Z self._precompile_worker() 2025-12-04T12:15:05.3304910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.3305104Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.3305698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3305915Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3306365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3306614Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3307068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3307406Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3307632Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.3308080Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3308170Z ^ 2025-12-04T12:15:05.3308659Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3308703Z 2025-12-04T12:15:05.3309422Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.3309429Z 2025-12-04T12:15:05.3309434Z 2025-12-04T12:15:05.3309664Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3310268Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.3310276Z 2025-12-04T12:15:05.3310547Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3310788Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3310896Z frames [('total', 1)] 2025-12-04T12:15:05.3311015Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3311271Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3311497Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3311614Z graph_break [] 2025-12-04T12:15:05.3311834Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3311939Z frames [('total', 1)] 2025-12-04T12:15:05.3312069Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3312289Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3312525Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3312644Z graph_break [] 2025-12-04T12:15:05.3312793Z =================================== FAILURES =================================== 2025-12-04T12:15:05.3313105Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___ 2025-12-04T12:15:05.3313233Z Traceback (most recent call last): 2025-12-04T12:15:05.3313633Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.3313804Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.3314361Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.3314616Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.3315146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.3315377Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.3316287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.3316482Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.3317026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.3317369Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.3317892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.3318054Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.3318537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.3318661Z return self._compile_to_module() 2025-12-04T12:15:05.3319168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.3319341Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.3319861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.3320013Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.3320568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.3320818Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.3321404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.3321538Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.3322064Z File "/tmp/tmpfg502zcq/db/cdbad4etsaltwm57df5tkxb7v74k6iqxee6c3vhydwqlnukcffo2.py", line 118, in 2025-12-04T12:15:05.3322532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.3322660Z kernel.precompile( 2025-12-04T12:15:05.3323217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.3323337Z self._precompile_worker() 2025-12-04T12:15:05.3323951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.3324132Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.3324726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3324936Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3325387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3325643Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3326088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3326420Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3326660Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.3327126Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3327229Z ^ 2025-12-04T12:15:05.3327686Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3327692Z 2025-12-04T12:15:05.3328405Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.3328449Z 2025-12-04T12:15:05.3328454Z 2025-12-04T12:15:05.3328712Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3329308Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.3329316Z 2025-12-04T12:15:05.3329791Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3330022Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3330128Z frames [('total', 1)] 2025-12-04T12:15:05.3330260Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3330496Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3330717Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3330838Z graph_break [] 2025-12-04T12:15:05.3331058Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3331180Z frames [('total', 1)] 2025-12-04T12:15:05.3331296Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3331519Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3331764Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3331905Z graph_break [] 2025-12-04T12:15:05.3332125Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:05.3332247Z frames [('total', 1)] 2025-12-04T12:15:05.3332361Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3332588Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3332819Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3332917Z graph_break [] 2025-12-04T12:15:05.3333581Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-89171bcc48f05a69.xml - 2025-12-04T12:15:05.3333757Z =========================== short test summary info ============================ 2025-12-04T12:15:05.3334508Z FAILED [0.4915s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.3334949Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3335041Z ^ 2025-12-04T12:15:05.3335512Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3335517Z 2025-12-04T12:15:05.3336226Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.3336232Z 2025-12-04T12:15:05.3336239Z 2025-12-04T12:15:05.3336539Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3337146Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.3337151Z 2025-12-04T12:15:05.3337423Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3337626Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.3337837Z ================== 1 failed, 187 deselected, 2 rerun in 4.68s ================== 2025-12-04T12:15:05.3337983Z Got exit code 1 2025-12-04T12:15:05.3338113Z Retrying single test... 2025-12-04T12:15:05.3338584Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6450e334481f0131.xml 2025-12-04T12:15:05.3338766Z ============================= test session starts ============================== 2025-12-04T12:15:05.3339121Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.3339269Z cachedir: .pytest_cache 2025-12-04T12:15:05.3339835Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.3339962Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.3340073Z configfile: pytest.ini 2025-12-04T12:15:05.3340683Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.3340912Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.3341599Z stepcurrent: skipping 14 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.3341719Z Running 1 items in this shard 2025-12-04T12:15:05.3341725Z 2025-12-04T12:15:05.3342896Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.3343784Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3344248Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.3344706Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 160 2025-12-04T12:15:05.3345226Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.3345702Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.3346238Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.3346783Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.3347378Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.3347962Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, 
tl.int1)[:, None] 2025-12-04T12:15:05.3348532Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.3348973Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.3349491Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.3349983Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.3350439Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.3350936Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.3351536Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.3352057Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.3352641Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.3353255Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.3353843Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T12:15:05.3354477Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.3354994Z 
E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.3355460Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.3355905Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.3356488Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.3356924Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.3357546Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.3358082Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.3358792Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.3359173Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.3361202Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 
'debug': True, 'cc': 75} 2025-12-04T12:15:05.3361751Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.3362794Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3363437Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3364331Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3365061Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3365947Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3366747Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3367443Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.3368323Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3368704Z E1204 
11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.3369601Z E1204 11:52:44.559000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3369752Z ('RERUN', {'yellow': True}) [3.6549s] [100%] 2025-12-04T12:15:05.3370911Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.3371965Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3372501Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.3372947Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 160 2025-12-04T12:15:05.3373482Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.3373950Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.3374484Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.3375045Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.3375637Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.3376233Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.3376848Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.3377311Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.3377835Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.3378305Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.3378842Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.3379296Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.3379907Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.3380431Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.3381065Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.3381662Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.3382239Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 
2025-12-04T12:15:05.3382872Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.3383380Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.3383846Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.3384306Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.3384867Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.3385357Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.3385931Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.3386468Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.3387195Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.3387561Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.3389593Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': 
[{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.3390132Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.3391204Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3391832Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3392790Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3393470Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3394353Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3395205Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3395814Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.3396705Z E1204 11:52:45.090000 114827 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3397070Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.3397974Z E1204 11:52:45.090000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3398113Z ('RERUN', {'yellow': True}) [0.4917s] [100%] 2025-12-04T12:15:05.3399278Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_mul_1 2025-12-04T12:15:05.3400213Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3400646Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.3401109Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 160 2025-12-04T12:15:05.3401631Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.3402117Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.3402654Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.3403198Z E1204 11:52:45.581000 114827 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.3406779Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.3407402Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.3407969Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.3408429Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.3408948Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.3409504Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.3409966Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.3410449Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.3411048Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0) 2025-12-04T12:15:05.3411623Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.3412166Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.broadcast_to(tmp6, [1, 1]) 2025-12-04T12:15:05.3412766Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.3413342Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tl.where(r0_mask, tmp1, float("-inf")) 2025-12-04T12:15:05.3413974Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = triton_helpers.max2(tmp3, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.3414495Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tmp4.to(tl.float32) 2025-12-04T12:15:05.3414972Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp5 * tmp7 2025-12-04T12:15:05.3415429Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = -448.0 2025-12-04T12:15:05.3416040Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = triton_helpers.maximum(tmp8, tmp9) 2025-12-04T12:15:05.3416595Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = 448.0 2025-12-04T12:15:05.3417190Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = triton_helpers.minimum(tmp10, tmp11) 2025-12-04T12:15:05.3417722Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12.to(tl.float8e4nv) 2025-12-04T12:15:05.3418450Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp13, None) 2025-12-04T12:15:05.3418813Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.3420944Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: 
{'signature': {'in_ptr0': '*fp32', 'in_ptr1': '*fp32', 'out_ptr1': '*fp8e4nv', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.3421482Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.3422523Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3423169Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3424125Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3424819Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3425733Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3426515Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 
2025-12-04T12:15:05.3427127Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.3428009Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3428377Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.3429269Z E1204 11:52:45.581000 114827 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3429390Z FAILED [0.4894s] [100%] 2025-12-04T12:15:05.3429397Z 2025-12-04T12:15:05.3429546Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.3429894Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___ 2025-12-04T12:15:05.3430025Z Traceback (most recent call last): 2025-12-04T12:15:05.3430421Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.3430596Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.3431090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.3431342Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.3431872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.3432066Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.3432589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.3432742Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.3433279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.3433613Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.3434191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.3434356Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.3434838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.3434960Z return self._compile_to_module() 2025-12-04T12:15:05.3435456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.3435620Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.3436175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.3436319Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.3436826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.3437075Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.3437664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.3437825Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.3438323Z File "/tmp/tmpfscsojs_/ot/cot6ucnhr5vlnabl6vdrqw37vysspn5uhkni7ol5os7ojegf4fnd.py", line 118, in 2025-12-04T12:15:05.3438787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.3438916Z kernel.precompile( 2025-12-04T12:15:05.3439472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.3439592Z self._precompile_worker() 2025-12-04T12:15:05.3440207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.3440391Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.3440989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3441204Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3441656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3441918Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3442400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3442743Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3442983Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.3443419Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3443526Z ^ 2025-12-04T12:15:05.3443984Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3443990Z 2025-12-04T12:15:05.3444699Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.3444705Z 2025-12-04T12:15:05.3444722Z 2025-12-04T12:15:05.3444944Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3445543Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.3445549Z 2025-12-04T12:15:05.3445832Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3446099Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3446207Z frames [('total', 1)] 2025-12-04T12:15:05.3446339Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3446580Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3446815Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3446917Z graph_break [] 2025-12-04T12:15:05.3447211Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___ 2025-12-04T12:15:05.3447348Z Traceback (most recent call last): 2025-12-04T12:15:05.3447750Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.3447939Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.3448445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.3448698Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.3449223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.3449452Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.3449965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:05.3450124Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.3450655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.3450981Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.3451513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.3451663Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.3452157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.3452282Z return self._compile_to_module() 2025-12-04T12:15:05.3452769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.3452947Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.3453464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.3453718Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.3454219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.3454454Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.3455055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.3455183Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.3455662Z File "/tmp/tmp_07vpckh/cq/ccqyax4rpybxh3lrtsr6g6wsyqiu2bsp56wq25f7dxxumkgep4du.py", line 118, in 2025-12-04T12:15:05.3456141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:05.3456254Z kernel.precompile( 2025-12-04T12:15:05.3456901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.3457023Z self._precompile_worker() 2025-12-04T12:15:05.3457622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.3457816Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.3458454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3458671Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3459125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3459372Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3459833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3460168Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3460414Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.3460875Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3460971Z ^ 2025-12-04T12:15:05.3461441Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3461447Z 2025-12-04T12:15:05.3462158Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.3462195Z 2025-12-04T12:15:05.3462200Z 2025-12-04T12:15:05.3462431Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3463027Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.3463035Z 2025-12-04T12:15:05.3463306Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3463547Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3463652Z frames [('total', 1)] 2025-12-04T12:15:05.3463782Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3464022Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3464241Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3464357Z graph_break [] 2025-12-04T12:15:05.3464576Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3464680Z frames [('total', 1)] 2025-12-04T12:15:05.3464808Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3465025Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3465258Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3465403Z graph_break [] 2025-12-04T12:15:05.3465551Z =================================== FAILURES =================================== 2025-12-04T12:15:05.3465860Z __ TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda ___ 2025-12-04T12:15:05.3465984Z Traceback (most recent call last): 2025-12-04T12:15:05.3466385Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 236, in test_amax_fp8_quant 2025-12-04T12:15:05.3466556Z y_compiled = compiled_amax_fp8_quant(x, scale) 2025-12-04T12:15:05.3467051Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.3467301Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.3467827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.3468021Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.3468544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.3468694Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.3469284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.3469620Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.3470139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.3470306Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.3470783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.3471187Z return self._compile_to_module() 2025-12-04T12:15:05.3471679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.3471946Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.3472480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.3472617Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.3473130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:05.3473407Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.3473993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.3474136Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.3474645Z File "/tmp/tmpywtwoc7l/gs/cgscbnakwbloqkvnt62ojqax3g45mhzel7n3jitvuxdbb575ndqo.py", line 118, in 2025-12-04T12:15:05.3475133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.3475251Z kernel.precompile( 2025-12-04T12:15:05.3475805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.3475944Z self._precompile_worker() 2025-12-04T12:15:05.3476540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.3476724Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.3477331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.3477532Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.3478000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.3478296Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.3478746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.3479105Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.3479334Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.3479782Z def 
triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3479878Z ^ 2025-12-04T12:15:05.3480339Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3480344Z 2025-12-04T12:15:05.3481070Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.3481079Z 2025-12-04T12:15:05.3481084Z 2025-12-04T12:15:05.3481308Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3481915Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.3481972Z 2025-12-04T12:15:05.3482247Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3482473Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3482602Z frames [('total', 1)] 2025-12-04T12:15:05.3482722Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3482975Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3483199Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3483300Z graph_break [] 2025-12-04T12:15:05.3483536Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3483644Z frames [('total', 1)] 2025-12-04T12:15:05.3483797Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3484033Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3484267Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3484372Z graph_break [] 2025-12-04T12:15:05.3484605Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:15:05.3484711Z frames [('total', 1)] 2025-12-04T12:15:05.3484878Z stats [('calls_captured', 6)] 2025-12-04T12:15:05.3485099Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3485329Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3485447Z graph_break [] 2025-12-04T12:15:05.3486095Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6450e334481f0131.xml - 2025-12-04T12:15:05.3486273Z =========================== short test summary info ============================ 2025-12-04T12:15:05.3487041Z FAILED [0.4894s] inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.3487473Z def triton_per_fused__to_copy_abs_amax_clamp_mul_1(in_ptr0, in_ptr1, out_ptr1, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.3487576Z ^ 2025-12-04T12:15:05.3488034Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.3488043Z 2025-12-04T12:15:05.3488747Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.3488765Z 2025-12-04T12:15:05.3488770Z 2025-12-04T12:15:05.3488987Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3489624Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.3489630Z 2025-12-04T12:15:05.3489908Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3490091Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.3490306Z ================== 1 failed, 187 deselected, 2 rerun in 4.68s ================== 2025-12-04T12:15:05.3490410Z Got exit code 1 2025-12-04T12:15:05.3490923Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:05.3491346Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.3491818Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f7999da795e3cf34.xml 2025-12-04T12:15:05.3491988Z ============================= test session starts ============================== 2025-12-04T12:15:05.3492353Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.3492465Z cachedir: .pytest_cache 2025-12-04T12:15:05.3493035Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.3493163Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.3493274Z configfile: pytest.ini 2025-12-04T12:15:05.3493887Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.3494116Z collecting ... collected 188 items / 15 deselected / 173 selected 2025-12-04T12:15:05.3494261Z stepcurrent: skipping 15 already run items. 
2025-12-04T12:15:05.3494391Z Running 173 items in this shard 2025-12-04T12:15:05.3494399Z 2025-12-04T12:15:05.3494852Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_cuda PASSED [3.3639s] [ 0%] 2025-12-04T12:15:05.3495346Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_cuda PASSED [0.2656s] [ 1%] 2025-12-04T12:15:05.3495803Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_cuda PASSED [0.5887s] [ 1%] 2025-12-04T12:15:05.3496251Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_cuda PASSED [0.2871s] [ 2%] 2025-12-04T12:15:05.3496840Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.4899s] [ 2%] 2025-12-04T12:15:05.3497435Z inductor/test_fp8.py::TestFP8TypesCUDA::test_bad_cast_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 3%] 2025-12-04T12:15:05.3498316Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 W1204 11:53:06.009000 115060 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:15:05.3498456Z ('RERUN', {'yellow': True}) [0.4279s] [ 4%] 2025-12-04T12:15:05.3498967Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4763s] [ 4%] 2025-12-04T12:15:05.3499402Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 FAILED [0.4524s] [ 4%] 2025-12-04T12:15:05.3499408Z 2025-12-04T12:15:05.3499552Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.3499857Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________ 2025-12-04T12:15:05.3499984Z Traceback (most recent call last): 2025-12-04T12:15:05.3500381Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 
130, in test_eager_fallback 2025-12-04T12:15:05.3500543Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.3501075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T12:15:05.3501205Z return fn(*args, **kwargs) 2025-12-04T12:15:05.3501600Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped 2025-12-04T12:15:05.3501716Z output = torch._scaled_mm( 2025-12-04T12:15:05.3502194Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+ 2025-12-04T12:15:05.3502200Z 2025-12-04T12:15:05.3502421Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3502970Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T12:15:05.3502976Z 2025-12-04T12:15:05.3503246Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3503469Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3503601Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.3503718Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.3503941Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3504295Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3504431Z graph_break [] 2025-12-04T12:15:05.3504625Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.3504843Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.3505587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.3505707Z warnings.warn( 
2025-12-04T12:15:05.3505994Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________ 2025-12-04T12:15:05.3506118Z Traceback (most recent call last): 2025-12-04T12:15:05.3506527Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.3506705Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.3507208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T12:15:05.3507325Z return fn(*args, **kwargs) 2025-12-04T12:15:05.3507720Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped 2025-12-04T12:15:05.3507879Z output = torch._scaled_mm( 2025-12-04T12:15:05.3508340Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+ 2025-12-04T12:15:05.3508346Z 2025-12-04T12:15:05.3508561Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3509109Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T12:15:05.3509117Z 2025-12-04T12:15:05.3509385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3509623Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3509739Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.3509856Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.3510093Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3510433Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3510547Z graph_break [] 2025-12-04T12:15:05.3510724Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.3510941Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T12:15:05.3512279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.3512461Z warnings.warn( 2025-12-04T12:15:05.3512678Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3512808Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.3512926Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.3513165Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3513508Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3513609Z graph_break [] 2025-12-04T12:15:05.3513803Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.3514018Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.3514748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.3514864Z warnings.warn( 2025-12-04T12:15:05.3515015Z =================================== FAILURES =================================== 2025-12-04T12:15:05.3515313Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________ 2025-12-04T12:15:05.3515437Z Traceback (most recent call last): 2025-12-04T12:15:05.3515834Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.3516039Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.3516529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T12:15:05.3516644Z return fn(*args, **kwargs) 2025-12-04T12:15:05.3517050Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in 
fp8_matmul_unwrapped 2025-12-04T12:15:05.3517167Z output = torch._scaled_mm( 2025-12-04T12:15:05.3517637Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+ 2025-12-04T12:15:05.3517646Z 2025-12-04T12:15:05.3517865Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3518436Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T12:15:05.3518442Z 2025-12-04T12:15:05.3518727Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3518945Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3519075Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.3519224Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.3519447Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3519799Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3519900Z graph_break [] 2025-12-04T12:15:05.3520075Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.3520309Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.3521040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.3521158Z warnings.warn( 2025-12-04T12:15:05.3521382Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3521496Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.3521627Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.3521851Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3522190Z inductor [('pattern_matcher_count', 2), 
('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3522310Z graph_break [] 2025-12-04T12:15:05.3522486Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.3522703Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.3523482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.3523586Z warnings.warn( 2025-12-04T12:15:05.3523811Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3523926Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.3524040Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.3524358Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3524759Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3524859Z graph_break [] 2025-12-04T12:15:05.3525048Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.3525265Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.3526002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.3526107Z warnings.warn( 2025-12-04T12:15:05.3526759Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f7999da795e3cf34.xml - 2025-12-04T12:15:05.3526995Z =========================== short test summary info ============================ 2025-12-04T12:15:05.3527952Z FAILED [0.4524s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 - RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm 
MI300+ 2025-12-04T12:15:05.3527961Z 2025-12-04T12:15:05.3528193Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3528731Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T12:15:05.3528737Z 2025-12-04T12:15:05.3529009Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3529242Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.3529485Z ======== 1 failed, 5 passed, 1 skipped, 15 deselected, 2 rerun in 6.41s ======== 2025-12-04T12:15:05.3529605Z Got exit code 1 2025-12-04T12:15:05.3529716Z Retrying single test... 2025-12-04T12:15:05.3530198Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ad7a38726bbc8b50.xml 2025-12-04T12:15:05.3530379Z ============================= test session starts ============================== 2025-12-04T12:15:05.3530765Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.3530875Z cachedir: .pytest_cache 2025-12-04T12:15:05.3531413Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.3531543Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.3531672Z configfile: pytest.ini 2025-12-04T12:15:05.3532268Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.3532494Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.3533131Z stepcurrent: skipping 21 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T12:15:05.3533251Z Running 1 items in this shard 2025-12-04T12:15:05.3533259Z 2025-12-04T12:15:05.3534207Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 11:53:23.880322369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3534213Z 2025-12-04T12:15:05.3534732Z [W1204 11:53:39.749253585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3534778Z 2025-12-04T12:15:05.3535310Z [W1204 11:53:39.749501944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3535316Z 2025-12-04T12:15:05.3535834Z [W1204 11:53:39.752604165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3535839Z 2025-12-04T12:15:05.3536425Z [W1204 11:53:39.752803535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3536434Z 2025-12-04T12:15:05.3536958Z [W1204 11:53:39.754748216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3536963Z 2025-12-04T12:15:05.3537473Z [W1204 11:53:39.755102951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3537480Z 2025-12-04T12:15:05.3538008Z [W1204 11:53:39.755275209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.3538013Z 2025-12-04T12:15:05.3538586Z [W1204 11:53:39.755806379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3538591Z 2025-12-04T12:15:05.3539115Z [W1204 11:53:39.755992289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3539122Z 2025-12-04T12:15:05.3539633Z [W1204 11:53:39.756564136 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3539638Z 2025-12-04T12:15:05.3540161Z [W1204 11:53:39.756744702 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3540169Z 2025-12-04T12:15:05.3540715Z [W1204 11:53:39.757161730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3540720Z 2025-12-04T12:15:05.3541232Z [W1204 11:53:39.757340073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3541254Z 2025-12-04T12:15:05.3541765Z [W1204 11:53:39.757717761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3541801Z 2025-12-04T12:15:05.3542314Z [W1204 11:53:39.757895374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3542319Z 2025-12-04T12:15:05.3542846Z [W1204 11:53:39.758276180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3542853Z 2025-12-04T12:15:05.3543360Z [W1204 11:53:39.758456309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.3543368Z 2025-12-04T12:15:05.3543850Z W1204 11:53:39.719000 115336 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:15:05.3544363Z [W1204 11:53:39.127990794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3544368Z 2025-12-04T12:15:05.3544522Z ('RERUN', {'yellow': True}) [19.3476s] [100%] 2025-12-04T12:15:05.3545450Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 11:53:40.742179499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3545456Z 2025-12-04T12:15:05.3545970Z [W1204 11:53:40.742608328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3546024Z 2025-12-04T12:15:05.3546537Z [W1204 11:53:40.742792868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3546542Z 2025-12-04T12:15:05.3547059Z [W1204 11:53:40.743381348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3547064Z 2025-12-04T12:15:05.3547586Z [W1204 11:53:40.743573437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3547594Z 2025-12-04T12:15:05.3548105Z [W1204 11:53:40.743937530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3548110Z 2025-12-04T12:15:05.3548637Z [W1204 11:53:40.744226615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.3548644Z 2025-12-04T12:15:05.3549158Z [W1204 11:53:40.744393221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3549162Z 2025-12-04T12:15:05.3549726Z [W1204 11:53:40.744877476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3549732Z 2025-12-04T12:15:05.3550244Z [W1204 11:53:40.745056507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3550251Z 2025-12-04T12:15:05.3550775Z [W1204 11:53:40.745498131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3550780Z 2025-12-04T12:15:05.3551289Z [W1204 11:53:40.745678025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3551296Z 2025-12-04T12:15:05.3551848Z [W1204 11:53:40.746064328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3551853Z 2025-12-04T12:15:05.3552376Z [W1204 11:53:40.746240294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3552383Z 2025-12-04T12:15:05.3552892Z [W1204 11:53:40.746601410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3552949Z 2025-12-04T12:15:05.3553474Z [W1204 11:53:40.746777402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3553479Z 2025-12-04T12:15:05.3553982Z [W1204 11:53:40.747147147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.3553989Z 2025-12-04T12:15:05.3554506Z [W1204 11:53:40.747323648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3554513Z 2025-12-04T12:15:05.3555023Z [W1204 11:53:40.841731853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3555028Z 2025-12-04T12:15:05.3555172Z ('RERUN', {'yellow': True}) [0.4822s] [100%] 2025-12-04T12:15:05.3556100Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 11:53:40.201171908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3556109Z 2025-12-04T12:15:05.3556949Z [W1204 11:53:40.201604707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3556969Z 2025-12-04T12:15:05.3557484Z [W1204 11:53:40.201790233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3557545Z 2025-12-04T12:15:05.3558057Z [W1204 11:53:40.202360700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3558062Z 2025-12-04T12:15:05.3558588Z [W1204 11:53:40.202554983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3558594Z 2025-12-04T12:15:05.3559103Z [W1204 11:53:40.202913615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3559111Z 2025-12-04T12:15:05.3559633Z [W1204 11:53:40.203212713 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.3559638Z 2025-12-04T12:15:05.3560149Z [W1204 11:53:40.203380198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3560156Z 2025-12-04T12:15:05.3560678Z [W1204 11:53:40.203833442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3560683Z 2025-12-04T12:15:05.3561236Z [W1204 11:53:40.204013243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3561242Z 2025-12-04T12:15:05.3561754Z [W1204 11:53:40.204462945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3561775Z 2025-12-04T12:15:05.3562286Z [W1204 11:53:40.204657249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3562291Z 2025-12-04T12:15:05.3562800Z [W1204 11:53:40.205044154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3562808Z 2025-12-04T12:15:05.3563371Z [W1204 11:53:40.205221330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3563377Z 2025-12-04T12:15:05.3563890Z [W1204 11:53:40.205582023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3563896Z 2025-12-04T12:15:05.3564417Z [W1204 11:53:40.205756718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3564457Z 2025-12-04T12:15:05.3564964Z [W1204 11:53:40.206118057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.3564969Z 2025-12-04T12:15:05.3565486Z [W1204 11:53:40.206292667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3565494Z 2025-12-04T12:15:05.3566002Z [W1204 11:53:40.302170471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3566007Z 2025-12-04T12:15:05.3566124Z FAILED [0.4588s] [100%] 2025-12-04T12:15:05.3566129Z 2025-12-04T12:15:05.3566275Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.3566564Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________ 2025-12-04T12:15:05.3566702Z Traceback (most recent call last): 2025-12-04T12:15:05.3567102Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.3567249Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.3567756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T12:15:05.3567869Z return fn(*args, **kwargs) 2025-12-04T12:15:05.3568316Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped 2025-12-04T12:15:05.3568433Z output = torch._scaled_mm( 2025-12-04T12:15:05.3568899Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+ 2025-12-04T12:15:05.3569567Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first): 2025-12-04T12:15:05.3569681Z C++ CapturedTraceback: 2025-12-04T12:15:05.3571240Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T12:15:05.3571731Z #5 
c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T12:15:05.3572074Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T12:15:05.3573049Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool, at::Tensor&) from ??:0 2025-12-04T12:15:05.3573855Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.3574977Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.3578365Z #10 c10::impl::make_boxed_from_unboxed_functor const&, std::optional const&, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist const&, std::optional const&, std::optional, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.3579320Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from autograd_not_implemented_fallback.cpp:0 2025-12-04T12:15:05.3580117Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.3580534Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from python_torch_functions_2.cpp:0 2025-12-04T12:15:05.3580859Z #14 cfunction_call from 
/usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T12:15:05.3581184Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3581600Z #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3581730Z #17 dynamo__custom_eval_frame from :0 2025-12-04T12:15:05.3582126Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3582392Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3582820Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3583239Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3583615Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3583925Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.3584188Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3584560Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3584832Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3585201Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3585476Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3585848Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3586105Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3586529Z #31 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3586936Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3587322Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3587727Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3588093Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3588516Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3588918Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3589327Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3589712Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3590116Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3590534Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3590830Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.3591087Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3591469Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3591821Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3592142Z #46 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3592441Z #47 slot_tp_call from 
/usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3592745Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3593164Z #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3593536Z #50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3593952Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3594323Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3594618Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3595001Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3595406Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3595776Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3596195Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3596567Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3596927Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3597234Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3597530Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3597815Z #62 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T12:15:05.3598075Z #63 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3598502Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3598909Z #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3599283Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3599706Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3600076Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3600339Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3600729Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3601188Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3601574Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3601980Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3602382Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3602657Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3603028Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3603449Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3603822Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3604230Z #79 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3604612Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3604962Z #81 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3605280Z #82 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3605576Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3605879Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3606294Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3606694Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3606957Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3607343Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3607751Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3608130Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3608535Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3608903Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3609284Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3609593Z #94 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3609906Z #95 slot_tp_call from 
/usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3610210Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3610672Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3611063Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3611472Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3611872Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3612291Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3612672Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3612959Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3613372Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3613793Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3614188Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3614639Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3615032Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3615393Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3615705Z #110 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3616031Z #111 
slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3616440Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3616874Z #113 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3617263Z #114 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3617679Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3618080Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3618497Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3618891Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3619342Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3619727Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3620157Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3620537Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3620831Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T12:15:05.3621159Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T12:15:05.3621434Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T12:15:05.3621733Z #126 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T12:15:05.3622092Z #127 
_PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T12:15:05.3622424Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T12:15:05.3622732Z #129 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T12:15:05.3623037Z #130 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T12:15:05.3623324Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T12:15:05.3623525Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T12:15:05.3623633Z #133 _start from ??:0 2025-12-04T12:15:05.3623772Z #134 from ??:0 2025-12-04T12:15:05.3623778Z 2025-12-04T12:15:05.3623783Z 2025-12-04T12:15:05.3624004Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3624546Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T12:15:05.3624568Z 2025-12-04T12:15:05.3624838Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3625098Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3625233Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.3625349Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.3625693Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3625933Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3626063Z graph_break [] 2025-12-04T12:15:05.3626252Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.3626471Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.3627679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: 
Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T12:15:05.3627814Z if out == self.unknown_value: 2025-12-04T12:15:05.3628541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.3628648Z warnings.warn( 2025-12-04T12:15:05.3628949Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________ 2025-12-04T12:15:05.3629075Z Traceback (most recent call last): 2025-12-04T12:15:05.3629485Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.3629630Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.3630125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T12:15:05.3630251Z return fn(*args, **kwargs) 2025-12-04T12:15:05.3630684Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped 2025-12-04T12:15:05.3630799Z output = torch._scaled_mm( 2025-12-04T12:15:05.3631275Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+ 2025-12-04T12:15:05.3631933Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first): 2025-12-04T12:15:05.3632058Z C++ CapturedTraceback: 2025-12-04T12:15:05.3633376Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T12:15:05.3633879Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T12:15:05.3634221Z #6 c10::detail::torchCheckFail(char const*, char 
const*, unsigned int, char const*) from ??:0 2025-12-04T12:15:05.3635130Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool, at::Tensor&) from ??:0 2025-12-04T12:15:05.3635949Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.3637059Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.3640357Z #10 c10::impl::make_boxed_from_unboxed_functor const&, std::optional const&, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist const&, std::optional const&, std::optional, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.3641366Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from autograd_not_implemented_fallback.cpp:0 2025-12-04T12:15:05.3642173Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.3642578Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from python_torch_functions_2.cpp:0 2025-12-04T12:15:05.3642917Z #14 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T12:15:05.3643226Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3643638Z #16 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3643777Z #17 dynamo__custom_eval_frame from :0 2025-12-04T12:15:05.3644153Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3644430Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3644838Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3645246Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3645636Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3645931Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.3646208Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3646580Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3646838Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3647224Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3647485Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3647860Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3648130Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3648557Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3648977Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3649351Z #33 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3649756Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3650141Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3650550Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3650964Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3651369Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3651740Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3652156Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3652558Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3652865Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.3653125Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3653492Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3653859Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3654169Z #46 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3654466Z #47 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3654780Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3655184Z #49 _PyObject_VectorcallTstate 
from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3655573Z #50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3655977Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3656416Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3656729Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3657102Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3657524Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3657897Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3658302Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3658692Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3659041Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3659357Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3659655Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3659926Z #62 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T12:15:05.3660198Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3660600Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3661006Z #65 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3661394Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3661798Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3662178Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3662437Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3662811Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3663263Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3663635Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3664053Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3664456Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3664715Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3665097Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3665502Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3665887Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3666293Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3666663Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3667026Z #81 
_PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3667331Z #82 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3667627Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3667941Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3668348Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3668763Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3669027Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3669397Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3669818Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3670191Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3670613Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3671160Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3671515Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3671838Z #94 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3672135Z #95 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3672453Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3672939Z #97 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3673313Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3673736Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3674119Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3674532Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3674925Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3675195Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3675639Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3676057Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3676436Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3682613Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3683058Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3683423Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3683751Z #110 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3684063Z #111 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3684388Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3684803Z #113 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3685187Z #114 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3685616Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3686001Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3686428Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3686804Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3687343Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3687740Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3688156Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3688552Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3688845Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T12:15:05.3689158Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T12:15:05.3689967Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T12:15:05.3690259Z #126 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T12:15:05.3690615Z #127 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T12:15:05.3690960Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T12:15:05.3691253Z #129 
pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T12:15:05.3691603Z #130 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T12:15:05.3691872Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T12:15:05.3692072Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T12:15:05.3692195Z #133 _start from ??:0 2025-12-04T12:15:05.3692322Z #134 from ??:0 2025-12-04T12:15:05.3692329Z 2025-12-04T12:15:05.3692334Z 2025-12-04T12:15:05.3692569Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3693111Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T12:15:05.3693121Z 2025-12-04T12:15:05.3693391Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3693671Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3693790Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.3693911Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.3694273Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3694498Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3694649Z graph_break [] 2025-12-04T12:15:05.3694825Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.3695048Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.3696278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T12:15:05.3696480Z if out == self.unknown_value: 2025-12-04T12:15:05.3697223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.3697334Z warnings.warn( 2025-12-04T12:15:05.3697557Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3697684Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.3697802Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.3698028Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3698382Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3698480Z graph_break [] 2025-12-04T12:15:05.3698656Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.3698948Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.3699677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.3699797Z warnings.warn( 2025-12-04T12:15:05.3699943Z =================================== FAILURES =================================== 2025-12-04T12:15:05.3700226Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________ 2025-12-04T12:15:05.3700363Z Traceback (most recent call last): 2025-12-04T12:15:05.3700758Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.3700915Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.3701403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T12:15:05.3701519Z return fn(*args, **kwargs) 2025-12-04T12:15:05.3701923Z File 
"/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped 2025-12-04T12:15:05.3702038Z output = torch._scaled_mm( 2025-12-04T12:15:05.3702499Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+ 2025-12-04T12:15:05.3703202Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first): 2025-12-04T12:15:05.3703317Z C++ CapturedTraceback: 2025-12-04T12:15:05.3704656Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T12:15:05.3705142Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T12:15:05.3705513Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T12:15:05.3706398Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool, at::Tensor&) from ??:0 2025-12-04T12:15:05.3707192Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.3708578Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.3711827Z #10 c10::impl::make_boxed_from_unboxed_functor const&, std::optional const&, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist const&, std::optional const&, std::optional, bool> >, 
false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.3712741Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from autograd_not_implemented_fallback.cpp:0 2025-12-04T12:15:05.3713577Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.3713994Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from python_torch_functions_2.cpp:0 2025-12-04T12:15:05.3714319Z #14 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T12:15:05.3714636Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3715047Z #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3715167Z #17 dynamo__custom_eval_frame from :0 2025-12-04T12:15:05.3715553Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3715818Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3716204Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3716612Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3717018Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3717328Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.3717590Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3717962Z #25 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3718233Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3718600Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3718870Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3719276Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3719534Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3719918Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3720327Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3720745Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3721152Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3721518Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3721938Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3722310Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3722724Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3723094Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3723503Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3723890Z #41 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3724184Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.3724438Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3724849Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3725200Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3725515Z #46 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3725812Z #47 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3726109Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3726529Z #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3726903Z #50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3727318Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3727687Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3727949Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3728332Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3728766Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3729145Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3729552Z #57 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3729920Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3730282Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3730587Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3730881Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3731192Z #62 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T12:15:05.3731450Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3731831Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3732236Z #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3732636Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3733049Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3733420Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3733691Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3734063Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3734464Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3734846Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3735250Z #73 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3735629Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3735886Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3736253Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3736745Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3737154Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3737558Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3737938Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3738287Z #81 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3738603Z #82 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3738895Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3739196Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3739610Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3739984Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3740254Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3740629Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3741071Z #89 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3741454Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3741860Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3742240Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3742588Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3742892Z #94 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3743230Z #95 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3743532Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3743944Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3744323Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3744774Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3745167Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3745581Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3745963Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3746243Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3746621Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3747047Z #105 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3747422Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3747836Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3748231Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3748585Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3748910Z #110 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3749243Z #111 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3749551Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3749975Z #113 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3750355Z #114 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3750771Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3751159Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3751572Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3751961Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3752374Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3752750Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3753203Z 
#121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3753586Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3753894Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T12:15:05.3754202Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T12:15:05.3754472Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T12:15:05.3754770Z #126 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T12:15:05.3755125Z #127 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T12:15:05.3755481Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T12:15:05.3755789Z #129 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T12:15:05.3756072Z #130 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T12:15:05.3756355Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T12:15:05.3756583Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T12:15:05.3756692Z #133 _start from ??:0 2025-12-04T12:15:05.3756830Z #134 from ??:0 2025-12-04T12:15:05.3756836Z 2025-12-04T12:15:05.3756842Z 2025-12-04T12:15:05.3757062Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3757619Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T12:15:05.3757627Z 2025-12-04T12:15:05.3757903Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3758126Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T12:15:05.3758257Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.3758381Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.3758724Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3758962Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3759063Z graph_break [] 2025-12-04T12:15:05.3759255Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.3759478Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.3760694Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T12:15:05.3760863Z if out == self.unknown_value: 2025-12-04T12:15:05.3761596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.3761714Z warnings.warn( 2025-12-04T12:15:05.3761935Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3762054Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.3762185Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.3762410Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3762750Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3762866Z graph_break [] 2025-12-04T12:15:05.3763046Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.3763279Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.3764012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.3764116Z warnings.warn( 2025-12-04T12:15:05.3764383Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3764497Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.3764615Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.3764856Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3765194Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3765305Z graph_break [] 2025-12-04T12:15:05.3765482Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.3765699Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.3766472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.3766579Z warnings.warn( 2025-12-04T12:15:05.3767234Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ad7a38726bbc8b50.xml - 2025-12-04T12:15:05.3767418Z =========================== short test summary info ============================ 2025-12-04T12:15:05.3768392Z FAILED [0.4588s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 - RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+ 2025-12-04T12:15:05.3769059Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first): 2025-12-04T12:15:05.3769175Z C++ CapturedTraceback: 2025-12-04T12:15:05.3770494Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > 
()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T12:15:05.3771199Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T12:15:05.3771536Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T12:15:05.3772421Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool, at::Tensor&) from ??:0 2025-12-04T12:15:05.3773216Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.3774429Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.3777724Z #10 c10::impl::make_boxed_from_unboxed_functor const&, std::optional const&, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist const&, std::optional const&, std::optional, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.3778697Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from autograd_not_implemented_fallback.cpp:0 2025-12-04T12:15:05.3779486Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.3779899Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from 
python_torch_functions_2.cpp:0 2025-12-04T12:15:05.3780224Z #14 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T12:15:05.3780533Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3781003Z #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3781127Z #17 dynamo__custom_eval_frame from :0 2025-12-04T12:15:05.3781518Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3781780Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3782195Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3782615Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3782988Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3783296Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.3783557Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3783928Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3784204Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3784574Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3784833Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3785216Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3785476Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 
2025-12-04T12:15:05.3785858Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3786264Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3786679Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3787094Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3787465Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3787880Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3788249Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3788653Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3789031Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3789436Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3789819Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3790111Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.3790402Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3790784Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3791136Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3791439Z #46 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3791744Z #47 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3792047Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3792470Z #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3792891Z #50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3793296Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3793675Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3793932Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3794342Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3794749Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3795118Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3795538Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3795909Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3796269Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3796577Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3796870Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3797158Z #62 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T12:15:05.3797417Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3797788Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3798205Z #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3798603Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3799023Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3799393Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3799650Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3800035Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3800442Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3800823Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3801225Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3801594Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3801865Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3802233Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3802685Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3803058Z #78 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3803462Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3803848Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3804196Z #81 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3804502Z #82 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3804839Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3805139Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3805563Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3805935Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3806223Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3806606Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3807005Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3807382Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3807790Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3808156Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3808519Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3808823Z #94 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3809130Z #95 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3809432Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3809833Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3810212Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3810654Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3811040Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3811475Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3811850Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3812134Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3812511Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3812921Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3813314Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3813727Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3814120Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3814536Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3814849Z #110 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3815165Z #111 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3815473Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3815884Z #113 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3816276Z #114 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3816767Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3817200Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3817613Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3817994Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3818419Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3818829Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3819249Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3819625Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3819918Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T12:15:05.3820243Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T12:15:05.3820512Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T12:15:05.3820815Z #126 pyrun_file from 
/usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T12:15:05.3821166Z #127 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T12:15:05.3821493Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T12:15:05.3821798Z #129 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T12:15:05.3822070Z #130 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T12:15:05.3822339Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T12:15:05.3822589Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T12:15:05.3822693Z #133 _start from ??:0 2025-12-04T12:15:05.3822831Z #134 from ??:0 2025-12-04T12:15:05.3822838Z 2025-12-04T12:15:05.3822843Z 2025-12-04T12:15:05.3823077Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3823620Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T12:15:05.3823625Z 2025-12-04T12:15:05.3823912Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3824095Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.3824315Z ================= 1 failed, 187 deselected, 2 rerun in 20.33s ================== 2025-12-04T12:15:05.3824418Z Got exit code 1 2025-12-04T12:15:05.3824530Z Retrying single test... 
2025-12-04T12:15:05.3825016Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b434424093647de3.xml 2025-12-04T12:15:05.3825186Z ============================= test session starts ============================== 2025-12-04T12:15:05.3825539Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.3825665Z cachedir: .pytest_cache 2025-12-04T12:15:05.3826223Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.3826370Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.3826483Z configfile: pytest.ini 2025-12-04T12:15:05.3827079Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.3827323Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.3827946Z stepcurrent: skipping 21 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T12:15:05.3828084Z Running 1 items in this shard 2025-12-04T12:15:05.3828121Z 2025-12-04T12:15:05.3829055Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 11:53:57.520878802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3829062Z 2025-12-04T12:15:05.3829581Z [W1204 11:54:12.862438374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3829634Z 2025-12-04T12:15:05.3830147Z [W1204 11:54:12.862694584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.3830153Z 2025-12-04T12:15:05.3830664Z [W1204 11:54:12.865804269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3830673Z 2025-12-04T12:15:05.3831207Z [W1204 11:54:12.866009267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3831212Z 2025-12-04T12:15:05.3831724Z [W1204 11:54:12.867991971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3831728Z 2025-12-04T12:15:05.3832254Z [W1204 11:54:12.868359051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3832261Z 2025-12-04T12:15:05.3832773Z [W1204 11:54:12.868545447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3832778Z 2025-12-04T12:15:05.3833300Z [W1204 11:54:12.869083168 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3833336Z 2025-12-04T12:15:05.3833852Z [W1204 11:54:12.869269987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3833857Z 2025-12-04T12:15:05.3834368Z [W1204 11:54:12.869830376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3834389Z 2025-12-04T12:15:05.3834900Z [W1204 11:54:12.870048273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3834908Z 2025-12-04T12:15:05.3835414Z [W1204 11:54:12.870512249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.3835419Z 2025-12-04T12:15:05.3835942Z [W1204 11:54:12.870690203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3835948Z 2025-12-04T12:15:05.3836459Z [W1204 11:54:12.871068054 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3836466Z 2025-12-04T12:15:05.3836994Z [W1204 11:54:12.871243652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3836999Z 2025-12-04T12:15:05.3837558Z [W1204 11:54:12.871612440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3837566Z 2025-12-04T12:15:05.3838089Z [W1204 11:54:12.871788476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3838093Z 2025-12-04T12:15:05.3838563Z W1204 11:54:12.829000 115512 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:15:05.3839089Z [W1204 11:54:12.238848376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3839096Z 2025-12-04T12:15:05.3839345Z ('RERUN', {'yellow': True}) [18.8552s] [100%] 2025-12-04T12:15:05.3840274Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 11:54:13.869221007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3840280Z 2025-12-04T12:15:05.3840810Z [W1204 11:54:13.869639665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.3840846Z 2025-12-04T12:15:05.3841358Z [W1204 11:54:13.869828176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3841363Z 2025-12-04T12:15:05.3841890Z [W1204 11:54:13.870436268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3841897Z 2025-12-04T12:15:05.3842412Z [W1204 11:54:13.870634727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3842418Z 2025-12-04T12:15:05.3842947Z [W1204 11:54:13.871003669 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3842952Z 2025-12-04T12:15:05.3843461Z [W1204 11:54:13.871296948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3843468Z 2025-12-04T12:15:05.3843986Z [W1204 11:54:13.871463691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3843991Z 2025-12-04T12:15:05.3844496Z [W1204 11:54:13.871926564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3844531Z 2025-12-04T12:15:05.3845039Z [W1204 11:54:13.872107125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3845060Z 2025-12-04T12:15:05.3845568Z [W1204 11:54:13.872561805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3845575Z 2025-12-04T12:15:05.3846082Z [W1204 11:54:13.872741038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.3846089Z 2025-12-04T12:15:05.3846608Z [W1204 11:54:13.873125893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3846613Z 2025-12-04T12:15:05.3847120Z [W1204 11:54:13.873303238 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3847125Z 2025-12-04T12:15:05.3847644Z [W1204 11:54:13.873661680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3847649Z 2025-12-04T12:15:05.3848156Z [W1204 11:54:13.873837544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3848161Z 2025-12-04T12:15:05.3848708Z [W1204 11:54:13.874199063 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3848714Z 2025-12-04T12:15:05.3849226Z [W1204 11:54:13.874373983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3849230Z 2025-12-04T12:15:05.3849736Z [W1204 11:54:13.966294254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3849753Z 2025-12-04T12:15:05.3849884Z ('RERUN', {'yellow': True}) [0.4767s] [100%] 2025-12-04T12:15:05.3850850Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 [W1204 11:54:13.320873368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3850856Z 2025-12-04T12:15:05.3851382Z [W1204 11:54:13.321318706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.3851387Z 2025-12-04T12:15:05.3851893Z [W1204 11:54:13.321503287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3851926Z 2025-12-04T12:15:05.3852450Z [W1204 11:54:13.322088136 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3852454Z 2025-12-04T12:15:05.3852962Z [W1204 11:54:13.322280672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3852970Z 2025-12-04T12:15:05.3853493Z [W1204 11:54:13.322644089 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3853499Z 2025-12-04T12:15:05.3854008Z [W1204 11:54:13.322935634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3854013Z 2025-12-04T12:15:05.3854533Z [W1204 11:54:13.323104183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3854541Z 2025-12-04T12:15:05.3855048Z [W1204 11:54:13.323578800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3855053Z 2025-12-04T12:15:05.3855563Z [W1204 11:54:13.323759318 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3855598Z 2025-12-04T12:15:05.3856121Z [W1204 11:54:13.324205490 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3856125Z 2025-12-04T12:15:05.3856722Z [W1204 11:54:13.324385347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.3856727Z 2025-12-04T12:15:05.3857253Z [W1204 11:54:13.324786851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3857260Z 2025-12-04T12:15:05.3857773Z [W1204 11:54:13.324966001 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3857777Z 2025-12-04T12:15:05.3858298Z [W1204 11:54:13.325325441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3858305Z 2025-12-04T12:15:05.3858816Z [W1204 11:54:13.325502454 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3858821Z 2025-12-04T12:15:05.3859341Z [W1204 11:54:13.325862894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3859385Z 2025-12-04T12:15:05.3859898Z [W1204 11:54:13.326039400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.3859905Z 2025-12-04T12:15:05.3860413Z [W1204 11:54:14.423402432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.3860431Z 2025-12-04T12:15:05.3860535Z FAILED [0.4551s] [100%] 2025-12-04T12:15:05.3860540Z 2025-12-04T12:15:05.3860687Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.3860992Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________ 2025-12-04T12:15:05.3861119Z Traceback (most recent call last): 2025-12-04T12:15:05.3861552Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.3861714Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.3862213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T12:15:05.3862340Z return fn(*args, **kwargs) 2025-12-04T12:15:05.3862767Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped 2025-12-04T12:15:05.3862882Z output = torch._scaled_mm( 2025-12-04T12:15:05.3863361Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+ 2025-12-04T12:15:05.3864010Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first): 2025-12-04T12:15:05.3864126Z C++ CapturedTraceback: 2025-12-04T12:15:05.3865452Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T12:15:05.3865936Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T12:15:05.3866290Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T12:15:05.3867164Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, 
std::optional, bool, at::Tensor&) from ??:0 2025-12-04T12:15:05.3868010Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.3869125Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.3872573Z #10 c10::impl::make_boxed_from_unboxed_functor const&, std::optional const&, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist const&, std::optional const&, std::optional, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.3873553Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from autograd_not_implemented_fallback.cpp:0 2025-12-04T12:15:05.3874367Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.3874775Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from python_torch_functions_2.cpp:0 2025-12-04T12:15:05.3875122Z #14 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T12:15:05.3875473Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3876360Z #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3876505Z #17 dynamo__custom_eval_frame from :0 2025-12-04T12:15:05.3876885Z #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3877149Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3877594Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3878000Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3878388Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3878687Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.3878953Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3879335Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3879597Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3879985Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3880247Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3880619Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3880896Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3881266Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3881745Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3882117Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3882525Z #34 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3882907Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3883312Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3883685Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3884108Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3884476Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3884898Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3885269Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3885568Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.3885876Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3886255Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3886619Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3886924Z #46 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3887217Z #47 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3887537Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3887976Z #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3888425Z #50 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3888908Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3889297Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3889624Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3889994Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3890399Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3890783Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3891194Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3891575Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3891927Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3892231Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3892545Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3892814Z #62 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T12:15:05.3893085Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3893458Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3893905Z #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3894290Z #66 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3894699Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3895073Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3895350Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3895722Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3896139Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3896587Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3896996Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3897381Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3897640Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3898076Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3898483Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3898857Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3899276Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3899645Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3900008Z #81 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3900350Z #82 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3900646Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3900962Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3901371Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3901774Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3902050Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3902422Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3902840Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3903213Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3903620Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3904006Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3904357Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3904675Z #94 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3904972Z #95 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3905278Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3905698Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3906120Z #98 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3906542Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3906928Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3907347Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3907745Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3908018Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3908399Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3908832Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3909216Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3909651Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3910031Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3910428Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3910756Z #110 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3911062Z #111 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3911386Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3911808Z #113 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3912190Z #114 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3912657Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3913038Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3913472Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3913854Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3914310Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3914705Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3915120Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3915501Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3915816Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T12:15:05.3916129Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T12:15:05.3916414Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T12:15:05.3916705Z #126 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T12:15:05.3917057Z #127 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T12:15:05.3917409Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T12:15:05.3917706Z #129 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T12:15:05.3917999Z #130 Py_BytesMain from 
/usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T12:15:05.3918310Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T12:15:05.3918515Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T12:15:05.3918637Z #133 _start from ??:0 2025-12-04T12:15:05.3918764Z #134 from ??:0 2025-12-04T12:15:05.3918770Z 2025-12-04T12:15:05.3918776Z 2025-12-04T12:15:05.3919003Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3919562Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T12:15:05.3919570Z 2025-12-04T12:15:05.3919842Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3920083Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3920201Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.3920318Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.3920678Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3920901Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3921015Z graph_break [] 2025-12-04T12:15:05.3921195Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.3921453Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.3922684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T12:15:05.3922806Z if out == self.unknown_value: 2025-12-04T12:15:05.3923534Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.3923650Z warnings.warn( 2025-12-04T12:15:05.3924280Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________ 2025-12-04T12:15:05.3924423Z Traceback (most recent call last): 2025-12-04T12:15:05.3924882Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.3925033Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.3925539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T12:15:05.3925655Z return fn(*args, **kwargs) 2025-12-04T12:15:05.3926103Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped 2025-12-04T12:15:05.3926235Z output = torch._scaled_mm( 2025-12-04T12:15:05.3926692Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+ 2025-12-04T12:15:05.3927372Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first): 2025-12-04T12:15:05.3927491Z C++ CapturedTraceback: 2025-12-04T12:15:05.3928812Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T12:15:05.3929314Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T12:15:05.3929655Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T12:15:05.3930544Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, 
at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool, at::Tensor&) from ??:0 2025-12-04T12:15:05.3931385Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.3932502Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.3935734Z #10 c10::impl::make_boxed_from_unboxed_functor const&, std::optional const&, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist const&, std::optional const&, std::optional, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.3936772Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from autograd_not_implemented_fallback.cpp:0 2025-12-04T12:15:05.3937565Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.3937984Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from python_torch_functions_2.cpp:0 2025-12-04T12:15:05.3938311Z #14 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T12:15:05.3938651Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3939075Z #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3939203Z #17 
dynamo__custom_eval_frame from :0 2025-12-04T12:15:05.3939594Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3939858Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3940267Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3940690Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3941065Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3941374Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.3941641Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3942015Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3942287Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3942661Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3942922Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3943309Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3943570Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3943951Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3944398Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3944773Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3945197Z #34 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3945567Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3945987Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3946358Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3946763Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3947149Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3947557Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3947945Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3948278Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.3948537Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3948918Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3949271Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3949576Z #46 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3949884Z #47 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3950188Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3950642Z #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3951013Z #50 _PyEval_EvalFrame 
from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3951421Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3951805Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3952096Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3952480Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3952884Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3953257Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3953678Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3954051Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3954417Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3954721Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3955019Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3955299Z #62 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T12:15:05.3955557Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3955927Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3956382Z #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3956753Z #66 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3957168Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3957541Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3957800Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3958184Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3958586Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3958963Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3959371Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3959741Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3960016Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3960420Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3960840Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3961212Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3961615Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3961997Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3962346Z #81 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3962683Z #82 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3962990Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3963296Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3963713Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3964119Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3964381Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3964766Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3965171Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3965556Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3965968Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3966340Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3966706Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3967016Z #94 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3967313Z #95 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3967631Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3968038Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3968457Z #98 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3968863Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3969247Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3969678Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3970059Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3970341Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.3970723Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3971357Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3971756Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3972172Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3972565Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3973030Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.3973345Z #110 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.3973664Z #111 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.3973973Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.3974390Z #113 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.3974786Z #114 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3975273Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3975669Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3976086Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3976547Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3977065Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3977449Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3977877Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.3978260Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.3978558Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T12:15:05.3978886Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T12:15:05.3979161Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T12:15:05.3979463Z #126 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T12:15:05.3979817Z #127 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T12:15:05.3980149Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T12:15:05.3980454Z #129 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T12:15:05.3980729Z #130 Py_BytesMain from 
/usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T12:15:05.3981047Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T12:15:05.3981263Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T12:15:05.3981368Z #133 _start from ??:0 2025-12-04T12:15:05.3981511Z #134 from ??:0 2025-12-04T12:15:05.3981518Z 2025-12-04T12:15:05.3981523Z 2025-12-04T12:15:05.3981749Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.3982296Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T12:15:05.3982304Z 2025-12-04T12:15:05.3982593Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.3982822Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3982954Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.3983078Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.3983425Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3983667Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3983771Z graph_break [] 2025-12-04T12:15:05.3983951Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.3984222Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.3985441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T12:15:05.3985631Z if out == self.unknown_value: 2025-12-04T12:15:05.3986360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.3986466Z warnings.warn( 2025-12-04T12:15:05.3986703Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.3986820Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.3986969Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.3987206Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.3987549Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.3987663Z graph_break [] 2025-12-04T12:15:05.3987844Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.3988100Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.3988852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.3988963Z warnings.warn( 2025-12-04T12:15:05.3989112Z =================================== FAILURES =================================== 2025-12-04T12:15:05.3989415Z _________ TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 __________ 2025-12-04T12:15:05.3989541Z Traceback (most recent call last): 2025-12-04T12:15:05.3989951Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.3990097Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.3990593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T12:15:05.3990725Z return fn(*args, **kwargs) 2025-12-04T12:15:05.3991124Z File 
"/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 113, in fp8_matmul_unwrapped 2025-12-04T12:15:05.3991249Z output = torch._scaled_mm( 2025-12-04T12:15:05.3991707Z RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+ 2025-12-04T12:15:05.3992360Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first): 2025-12-04T12:15:05.3992523Z C++ CapturedTraceback: 2025-12-04T12:15:05.3993847Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T12:15:05.3994343Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T12:15:05.3994681Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T12:15:05.3995551Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool, at::Tensor&) from ??:0 2025-12-04T12:15:05.3996366Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.3997506Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.4000789Z #10 c10::impl::make_boxed_from_unboxed_functor const&, std::optional const&, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist const&, std::optional const&, std::optional, bool> >, 
false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.4001691Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from autograd_not_implemented_fallback.cpp:0 2025-12-04T12:15:05.4002518Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.4002920Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from python_torch_functions_2.cpp:0 2025-12-04T12:15:05.4003260Z #14 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T12:15:05.4003571Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.4003997Z #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.4004125Z #17 dynamo__custom_eval_frame from :0 2025-12-04T12:15:05.4004503Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4004782Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4005155Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4005560Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4005944Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4006274Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.4006552Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4006924Z #25 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4007186Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4007571Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4007832Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4008217Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4008476Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4008845Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4009267Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4009636Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4010093Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4010478Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4010888Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4011268Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4011674Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4012043Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4012496Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4012865Z #41 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4013174Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.4013436Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4013806Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4014205Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.4014510Z #46 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.4014825Z #47 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.4015131Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.4015543Z #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.4015928Z #50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4016417Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4016790Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4017072Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4017443Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4017863Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4018276Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4018685Z #57 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4019069Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4019421Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.4019742Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.4020040Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.4020314Z #62 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T12:15:05.4020588Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4020960Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4021367Z #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4021751Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4022216Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4022597Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4022859Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4023230Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4023652Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4024021Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4024437Z #73 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4024843Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4025105Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4025492Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4025897Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4026315Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4026720Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4027089Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4027454Z #81 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.4027761Z #82 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.4028055Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.4028375Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.4028779Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.4029167Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4029424Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4029794Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4030212Z #89 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4030618Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4031032Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4031404Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4031753Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.4032069Z #94 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.4032363Z #95 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.4032678Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.4033081Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.4033454Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4033874Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4034366Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4034780Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4035171Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4035439Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4035831Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4036248Z #105 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4036630Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4037092Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4037475Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4037850Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.4038165Z #110 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.4038497Z #111 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.4038818Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.4039234Z #113 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.4039626Z #114 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4040046Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4040425Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4040853Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4041230Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4041645Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4042033Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4042442Z 
#121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4043362Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4043659Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T12:15:05.4043967Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T12:15:05.4044256Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T12:15:05.4044542Z #126 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T12:15:05.4044910Z #127 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T12:15:05.4045239Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T12:15:05.4045532Z #129 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T12:15:05.4045820Z #130 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T12:15:05.4046090Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T12:15:05.4046291Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T12:15:05.4046408Z #133 _start from ??:0 2025-12-04T12:15:05.4046535Z #134 from ??:0 2025-12-04T12:15:05.4046541Z 2025-12-04T12:15:05.4046546Z 2025-12-04T12:15:05.4046814Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4047354Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T12:15:05.4047362Z 2025-12-04T12:15:05.4047634Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4047873Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T12:15:05.4047989Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.4048126Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4048477Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.4048732Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4048851Z graph_break [] 2025-12-04T12:15:05.4049031Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4049259Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.4050487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T12:15:05.4050643Z if out == self.unknown_value: 2025-12-04T12:15:05.4051387Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.4051501Z warnings.warn( 2025-12-04T12:15:05.4051722Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4051856Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.4051977Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4052202Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4052560Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.4052663Z graph_break [] 2025-12-04T12:15:05.4052858Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4053082Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.4053815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.4053938Z warnings.warn( 2025-12-04T12:15:05.4054604Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4054783Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:15:05.4054903Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4055129Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4055484Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.4055589Z graph_break [] 2025-12-04T12:15:05.4055768Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4055998Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.4056821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:15:05.4056946Z warnings.warn( 2025-12-04T12:15:05.4057600Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b434424093647de3.xml - 2025-12-04T12:15:05.4057778Z =========================== short test summary info ============================ 2025-12-04T12:15:05.4058751Z FAILED [0.4551s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 - RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+ 2025-12-04T12:15:05.4059450Z Exception raised from _scaled_mm_out_cuda at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/ScaledBlas.cpp:492 (most recent call first): 2025-12-04T12:15:05.4059584Z C++ CapturedTraceback: 2025-12-04T12:15:05.4060886Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > 
()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T12:15:05.4061370Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T12:15:05.4061753Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T12:15:05.4062636Z #7 at::native::_scaled_mm_out_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool, at::Tensor&) from ??:0 2025-12-04T12:15:05.4063447Z #8 at::native::_scaled_mm_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.4064580Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.4067844Z #10 c10::impl::make_boxed_from_unboxed_functor const&, std::optional const&, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_mm>, at::Tensor, c10::guts::typelist::typelist const&, std::optional const&, std::optional, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from RegisterCUDA_0.cpp:0 2025-12-04T12:15:05.4068750Z #11 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from autograd_not_implemented_fallback.cpp:0 2025-12-04T12:15:05.4069602Z #12 at::_ops::_scaled_mm::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional, bool) from ??:0 2025-12-04T12:15:05.4070006Z #13 torch::autograd::THPVariable__scaled_mm(_object*, _object*, _object*) from 
python_torch_functions_2.cpp:0 2025-12-04T12:15:05.4070344Z #14 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T12:15:05.4070652Z #15 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.4071256Z #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.4071399Z #17 dynamo__custom_eval_frame from :0 2025-12-04T12:15:05.4071775Z #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4072043Z #19 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4072432Z #20 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4072842Z #21 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4073519Z #22 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4073826Z #23 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.4074096Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4074480Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4074739Z #26 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4075126Z #27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4075384Z #28 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4075813Z #29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4076085Z #30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 
2025-12-04T12:15:05.4076460Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4076882Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4077297Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4077703Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4078086Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4078495Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4078866Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4079286Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4079658Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4080080Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4080452Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4080747Z #42 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T12:15:05.4081020Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4081453Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4081817Z #45 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.4082122Z #46 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.4082423Z #47 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.4082742Z #48 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.4083152Z #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.4083534Z #50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4083941Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4084312Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4084589Z #53 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4084962Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4085400Z #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4085787Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4086194Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4086576Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4086927Z #59 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.4087231Z #60 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.4087541Z #61 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.4087842Z #62 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T12:15:05.4088118Z #63 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4088491Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4088897Z #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4089311Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4089715Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4090086Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4090360Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4090730Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4091147Z #71 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4091519Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4091923Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4092307Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4092568Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4092950Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4093354Z #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4093760Z #78 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4094177Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4094548Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4094908Z #81 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.4095213Z #82 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.4095507Z #83 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.4095821Z #84 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.4096229Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.4096687Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4096967Z #87 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4097340Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4097798Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4098171Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4098582Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4098967Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4099315Z #93 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.4099632Z #94 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.4099957Z #95 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.4100259Z #96 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.4100684Z #97 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.4101054Z #98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4101508Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4101891Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4102306Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4102697Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4102967Z #103 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T12:15:05.4103344Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4103770Z #105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4104145Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4104570Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4104947Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4105304Z #109 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T12:15:05.4105662Z #110 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T12:15:05.4105965Z #111 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T12:15:05.4106283Z #112 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T12:15:05.4106700Z #113 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T12:15:05.4107078Z #114 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4107506Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4107885Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4108310Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4108688Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4109102Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4109498Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4109941Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T12:15:05.4110322Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T12:15:05.4110630Z #123 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T12:15:05.4110940Z #124 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T12:15:05.4111224Z #125 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T12:15:05.4111512Z #126 pyrun_file from 
/usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T12:15:05.4111867Z #127 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T12:15:05.4112255Z #128 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T12:15:05.4112550Z #129 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T12:15:05.4112838Z #130 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T12:15:05.4113110Z #131 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T12:15:05.4113357Z #132 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T12:15:05.4113476Z #133 _start from ??:0 2025-12-04T12:15:05.4113600Z #134 from ??:0 2025-12-04T12:15:05.4113606Z 2025-12-04T12:15:05.4113611Z 2025-12-04T12:15:05.4113830Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4114386Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T12:15:05.4114394Z 2025-12-04T12:15:05.4114665Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4114858Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.4115066Z ================= 1 failed, 187 deselected, 2 rerun in 19.83s ================== 2025-12-04T12:15:05.4115167Z Got exit code 1 2025-12-04T12:15:05.4115641Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 2025-12-04T12:15:05.4116054Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.4116536Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ccd966f4e119e833.xml 2025-12-04T12:15:05.4116701Z ============================= test session starts ============================== 2025-12-04T12:15:05.4117086Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.4117212Z cachedir: .pytest_cache 2025-12-04T12:15:05.4117753Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.4117892Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.4118007Z configfile: pytest.ini 2025-12-04T12:15:05.4118603Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.4118847Z collecting ... collected 188 items / 22 deselected / 166 selected 2025-12-04T12:15:05.4118995Z stepcurrent: skipping 22 already run items. 
2025-12-04T12:15:05.4119112Z Running 166 items in this shard 2025-12-04T12:15:05.4119118Z 2025-12-04T12:15:05.4120004Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 W1204 11:54:30.500000 115688 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:15:05.4120768Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T12:15:05.4121760Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4122312Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.4122887Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:05.4123382Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.4123846Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.4124336Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x1 = (xindex % ks1) 2025-12-04T12:15:05.4124940Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x2 = triton_helpers.div_floor_integer(xindex, ks1) 2025-12-04T12:15:05.4125510Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + load_seed_offset) 2025-12-04T12:15:05.4125970Z E1204 11:54:31.105000 115688 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = x0 2025-12-04T12:15:05.4126531Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.rand(tmp0, (tmp1).to(tl.uint32)) 2025-12-04T12:15:05.4127055Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp2.to(tl.float32) 2025-12-04T12:15:05.4127585Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float8e4nv) 2025-12-04T12:15:05.4128293Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x1 + x2*((1) * ((1) >= (ks1)) + (ks1) * ((ks1) > (1)))), tmp4, xmask) 2025-12-04T12:15:05.4128658Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.4130580Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*i64', 'out_ptr1': '*fp8e4nv', 'load_seed_offset': 'constexpr', 'ks1': 'i64', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'load_seed_offset': 1, 'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.4131148Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.4132199Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4132848Z E1204 11:54:31.105000 115688 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4133740Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4134438Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4135356Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4136143Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4136836Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4137784Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4138196Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.4139091Z E1204 11:54:31.105000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4139242Z ('RERUN', {'yellow': True}) [3.8305s] [ 0%] 2025-12-04T12:15:05.4140429Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T12:15:05.4141367Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4141915Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.4142477Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:05.4142989Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.4143425Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.4143912Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x1 = (xindex % ks1) 2025-12-04T12:15:05.4144507Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x2 = triton_helpers.div_floor_integer(xindex, ks1) 2025-12-04T12:15:05.4145117Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + load_seed_offset) 2025-12-04T12:15:05.4145539Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = x0 2025-12-04T12:15:05.4146099Z E1204 11:54:31.717000 
115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.rand(tmp0, (tmp1).to(tl.uint32)) 2025-12-04T12:15:05.4146615Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp2.to(tl.float32) 2025-12-04T12:15:05.4147142Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float8e4nv) 2025-12-04T12:15:05.4147822Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x1 + x2*((1) * ((1) >= (ks1)) + (ks1) * ((ks1) > (1)))), tmp4, xmask) 2025-12-04T12:15:05.4148191Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.4150130Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*i64', 'out_ptr1': '*fp8e4nv', 'load_seed_offset': 'constexpr', 'ks1': 'i64', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'load_seed_offset': 1, 'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.4150683Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.4151733Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4152404Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4153298Z E1204 
11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4154026Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4154910Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4155697Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4156307Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4157242Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4157629Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.4158519Z E1204 11:54:31.717000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4158696Z ('RERUN', {'yellow': True}) [0.5737s] [ 0%] 2025-12-04T12:15:05.4159841Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T12:15:05.4160786Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4161331Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.4161892Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:05.4162402Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.4162835Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.4163346Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x1 = (xindex % ks1) 2025-12-04T12:15:05.4163942Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x2 = triton_helpers.div_floor_integer(xindex, ks1) 2025-12-04T12:15:05.4164502Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + load_seed_offset) 2025-12-04T12:15:05.4164940Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = x0 2025-12-04T12:15:05.4165498Z E1204 11:54:32.313000 
115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.rand(tmp0, (tmp1).to(tl.uint32)) 2025-12-04T12:15:05.4166065Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp2.to(tl.float32) 2025-12-04T12:15:05.4166590Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float8e4nv) 2025-12-04T12:15:05.4167262Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x1 + x2*((1) * ((1) >= (ks1)) + (ks1) * ((ks1) > (1)))), tmp4, xmask) 2025-12-04T12:15:05.4167671Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.4169569Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*i64', 'out_ptr1': '*fp8e4nv', 'load_seed_offset': 'constexpr', 'ks1': 'i64', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'load_seed_offset': 1, 'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.4170122Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.4171350Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4171995Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4172885Z E1204 
11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4173648Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4174533Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4175319Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4175927Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4176936Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4177324Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.4178268Z E1204 11:54:32.313000 115688 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4178402Z FAILED [0.5948s] [ 0%] 2025-12-04T12:15:05.4178408Z 2025-12-04T12:15:05.4178558Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.4178842Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:15:05.4178986Z Traceback (most recent call last): 2025-12-04T12:15:05.4179387Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.4179552Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.4180093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.4180348Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.4180881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.4181077Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.4181629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.4181793Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.4182327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.4182665Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.4183188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.4183337Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.4183836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.4183961Z 
return self._compile_to_module() 2025-12-04T12:15:05.4184463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.4184628Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.4185145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.4185290Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.4185827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.4186061Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.4186661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.4186788Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.4187303Z File "/tmp/tmpjb957dc9/jn/cjnexhgq3vjxjlw7edwbtm6qytyqlitar2t5iw5kgvsxyibeqhnf.py", line 60, in 2025-12-04T12:15:05.4187769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.4187881Z kernel.precompile( 2025-12-04T12:15:05.4188449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.4188570Z self._precompile_worker() 2025-12-04T12:15:05.4189180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4189361Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4189992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4190208Z binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:05.4190662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4190910Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4191364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4191699Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4191938Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.4192467Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4192560Z ^ 2025-12-04T12:15:05.4193033Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4193040Z 2025-12-04T12:15:05.4193752Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4193789Z 2025-12-04T12:15:05.4193794Z 2025-12-04T12:15:05.4194024Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4194548Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:15:05.4194556Z 2025-12-04T12:15:05.4194843Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4195074Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4195183Z frames [('total', 1)] 2025-12-04T12:15:05.4195316Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4195851Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4196074Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4196190Z graph_break [] 2025-12-04T12:15:05.4196370Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4196669Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:15:05.4196795Z Traceback (most recent call last): 2025-12-04T12:15:05.4197191Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.4197384Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.4197877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.4198128Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.4198661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.4198854Z raise InductorError(e, currentframe()).with_traceback( 
2025-12-04T12:15:05.4199379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.4199531Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.4200067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.4200408Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.4200933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.4201099Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.4201612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.4201736Z return self._compile_to_module() 2025-12-04T12:15:05.4202234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.4202403Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.4202919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.4203067Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.4203566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.4203843Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.4204435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.4204564Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.4205087Z File "/tmp/tmp0jedbhwp/ay/cay2dvspppsjmoss6vkxbgpgym75gkayiiwjmtjezgn52evwc76g.py", line 60, in 
2025-12-04T12:15:05.4205583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.4205706Z kernel.precompile( 2025-12-04T12:15:05.4206262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.4206382Z self._precompile_worker() 2025-12-04T12:15:05.4206992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4207174Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4207770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4207986Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4208437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4208696Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4209139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4209473Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4209746Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.4210248Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4210352Z ^ 2025-12-04T12:15:05.4210809Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4210818Z 2025-12-04T12:15:05.4211531Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4211539Z 2025-12-04T12:15:05.4211544Z 2025-12-04T12:15:05.4211780Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4212306Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:15:05.4212312Z 2025-12-04T12:15:05.4212596Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4212823Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4212933Z frames [('total', 1)] 2025-12-04T12:15:05.4213069Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4213700Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4213941Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4214046Z graph_break [] 2025-12-04T12:15:05.4214231Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4214470Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4214577Z frames [('total', 1)] 2025-12-04T12:15:05.4214696Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4214934Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4215468Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4215621Z graph_break [] 2025-12-04T12:15:05.4215805Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 
2025-12-04T12:15:05.4215953Z =================================== FAILURES =================================== 2025-12-04T12:15:05.4216254Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:15:05.4216487Z Traceback (most recent call last): 2025-12-04T12:15:05.4216930Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.4217095Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.4217586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.4217850Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.4218370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.4218566Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.4219092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.4219246Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.4219784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.4220124Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.4220643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.4220805Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.4221321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.4221447Z return self._compile_to_module() 2025-12-04T12:15:05.4221953Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.4222123Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.4222655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.4222788Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.4223285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.4223537Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.4224128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.4224257Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.4224777Z File "/tmp/tmp317z82xt/3z/c3zhiqe2blftfihjcp7wonscomrqgbh4l4xighgjwf3blmoz6ce3.py", line 60, in 2025-12-04T12:15:05.4225241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.4225400Z kernel.precompile( 2025-12-04T12:15:05.4225959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.4226079Z self._precompile_worker() 2025-12-04T12:15:05.4226687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4226865Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4227936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4228143Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4228637Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4228904Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4229349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4229689Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4229962Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.4230459Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4230568Z ^ 2025-12-04T12:15:05.4231029Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4231038Z 2025-12-04T12:15:05.4231753Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4231774Z 2025-12-04T12:15:05.4231778Z 2025-12-04T12:15:05.4231999Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4232522Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:15:05.4232530Z 2025-12-04T12:15:05.4232814Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4233036Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4233157Z frames [('total', 1)] 2025-12-04T12:15:05.4233278Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4233811Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4234168Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4234271Z graph_break [] 2025-12-04T12:15:05.4234453Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4234693Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4234800Z frames [('total', 1)] 2025-12-04T12:15:05.4234918Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4235152Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4235688Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4235802Z graph_break [] 2025-12-04T12:15:05.4235979Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4236197Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4236316Z frames [('total', 1)] 2025-12-04T12:15:05.4236434Z stats [('calls_captured', 11)] 
2025-12-04T12:15:05.4236657Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4237231Z inductor [('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4237333Z graph_break [] 2025-12-04T12:15:05.4237521Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4238179Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ccd966f4e119e833.xml - 2025-12-04T12:15:05.4238353Z =========================== short test summary info ============================ 2025-12-04T12:15:05.4239065Z FAILED [0.5948s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.4239596Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4239705Z ^ 2025-12-04T12:15:05.4240303Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4240309Z 2025-12-04T12:15:05.4241023Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4241066Z 2025-12-04T12:15:05.4241071Z 2025-12-04T12:15:05.4241303Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4241829Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:15:05.4241835Z 2025-12-04T12:15:05.4242117Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4242302Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.4242507Z ================== 1 failed, 22 deselected, 2 rerun in 5.04s =================== 2025-12-04T12:15:05.4242622Z Got exit code 1 2025-12-04T12:15:05.4242730Z Retrying single test... 2025-12-04T12:15:05.4243219Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d16f18ba4de45d90.xml 2025-12-04T12:15:05.4243385Z ============================= test session starts ============================== 2025-12-04T12:15:05.4243740Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.4243866Z cachedir: .pytest_cache 2025-12-04T12:15:05.4244384Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.4244512Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.4244668Z configfile: pytest.ini 2025-12-04T12:15:05.4245263Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.4245499Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.4246106Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 2025-12-04T12:15:05.4246223Z Running 1 items in this shard 2025-12-04T12:15:05.4246228Z 2025-12-04T12:15:05.4247163Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:54:49.329949221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4247169Z 2025-12-04T12:15:05.4247687Z [W1204 11:55:05.232876621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.4247695Z 2025-12-04T12:15:05.4248222Z [W1204 11:55:05.233142957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4248227Z 2025-12-04T12:15:05.4248739Z [W1204 11:55:05.236255432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4248779Z 2025-12-04T12:15:05.4249301Z [W1204 11:55:05.236458970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4249309Z 2025-12-04T12:15:05.4249820Z [W1204 11:55:05.238470603 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4249825Z 2025-12-04T12:15:05.4250344Z [W1204 11:55:05.238833323 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4250351Z 2025-12-04T12:15:05.4250862Z [W1204 11:55:05.239006537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4250898Z 2025-12-04T12:15:05.4251409Z [W1204 11:55:05.239571280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4251426Z 2025-12-04T12:15:05.4251941Z [W1204 11:55:05.239765830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4251976Z 2025-12-04T12:15:05.4252486Z [W1204 11:55:05.240399667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4252491Z 2025-12-04T12:15:05.4253013Z [W1204 11:55:05.240613813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.4253018Z 2025-12-04T12:15:05.4253529Z [W1204 11:55:05.241045787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4253534Z 2025-12-04T12:15:05.4254061Z [W1204 11:55:05.241224039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4254066Z 2025-12-04T12:15:05.4254576Z [W1204 11:55:05.241603480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4254581Z 2025-12-04T12:15:05.4255104Z [W1204 11:55:05.241780784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4255109Z 2025-12-04T12:15:05.4255620Z [W1204 11:55:05.242167135 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4255625Z 2025-12-04T12:15:05.4256133Z [W1204 11:55:05.242344300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4256212Z 2025-12-04T12:15:05.4256772Z W1204 11:55:06.210000 115886 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:15:05.4256913Z ('RERUN', {'yellow': True}) [19.8220s] [100%] 2025-12-04T12:15:05.4257853Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:55:07.680953394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4257861Z 2025-12-04T12:15:05.4258379Z [W1204 11:55:07.681414332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.4258383Z 2025-12-04T12:15:05.4258907Z [W1204 11:55:07.681609327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4258915Z 2025-12-04T12:15:05.4259428Z [W1204 11:55:07.682206823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4259433Z 2025-12-04T12:15:05.4259954Z [W1204 11:55:07.682401354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4260007Z 2025-12-04T12:15:05.4260518Z [W1204 11:55:07.682771955 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4260526Z 2025-12-04T12:15:05.4261031Z [W1204 11:55:07.683074694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4261048Z 2025-12-04T12:15:05.4261556Z [W1204 11:55:07.683245103 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4261563Z 2025-12-04T12:15:05.4262070Z [W1204 11:55:07.683706512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4262105Z 2025-12-04T12:15:05.4262628Z [W1204 11:55:07.683886981 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4262633Z 2025-12-04T12:15:05.4263144Z [W1204 11:55:07.684341405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4263182Z 2025-12-04T12:15:05.4263704Z [W1204 11:55:07.684523240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.4263709Z 2025-12-04T12:15:05.4264216Z [W1204 11:55:07.684930294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4264220Z 2025-12-04T12:15:05.4264745Z [W1204 11:55:07.685108262 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4264750Z 2025-12-04T12:15:05.4265263Z [W1204 11:55:07.685473853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4265268Z 2025-12-04T12:15:05.4265789Z [W1204 11:55:07.685650715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4265794Z 2025-12-04T12:15:05.4266303Z [W1204 11:55:07.686018333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4266308Z 2025-12-04T12:15:05.4266817Z [W1204 11:55:07.686196060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4266834Z 2025-12-04T12:15:05.4266969Z ('RERUN', {'yellow': True}) [0.8658s] [100%] 2025-12-04T12:15:05.4267923Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:55:08.540588649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4267929Z 2025-12-04T12:15:05.4268732Z [W1204 11:55:08.541130007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4268738Z 2025-12-04T12:15:05.4269253Z [W1204 11:55:08.541323048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.4269260Z 2025-12-04T12:15:05.4269787Z [W1204 11:55:08.541906082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4269791Z 2025-12-04T12:15:05.4270298Z [W1204 11:55:08.542102830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4270307Z 2025-12-04T12:15:05.4270834Z [W1204 11:55:08.542459564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4270839Z 2025-12-04T12:15:05.4271644Z [W1204 11:55:08.542763509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4271651Z 2025-12-04T12:15:05.4272163Z [W1204 11:55:08.542935138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4272186Z 2025-12-04T12:15:05.4272697Z [W1204 11:55:08.543397112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4272702Z 2025-12-04T12:15:05.4273209Z [W1204 11:55:08.543580272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4273217Z 2025-12-04T12:15:05.4273796Z [W1204 11:55:08.544024160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4273801Z 2025-12-04T12:15:05.4274318Z [W1204 11:55:08.544207224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4274325Z 2025-12-04T12:15:05.4274852Z [W1204 11:55:08.544601688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.4274902Z 2025-12-04T12:15:05.4275412Z [W1204 11:55:08.544782826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4275418Z 2025-12-04T12:15:05.4275938Z [W1204 11:55:08.545144186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4275946Z 2025-12-04T12:15:05.4276458Z [W1204 11:55:08.545330823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4276466Z 2025-12-04T12:15:05.4276992Z [W1204 11:55:08.545695284 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4276997Z 2025-12-04T12:15:05.4277506Z [W1204 11:55:08.545873675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4277514Z 2025-12-04T12:15:05.4277620Z FAILED [0.9733s] [100%] 2025-12-04T12:15:05.4277625Z 2025-12-04T12:15:05.4277782Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.4278067Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:15:05.4278205Z Traceback (most recent call last): 2025-12-04T12:15:05.4278605Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.4278807Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.4279317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.4279567Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.4280086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 
2025-12-04T12:15:05.4280293Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.4280823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.4280987Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.4281526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.4281850Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.4282391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.4282541Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.4283075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.4283202Z return self._compile_to_module() 2025-12-04T12:15:05.4283692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.4283876Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.4284394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.4284523Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.4285040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.4285311Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.4285920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.4286050Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.4286532Z File 
"/tmp/tmpz_1dg954/ya/cyapkdhf5yoal5x65ohvfufmwbn7mtthu52mkwtqnsx4utahfmjk.py", line 193, in 2025-12-04T12:15:05.4287039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T12:15:05.4287159Z self._wait_futures(scope) 2025-12-04T12:15:05.4287669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T12:15:05.4287792Z kernel = result.result() 2025-12-04T12:15:05.4288243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T12:15:05.4288376Z return self.result_fn() 2025-12-04T12:15:05.4288857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T12:15:05.4288993Z raise e.with_name(kernel_name) from e 2025-12-04T12:15:05.4289392Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T12:15:05.4289401Z 2025-12-04T12:15:05.4289612Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T12:15:05.4289754Z Traceback (most recent call last): 2025-12-04T12:15:05.4290299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T12:15:05.4290403Z result = job() 2025-12-04T12:15:05.4291014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T12:15:05.4291198Z kernel.precompile(warm_cache_only=True) 2025-12-04T12:15:05.4291768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T12:15:05.4291887Z self._precompile_worker() 2025-12-04T12:15:05.4292486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, 
in _precompile_worker 2025-12-04T12:15:05.4292685Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4293279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4293480Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4293950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4294201Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4294662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4295000Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4295217Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4295734Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4295829Z ^ 2025-12-04T12:15:05.4296394Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4296401Z 2025-12-04T12:15:05.4296405Z 2025-12-04T12:15:05.4297117Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4297125Z 2025-12-04T12:15:05.4297129Z 2025-12-04T12:15:05.4297387Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4297928Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:15:05.4297933Z 2025-12-04T12:15:05.4298203Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4298442Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4298601Z frames [('total', 1)] 2025-12-04T12:15:05.4298719Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4299389Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.4299613Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4299733Z graph_break [] 2025-12-04T12:15:05.4299909Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4300130Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.4301354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T12:15:05.4301474Z if out == self.unknown_value: 2025-12-04T12:15:05.4301772Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:15:05.4301898Z Traceback (most recent call last): 2025-12-04T12:15:05.4302294Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.4302455Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.4302944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.4303234Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.4303760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.4303957Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.4304481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.4304631Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.4305167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.4305504Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.4306025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.4306189Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.4306674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.4306797Z return self._compile_to_module() 2025-12-04T12:15:05.4307332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 
2025-12-04T12:15:05.4307498Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.4308016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.4308157Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.4308657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.4308901Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.4309517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.4309650Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.4310177Z File "/tmp/tmpnyyinwfq/lc/clc5kr7ofp6ipkdzxvgogfgretxbny23pz4cfqnpum2ef7mimh4t.py", line 193, in 2025-12-04T12:15:05.4310632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T12:15:05.4310789Z self._wait_futures(scope) 2025-12-04T12:15:05.4311287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T12:15:05.4311405Z kernel = result.result() 2025-12-04T12:15:05.4311860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T12:15:05.4311976Z return self.result_fn() 2025-12-04T12:15:05.4312457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T12:15:05.4312604Z raise e.with_name(kernel_name) from e 2025-12-04T12:15:05.4312985Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T12:15:05.4312991Z 2025-12-04T12:15:05.4313214Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T12:15:05.4313339Z Traceback (most 
recent call last): 2025-12-04T12:15:05.4313881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T12:15:05.4314000Z result = job() 2025-12-04T12:15:05.4314592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T12:15:05.4314734Z kernel.precompile(warm_cache_only=True) 2025-12-04T12:15:05.4315342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T12:15:05.4315464Z self._precompile_worker() 2025-12-04T12:15:05.4316071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4316256Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4316851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4317065Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4317519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4317777Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4318222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4318564Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4318761Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4319289Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4319380Z ^ 2025-12-04T12:15:05.4319852Z ValueError("type fp8e4nv not 
supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4319860Z 2025-12-04T12:15:05.4319865Z 2025-12-04T12:15:05.4320576Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4320582Z 2025-12-04T12:15:05.4320587Z 2025-12-04T12:15:05.4320817Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4321342Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:15:05.4321381Z 2025-12-04T12:15:05.4321664Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4321886Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4321994Z frames [('total', 1)] 2025-12-04T12:15:05.4322127Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4322780Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.4323048Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4323149Z graph_break [] 2025-12-04T12:15:05.4323327Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4323560Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.4324771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T12:15:05.4324888Z if out == self.unknown_value: 2025-12-04T12:15:05.4325120Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4325225Z frames [('total', 1)] 2025-12-04T12:15:05.4325353Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4325578Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4326232Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.4326348Z graph_break [] 2025-12-04T12:15:05.4326524Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4326703Z =================================== FAILURES =================================== 2025-12-04T12:15:05.4327000Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:15:05.4327124Z Traceback (most recent call last): 2025-12-04T12:15:05.4327532Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.4327678Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.4328165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.4328428Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.4328938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.4329142Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.4329656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.4329805Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.4330350Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.4330703Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.4331227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.4331391Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.4331872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.4332007Z return self._compile_to_module() 2025-12-04T12:15:05.4332493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.4332661Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.4333229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.4333365Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.4333877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.4334111Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.4334740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.4334880Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.4335386Z File "/tmp/tmpzltiuvop/kw/ckwcj562jw3lbuugkfy6lz46lt2sifvytood63xi4pmhwrznmhdn.py", line 193, in 2025-12-04T12:15:05.4335847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T12:15:05.4335980Z self._wait_futures(scope) 2025-12-04T12:15:05.4336561Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T12:15:05.4336692Z kernel = result.result() 2025-12-04T12:15:05.4337142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T12:15:05.4337256Z return self.result_fn() 2025-12-04T12:15:05.4337757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T12:15:05.4337889Z raise e.with_name(kernel_name) from e 2025-12-04T12:15:05.4338271Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T12:15:05.4338291Z 2025-12-04T12:15:05.4338500Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T12:15:05.4338674Z Traceback (most recent call last): 2025-12-04T12:15:05.4346391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T12:15:05.4346611Z result = job() 2025-12-04T12:15:05.4347255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T12:15:05.4347416Z kernel.precompile(warm_cache_only=True) 2025-12-04T12:15:05.4347979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T12:15:05.4348107Z self._precompile_worker() 2025-12-04T12:15:05.4348721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4348900Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4349513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4349713Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4350169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4350549Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4350999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4351355Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4351544Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4352501Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4352612Z ^ 2025-12-04T12:15:05.4353072Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4353079Z 2025-12-04T12:15:05.4353171Z 2025-12-04T12:15:05.4353909Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4353915Z 2025-12-04T12:15:05.4353920Z 2025-12-04T12:15:05.4354139Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4354705Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:15:05.4354726Z 2025-12-04T12:15:05.4354998Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4355228Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4355350Z frames [('total', 1)] 2025-12-04T12:15:05.4355477Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4356139Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.4356377Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4356480Z graph_break [] 2025-12-04T12:15:05.4356674Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4356898Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.4358115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T12:15:05.4358251Z if out == self.unknown_value: 2025-12-04T12:15:05.4358470Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4358617Z frames [('total', 1)] 2025-12-04T12:15:05.4358751Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4358976Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4359642Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.4359742Z graph_break [] 2025-12-04T12:15:05.4359918Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4360153Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4360260Z frames [('total', 1)] 2025-12-04T12:15:05.4360378Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4360611Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4361263Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.4361378Z graph_break [] 2025-12-04T12:15:05.4361555Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4362208Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d16f18ba4de45d90.xml - 2025-12-04T12:15:05.4362425Z =========================== short test summary info ============================ 2025-12-04T12:15:05.4363281Z FAILED [0.9733s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T12:15:05.4363290Z 2025-12-04T12:15:05.4363513Z 
Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T12:15:05.4363638Z Traceback (most recent call last): 2025-12-04T12:15:05.4364183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T12:15:05.4364478Z result = job() 2025-12-04T12:15:05.4365122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T12:15:05.4365265Z kernel.precompile(warm_cache_only=True) 2025-12-04T12:15:05.4365834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T12:15:05.4365951Z self._precompile_worker() 2025-12-04T12:15:05.4366592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4366768Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4367360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4367575Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4368030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4368284Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4368731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4369065Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4369260Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4369756Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK 
: tl.constexpr): 2025-12-04T12:15:05.4369847Z ^ 2025-12-04T12:15:05.4370313Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4370320Z 2025-12-04T12:15:05.4370364Z 2025-12-04T12:15:05.4371284Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4371291Z 2025-12-04T12:15:05.4371296Z 2025-12-04T12:15:05.4371528Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4372053Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:15:05.4372058Z 2025-12-04T12:15:05.4372337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4372519Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.4372727Z ================= 1 failed, 187 deselected, 2 rerun in 21.71s ================== 2025-12-04T12:15:05.4372842Z Got exit code 1 2025-12-04T12:15:05.4372952Z Retrying single test... 
2025-12-04T12:15:05.4373440Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4078dca354f1c797.xml 2025-12-04T12:15:05.4373608Z ============================= test session starts ============================== 2025-12-04T12:15:05.4373960Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.4374080Z cachedir: .pytest_cache 2025-12-04T12:15:05.4374687Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.4374817Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.4374940Z configfile: pytest.ini 2025-12-04T12:15:05.4375530Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.4375765Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.4376469Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 2025-12-04T12:15:05.4376591Z Running 1 items in this shard 2025-12-04T12:15:05.4376654Z 2025-12-04T12:15:05.4377587Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:55:25.665307544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4377593Z 2025-12-04T12:15:05.4378110Z [W1204 11:55:41.632861756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4378163Z 2025-12-04T12:15:05.4378686Z [W1204 11:55:41.633115095 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.4378691Z 2025-12-04T12:15:05.4379200Z [W1204 11:55:41.636190118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4379208Z 2025-12-04T12:15:05.4379735Z [W1204 11:55:41.636379338 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4379740Z 2025-12-04T12:15:05.4380252Z [W1204 11:55:41.638308150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4380260Z 2025-12-04T12:15:05.4380768Z [W1204 11:55:41.638639530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4380788Z 2025-12-04T12:15:05.4381298Z [W1204 11:55:41.638807310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4381304Z 2025-12-04T12:15:05.4381812Z [W1204 11:55:41.639323280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4381865Z 2025-12-04T12:15:05.4382391Z [W1204 11:55:41.639504314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4382398Z 2025-12-04T12:15:05.4382906Z [W1204 11:55:41.640100353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4382914Z 2025-12-04T12:15:05.4383429Z [W1204 11:55:41.640300068 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4383438Z 2025-12-04T12:15:05.4383943Z [W1204 11:55:41.640736483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.4383948Z 2025-12-04T12:15:05.4384467Z [W1204 11:55:41.640913317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4384471Z 2025-12-04T12:15:05.4384982Z [W1204 11:55:41.641287051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4384987Z 2025-12-04T12:15:05.4385507Z [W1204 11:55:41.641460882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4385512Z 2025-12-04T12:15:05.4386051Z [W1204 11:55:41.641839376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4386057Z 2025-12-04T12:15:05.4386567Z [W1204 11:55:41.642012794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4386586Z 2025-12-04T12:15:05.4387051Z W1204 11:55:41.605000 116320 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:15:05.4387189Z ('RERUN', {'yellow': True}) [19.8783s] [100%] 2025-12-04T12:15:05.4388145Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:55:42.047258113 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4388152Z 2025-12-04T12:15:05.4388664Z [W1204 11:55:42.047705038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4388669Z 2025-12-04T12:15:05.4389193Z [W1204 11:55:42.047892171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.4389228Z 2025-12-04T12:15:05.4389731Z [W1204 11:55:42.048485283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4389736Z 2025-12-04T12:15:05.4390255Z [W1204 11:55:42.048698011 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4390263Z 2025-12-04T12:15:05.4390776Z [W1204 11:55:42.049065292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4390781Z 2025-12-04T12:15:05.4391284Z [W1204 11:55:42.049383283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4391305Z 2025-12-04T12:15:05.4391815Z [W1204 11:55:42.049554122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4391823Z 2025-12-04T12:15:05.4392325Z [W1204 11:55:42.050055056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4392331Z 2025-12-04T12:15:05.4392845Z [W1204 11:55:42.050242151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4392885Z 2025-12-04T12:15:05.4393392Z [W1204 11:55:42.050698827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4393399Z 2025-12-04T12:15:05.4393914Z [W1204 11:55:42.050878356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4393919Z 2025-12-04T12:15:05.4394430Z [W1204 11:55:42.051268046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.4394437Z 2025-12-04T12:15:05.4394953Z [W1204 11:55:42.051444935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4394958Z 2025-12-04T12:15:05.4395466Z [W1204 11:55:42.051808164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4395471Z 2025-12-04T12:15:05.4395989Z [W1204 11:55:42.051983558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4395994Z 2025-12-04T12:15:05.4396503Z [W1204 11:55:42.052348990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4396508Z 2025-12-04T12:15:05.4397062Z [W1204 11:55:42.052526374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4397067Z 2025-12-04T12:15:05.4397213Z ('RERUN', {'yellow': True}) [0.8586s] [100%] 2025-12-04T12:15:05.4398119Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:55:43.915440839 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4398124Z 2025-12-04T12:15:05.4398643Z [W1204 11:55:43.915862192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4398650Z 2025-12-04T12:15:05.4399188Z [W1204 11:55:43.916046807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4399194Z 2025-12-04T12:15:05.4399713Z [W1204 11:55:43.916647303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.4399718Z 2025-12-04T12:15:05.4400225Z [W1204 11:55:43.916847572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4400261Z 2025-12-04T12:15:05.4400783Z [W1204 11:55:43.917215255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4400788Z 2025-12-04T12:15:05.4401292Z [W1204 11:55:43.917505746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4401300Z 2025-12-04T12:15:05.4401807Z [W1204 11:55:43.917671384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4401812Z 2025-12-04T12:15:05.4402331Z [W1204 11:55:43.918138177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4402336Z 2025-12-04T12:15:05.4402840Z [W1204 11:55:43.918319093 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4402848Z 2025-12-04T12:15:05.4403367Z [W1204 11:55:43.918761094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4403372Z 2025-12-04T12:15:05.4403875Z [W1204 11:55:43.918938689 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4403911Z 2025-12-04T12:15:05.4404434Z [W1204 11:55:43.919322728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4404439Z 2025-12-04T12:15:05.4404944Z [W1204 11:55:43.919499414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T12:15:05.4404949Z 2025-12-04T12:15:05.4405465Z [W1204 11:55:43.919853472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4405473Z 2025-12-04T12:15:05.4405981Z [W1204 11:55:43.920059241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4405986Z 2025-12-04T12:15:05.4406490Z [W1204 11:55:43.920446807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4406506Z 2025-12-04T12:15:05.4407014Z [W1204 11:55:43.920634939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T12:15:05.4407021Z 2025-12-04T12:15:05.4407125Z FAILED [0.9665s] [100%] 2025-12-04T12:15:05.4407130Z 2025-12-04T12:15:05.4407286Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.4407621Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:15:05.4407759Z Traceback (most recent call last): 2025-12-04T12:15:05.4408159Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.4408311Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.4408818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.4409070Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.4409587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.4409835Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.4410349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in 
_compile_fx_inner 2025-12-04T12:15:05.4410513Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.4411049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.4411405Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.4411935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.4412085Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.4412580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.4412707Z return self._compile_to_module() 2025-12-04T12:15:05.4413193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.4413370Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.4413882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.4414016Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.4414523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.4414756Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.4415347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.4415507Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.4415999Z File "/tmp/tmpmt7jig_6/uf/cuflm2uv5axh3zr2dkqme6bl7pcttyxbrpkuupsvfhwpe45mxnhx.py", line 193, in <module> 2025-12-04T12:15:05.4416543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 
2025-12-04T12:15:05.4416662Z self._wait_futures(scope) 2025-12-04T12:15:05.4417170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T12:15:05.4417290Z kernel = result.result() 2025-12-04T12:15:05.4417734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T12:15:05.4417856Z return self.result_fn() 2025-12-04T12:15:05.4418333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T12:15:05.4418464Z raise e.with_name(kernel_name) from e 2025-12-04T12:15:05.4418861Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T12:15:05.4418869Z 2025-12-04T12:15:05.4419079Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T12:15:05.4419217Z Traceback (most recent call last): 2025-12-04T12:15:05.4419800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T12:15:05.4419903Z result = job() 2025-12-04T12:15:05.4420509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T12:15:05.4420651Z kernel.precompile(warm_cache_only=True) 2025-12-04T12:15:05.4421204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T12:15:05.4421334Z self._precompile_worker() 2025-12-04T12:15:05.4421935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4422161Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4422760Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4422961Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4423423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4423783Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4424238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4424573Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4424760Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4425273Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4425365Z ^ 2025-12-04T12:15:05.4425828Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4425846Z 2025-12-04T12:15:05.4425851Z 2025-12-04T12:15:05.4426566Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4426574Z 2025-12-04T12:15:05.4426579Z 2025-12-04T12:15:05.4426801Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4427337Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:15:05.4427377Z 2025-12-04T12:15:05.4427646Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4427889Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4427996Z frames [('total', 1)] 2025-12-04T12:15:05.4428117Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4428795Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.4429018Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4429136Z graph_break [] 2025-12-04T12:15:05.4429315Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4429538Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.4430756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T12:15:05.4430880Z if out == self.unknown_value: 2025-12-04T12:15:05.4431167Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:15:05.4431309Z Traceback (most recent call last): 2025-12-04T12:15:05.4431739Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.4431895Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.4432386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.4432637Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.4433162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.4433360Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.4433908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.4434054Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.4434593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.4434925Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.4435472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.4435621Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.4436112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.4436232Z return self._compile_to_module() 2025-12-04T12:15:05.4436730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 
2025-12-04T12:15:05.4436898Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.4437414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.4437558Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.4438056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.4438299Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.4438885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.4439013Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.4439528Z File "/tmp/tmpld5b0idr/nx/cnxx7kaioecv5jz4hjemekyivxpjom344u7no6d4gmd7d27uagsq.py", line 193, in <module> 2025-12-04T12:15:05.4440028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T12:15:05.4440147Z self._wait_futures(scope) 2025-12-04T12:15:05.4440649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T12:15:05.4440768Z kernel = result.result() 2025-12-04T12:15:05.4441224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T12:15:05.4441343Z return self.result_fn() 2025-12-04T12:15:05.4441820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T12:15:05.4441960Z raise e.with_name(kernel_name) from e 2025-12-04T12:15:05.4442342Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T12:15:05.4442350Z 2025-12-04T12:15:05.4442567Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T12:15:05.4442691Z Traceback (most 
recent call last): 2025-12-04T12:15:05.4443232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T12:15:05.4443343Z result = job() 2025-12-04T12:15:05.4443969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T12:15:05.4444112Z kernel.precompile(warm_cache_only=True) 2025-12-04T12:15:05.4444682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T12:15:05.4444799Z self._precompile_worker() 2025-12-04T12:15:05.4445408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4445590Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4446216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4446429Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4446885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4447143Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4447618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4447952Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4448146Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4448646Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4448738Z ^ 2025-12-04T12:15:05.4449204Z ValueError("type fp8e4nv not 
supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4449210Z 2025-12-04T12:15:05.4449215Z 2025-12-04T12:15:05.4449931Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4449937Z 2025-12-04T12:15:05.4449942Z 2025-12-04T12:15:05.4450169Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4450691Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:15:05.4450697Z 2025-12-04T12:15:05.4450976Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4451197Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4451342Z frames [('total', 1)] 2025-12-04T12:15:05.4451477Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4452139Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.4452376Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4452478Z graph_break [] 2025-12-04T12:15:05.4452655Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4452890Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.4454095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T12:15:05.4454214Z if out == self.unknown_value: 2025-12-04T12:15:05.4454448Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4454553Z frames [('total', 1)] 2025-12-04T12:15:05.4454684Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4454904Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4455597Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.4455710Z graph_break [] 2025-12-04T12:15:05.4455888Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4456039Z =================================== FAILURES =================================== 2025-12-04T12:15:05.4456442Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T12:15:05.4456573Z Traceback (most recent call last): 2025-12-04T12:15:05.4456981Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 130, in test_eager_fallback 2025-12-04T12:15:05.4457131Z y_fp8 = compiled_fp8_matmul(x) # noqa: F841 2025-12-04T12:15:05.4457668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.4457934Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.4458454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.4458648Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.4459198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.4459346Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.4459888Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.4460212Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.4460734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.4460893Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.4461376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.4461509Z return self._compile_to_module() 2025-12-04T12:15:05.4461990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.4462150Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.4462680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.4462810Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.4463337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.4463582Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.4464168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.4464307Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.4464816Z File "/tmp/tmp4aemnlh4/qc/cqcprnkar4ppm6efp767xjtczmidmbw3cmmyeqznbpp5lk67zulo.py", line 193, in <module> 2025-12-04T12:15:05.4465274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T12:15:05.4465403Z self._wait_futures(scope) 2025-12-04T12:15:05.4465896Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T12:15:05.4466022Z kernel = result.result() 2025-12-04T12:15:05.4466465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T12:15:05.4466579Z return self.result_fn() 2025-12-04T12:15:05.4467067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T12:15:05.4467232Z raise e.with_name(kernel_name) from e 2025-12-04T12:15:05.4467614Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T12:15:05.4467630Z 2025-12-04T12:15:05.4467841Z Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T12:15:05.4467963Z Traceback (most recent call last): 2025-12-04T12:15:05.4468512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T12:15:05.4468611Z result = job() 2025-12-04T12:15:05.4469205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T12:15:05.4469361Z kernel.precompile(warm_cache_only=True) 2025-12-04T12:15:05.4469948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T12:15:05.4470081Z self._precompile_worker() 2025-12-04T12:15:05.4470684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4470861Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4471694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4471891Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4472341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4472603Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4473049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4473395Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4473578Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4474073Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4474180Z ^ 2025-12-04T12:15:05.4474635Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4474640Z 2025-12-04T12:15:05.4474645Z 2025-12-04T12:15:05.4475371Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4475475Z 2025-12-04T12:15:05.4475480Z 2025-12-04T12:15:05.4475698Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4476236Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:15:05.4476242Z 2025-12-04T12:15:05.4476513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4476735Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4476859Z frames [('total', 1)] 2025-12-04T12:15:05.4476978Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4477634Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.4477869Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4477970Z graph_break [] 2025-12-04T12:15:05.4478162Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4478383Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:15:05.4479651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T12:15:05.4479786Z if out == self.unknown_value: 2025-12-04T12:15:05.4480005Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4480119Z frames [('total', 1)] 2025-12-04T12:15:05.4480238Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4480457Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4481127Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.4481229Z graph_break [] 2025-12-04T12:15:05.4481453Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4481699Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4481804Z frames [('total', 1)] 2025-12-04T12:15:05.4481924Z stats [('calls_captured', 11)] 2025-12-04T12:15:05.4482162Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4482816Z inductor [('async_compile_cache_miss', 6), ('async_compile_cache_hit', 3), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('extern_calls', 2), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.4483000Z graph_break [] 2025-12-04T12:15:05.4483180Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1)] 2025-12-04T12:15:05.4483827Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4078dca354f1c797.xml - 2025-12-04T12:15:05.4484023Z =========================== short test summary info ============================ 2025-12-04T12:15:05.4484890Z FAILED [0.9665s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T12:15:05.4484896Z 2025-12-04T12:15:05.4485122Z 
Name=triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0 2025-12-04T12:15:05.4485250Z Traceback (most recent call last): 2025-12-04T12:15:05.4485798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T12:15:05.4485920Z result = job() 2025-12-04T12:15:05.4486513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T12:15:05.4486671Z kernel.precompile(warm_cache_only=True) 2025-12-04T12:15:05.4487229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T12:15:05.4487387Z self._precompile_worker() 2025-12-04T12:15:05.4488000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4488184Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4488781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4489000Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4489453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4489717Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4490161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4490501Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4490704Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4491236Z def triton_poi_fused__scaled_mm__to_copy_mul_permute_rand_0(in_ptr0, out_ptr1, load_seed_offset, ks1, xnumel, XBLOCK 
: tl.constexpr): 2025-12-04T12:15:05.4491341Z ^ 2025-12-04T12:15:05.4491799Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4491807Z 2025-12-04T12:15:05.4491812Z 2025-12-04T12:15:05.4492520Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4492541Z 2025-12-04T12:15:05.4492546Z 2025-12-04T12:15:05.4492764Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4493292Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T12:15:05.4493298Z 2025-12-04T12:15:05.4493616Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4493803Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:05.4494012Z ================= 1 failed, 187 deselected, 2 rerun in 21.75s ================== 2025-12-04T12:15:05.4494130Z Got exit code 1 2025-12-04T12:15:05.4494573Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 2025-12-04T12:15:05.4495034Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.4495507Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7591ded94ad5fda9.xml 2025-12-04T12:15:05.4495675Z ============================= test session starts ============================== 2025-12-04T12:15:05.4496050Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.4496163Z cachedir: .pytest_cache 2025-12-04T12:15:05.4496783Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.4496915Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.4497026Z configfile: pytest.ini 2025-12-04T12:15:05.4497635Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.4497864Z collecting ... collected 188 items / 23 deselected / 165 selected 2025-12-04T12:15:05.4498007Z stepcurrent: skipping 23 already run items. 
2025-12-04T12:15:05.4498136Z Running 165 items in this shard 2025-12-04T12:15:05.4498142Z 2025-12-04T12:15:05.4499058Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 0%] 2025-12-04T12:15:05.4500011Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 1%] 2025-12-04T12:15:05.4500898Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 1%] 2025-12-04T12:15:05.4501796Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 2%] 2025-12-04T12:15:05.4503219Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.4504357Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4504794Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.4505239Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
r0_numel = 15 2025-12-04T12:15:05.4505767Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.4506228Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.4506806Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.4507350Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.4507934Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.4508566Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.4509122Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.4509577Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.4510099Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.4510573Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.4511047Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.4511495Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.4512155Z E1204 11:56:00.975000 116754 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.4512682Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.4513265Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T12:15:05.4513786Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.4514366Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4514913Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.4515490Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4516037Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.4516608Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4517174Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.4517697Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.4518183Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.4518677Z E1204 11:56:00.975000 
116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.4519155Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.4519776Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4520327Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.4520903Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4521396Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.4521876Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.4522364Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.4522818Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.4523307Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.4523848Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.4524332Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.4524836Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.4525441Z 
E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4526017Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.4526707Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4527191Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T12:15:05.4527646Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T12:15:05.4528227Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.4528664Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.4529244Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.4529780Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.4530306Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.4531041Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.4531752Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, 
tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.4532128Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.4534267Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.4534846Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.4535890Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4536617Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4537512Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4538205Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4539089Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4539860Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4540529Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4541619Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4542002Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.4542896Z E1204 11:56:00.975000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4543046Z ('RERUN', {'yellow': True}) [3.3820s] [ 3%] 2025-12-04T12:15:05.4544468Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.4545598Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4546034Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.4546478Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.4547005Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.4547468Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.4548043Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.4548584Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.4549166Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.4549791Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.4550345Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.4550798Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.4551319Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.4551790Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.4552258Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.4552705Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.4553360Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.4553887Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.4554468Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T12:15:05.4554987Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.4555573Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4556119Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 
2025-12-04T12:15:05.4556698Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4557241Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.4557815Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4558381Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.4558905Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.4559391Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.4559887Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.4560368Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.4561552Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4562115Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.4562693Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4563222Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.4563660Z E1204 
11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.4564150Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.4564607Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.4565093Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.4565632Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.4566112Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.4566621Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.4567223Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4567801Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.4568491Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4568973Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T12:15:05.4569415Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T12:15:05.4570004Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = 
triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.4570444Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.4571195Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.4571735Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.4572340Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.4573047Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.4573763Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.4574284Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.4576525Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.4577127Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback 
(most recent call last): 2025-12-04T12:15:05.4578175Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4578822Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4579716Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4580414Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4581297Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4582067Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4582742Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4583839Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4584224Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T12:15:05.4585118Z E1204 11:56:01.419000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4585270Z ('RERUN', {'yellow': True}) [0.4131s] [ 3%] 2025-12-04T12:15:05.4586703Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.4587842Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4588290Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.4588734Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.4589262Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.4589763Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.4590314Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.4590860Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.4591448Z E1204 11:56:01.834000 116754 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.4592086Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.4592644Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.4593108Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.4593628Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.4594106Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.4594579Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.4595030Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.4595693Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.4596257Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.4596807Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T12:15:05.4597330Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.4597910Z E1204 11:56:01.834000 116754 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4598459Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.4599037Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4599574Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.4600162Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4600729Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.4601253Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.4601742Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.4602243Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.4602721Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.4603682Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4604241Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.4604820Z E1204 11:56:01.834000 116754 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4605359Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.4605795Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.4606288Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.4606751Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.4607232Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.4607781Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.4608261Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.4608769Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.4609369Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4609996Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.4610646Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4611130Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = 
tmp24 * tmp31 2025-12-04T12:15:05.4611575Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T12:15:05.4612166Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.4612605Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.4613201Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.4613734Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.4614311Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.4615037Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.4615747Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.4616130Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.4618359Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): 
[['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.4618949Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.4619995Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4620643Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4621542Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4622237Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4623125Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4623893Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4624569Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4625657Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4626036Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.4626929Z E1204 11:56:01.834000 116754 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4627050Z FAILED [0.4126s] [ 3%] 2025-12-04T12:15:05.4627059Z 2025-12-04T12:15:05.4627206Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.4627596Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _ 2025-12-04T12:15:05.4627732Z Traceback (most recent call last): 2025-12-04T12:15:05.4628201Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.4628445Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.4628953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.4629203Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.4629728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.4629927Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.4630548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.4630720Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.4631255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in 
fx_codegen_and_compile 2025-12-04T12:15:05.4631591Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.4632149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.4632303Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.4632799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.4632927Z return self._compile_to_module() 2025-12-04T12:15:05.4633416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.4633601Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.4634119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.4634264Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.4634761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.4634997Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.4635601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.4635732Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.4636251Z File "/tmp/tmpz9880xg5/sk/cskb2xbapc5orxobaeeehsbmvlgzfbqdt5wuoetd2bvzx3hf7kwu.py", line 74, in 2025-12-04T12:15:05.4636775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.4636894Z kernel.precompile( 2025-12-04T12:15:05.4637458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 
2025-12-04T12:15:05.4637584Z self._precompile_worker() 2025-12-04T12:15:05.4638183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4638379Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4638970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4639180Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4639630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4639882Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4640342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4640715Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4640958Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.4641610Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4641702Z ^ 2025-12-04T12:15:05.4642171Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4642177Z 2025-12-04T12:15:05.4642890Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4642899Z 2025-12-04T12:15:05.4642933Z 2025-12-04T12:15:05.4643168Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4643864Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T12:15:05.4643870Z 2025-12-04T12:15:05.4644141Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4644412Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4644517Z frames [('total', 1)] 2025-12-04T12:15:05.4644649Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.4645113Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4645337Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4645454Z graph_break [] 2025-12-04T12:15:05.4645843Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _ 2025-12-04T12:15:05.4645967Z Traceback (most recent call last): 2025-12-04T12:15:05.4646407Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.4646639Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.4647141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.4647391Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.4647903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.4648114Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.4648655Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.4648818Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.4649351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.4649674Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.4650204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.4650356Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.4650833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.4650967Z return self._compile_to_module() 2025-12-04T12:15:05.4651455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.4651635Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.4652153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.4652284Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.4652822Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.4653055Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.4653656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.4653784Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.4654292Z File "/tmp/tmpj0qljqxx/bh/cbhxnnqevtb5kwecrnfczthhoacmgjrdo6reybht6gqraqtjzvmv.py", line 74, in 2025-12-04T12:15:05.4654771Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.4654915Z kernel.precompile( 2025-12-04T12:15:05.4655472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.4655609Z self._precompile_worker() 2025-12-04T12:15:05.4656208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4656538Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4657141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4657341Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4657809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4658059Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4658520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4658856Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4659084Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.4659748Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4659842Z ^ 2025-12-04T12:15:05.4660315Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4660321Z 2025-12-04T12:15:05.4661030Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4661085Z 2025-12-04T12:15:05.4661090Z 2025-12-04T12:15:05.4661312Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4662023Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T12:15:05.4662028Z 2025-12-04T12:15:05.4662298Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4662539Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4662644Z frames [('total', 1)] 2025-12-04T12:15:05.4662762Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.4663239Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4663465Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4663578Z graph_break [] 2025-12-04T12:15:05.4663798Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4663904Z frames [('total', 1)] 2025-12-04T12:15:05.4664035Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.4664254Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4664763Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4664881Z graph_break [] 2025-12-04T12:15:05.4665030Z =================================== FAILURES =================================== 2025-12-04T12:15:05.4665418Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _ 2025-12-04T12:15:05.4665558Z Traceback (most recent call last): 2025-12-04T12:15:05.4665984Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.4666238Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.4666761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.4667012Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.4667539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.4667737Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.4668297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.4668446Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.4668980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.4669322Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.4669849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.4670014Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.4670498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.4670625Z return self._compile_to_module() 2025-12-04T12:15:05.4671420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.4671593Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.4672113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.4672260Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.4672856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.4673107Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.4673692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.4673823Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.4674349Z File "/tmp/tmpxs429tji/st/cstau35jzagywdq344ovuqcdoffasluumixuoleq23pvvnkx3o4i.py", line 74, in 2025-12-04T12:15:05.4674815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.4674944Z kernel.precompile( 2025-12-04T12:15:05.4675501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.4675619Z self._precompile_worker() 2025-12-04T12:15:05.4676231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4676416Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4677073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4677287Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4677741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4678004Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.4678449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4678785Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4679033Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.4679739Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4679849Z ^ 2025-12-04T12:15:05.4680311Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4680317Z 2025-12-04T12:15:05.4681026Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4681101Z 2025-12-04T12:15:05.4681106Z 2025-12-04T12:15:05.4681340Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4682033Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T12:15:05.4682041Z 2025-12-04T12:15:05.4682328Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4682553Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4682657Z frames [('total', 1)] 2025-12-04T12:15:05.4682791Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.4683258Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4683492Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4683594Z graph_break [] 
2025-12-04T12:15:05.4683812Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4683930Z frames [('total', 1)] 2025-12-04T12:15:05.4684047Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.4684264Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4684774Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4684873Z graph_break [] 2025-12-04T12:15:05.4685105Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4685208Z frames [('total', 1)] 2025-12-04T12:15:05.4685323Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.4685558Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4686014Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4686113Z graph_break [] 2025-12-04T12:15:05.4686775Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7591ded94ad5fda9.xml - 2025-12-04T12:15:05.4686951Z =========================== short test summary info ============================ 2025-12-04T12:15:05.4687805Z FAILED [0.4126s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.4688457Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4688580Z ^ 2025-12-04T12:15:05.4689058Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4689067Z 2025-12-04T12:15:05.4689776Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4689783Z 2025-12-04T12:15:05.4689788Z 2025-12-04T12:15:05.4690020Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4690712Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T12:15:05.4690720Z 2025-12-04T12:15:05.4691037Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4691221Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.4691442Z ============= 1 failed, 4 skipped, 23 deselected, 2 rerun in 4.26s ============= 2025-12-04T12:15:05.4691556Z Got exit code 1 2025-12-04T12:15:05.4691665Z Retrying single test... 
2025-12-04T12:15:05.4692171Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4955a88ef6b89264.xml 2025-12-04T12:15:05.4692351Z ============================= test session starts ============================== 2025-12-04T12:15:05.4692703Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.4692827Z cachedir: .pytest_cache 2025-12-04T12:15:05.4693348Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.4693478Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.4693601Z configfile: pytest.ini 2025-12-04T12:15:05.4694194Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.4694419Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.4695206Z stepcurrent: skipping 27 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T12:15:05.4695325Z Running 1 items in this shard 2025-12-04T12:15:05.4695331Z 2025-12-04T12:15:05.4696862Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.4698001Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4698448Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.4698893Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.4699409Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.4699887Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.4700429Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.4700985Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.4701603Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.4702192Z 
E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.4702763Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.4703206Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.4703770Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.4704243Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.4704705Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.4705162Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.4705835Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.4706376Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.4706924Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T12:15:05.4707445Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.4708027Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4708556Z E1204 11:56:20.755000 116951 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.4709148Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4709677Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.4710297Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4710832Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.4711345Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.4711841Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.4712848Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.4713345Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.4713928Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4714476Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.4715112Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4715592Z E1204 11:56:20.755000 116951 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.4716046Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.4716537Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.4716972Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.4717681Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.4718215Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.4718707Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.4719212Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.4719836Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4720429Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.4721070Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4721566Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T12:15:05.4722010Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 
2025-12-04T12:15:05.4722594Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.4723034Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.4723602Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.4724187Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.4724704Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.4725421Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.4726131Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.4726498Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.4728635Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 
2025-12-04T12:15:05.4729173Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.4730232Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4730865Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4731820Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4732502Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4733405Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4734207Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4734829Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4735918Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 
2025-12-04T12:15:05.4736361Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.4737279Z E1204 11:56:20.755000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4737414Z ('RERUN', {'yellow': True}) [3.3979s] [100%] 2025-12-04T12:15:05.4738862Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.4740006Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4740447Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.4740893Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.4741407Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.4741884Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.4742422Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.4743006Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.4743589Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.4744176Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.4744748Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.4745190Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.4745752Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.4746227Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.4746689Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.4747146Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.4747822Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.4748358Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.4748911Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T12:15:05.4749420Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.4750019Z 
E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4750548Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.4751138Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4751669Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.4752283Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4752815Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.4753326Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.4753827Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.4754313Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.4754808Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.4755399Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4755941Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.4756559Z E1204 11:56:21.209000 116951 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4757040Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.4757494Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.4757987Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.4758430Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.4758969Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.4759503Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.4759998Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.4760534Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.4761119Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4761714Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.4762363Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4762864Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = 
tmp24 * tmp31 2025-12-04T12:15:05.4763315Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T12:15:05.4763908Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.4764350Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.4764924Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.4765508Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.4766020Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.4766746Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.4767452Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.4767816Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.4769942Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): 
[['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.4770488Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.4771728Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4772359Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4773360Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4774042Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4774986Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4775759Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4776436Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4777554Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4777920Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.4778828Z E1204 11:56:21.209000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4778968Z ('RERUN', {'yellow': True}) [0.4171s] [100%] 2025-12-04T12:15:05.4780412Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.4781555Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4782004Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.4782446Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.4782957Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.4783429Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.4783966Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: 
tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.4784575Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.4785160Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.4785741Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.4786306Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.4786745Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.4787305Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.4787779Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.4788241Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.4788754Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.4789398Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.4789934Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.4790487Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 
2025-12-04T12:15:05.4790995Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.4791589Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4792122Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.4792714Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4793244Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.4793864Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4794401Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.4794914Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.4795414Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.4795888Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.4796380Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.4796969Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4797504Z E1204 
11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.4798121Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4798601Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.4799050Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.4799537Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.4799983Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.4800508Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.4801039Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.4801531Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.4802070Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.4802655Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4803243Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.4803888Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = 
triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4804382Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T12:15:05.4804828Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T12:15:05.4805404Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.4805860Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.4806432Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.4807013Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.4807530Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.4808246Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.4808958Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.4809321Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.4811454Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 
'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.4811994Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.4813047Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4813676Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4814612Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4815297Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4816220Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4817073Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4817692Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4818873Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4819238Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.4820150Z E1204 11:56:21.634000 116951 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4820257Z FAILED [0.4229s] [100%] 2025-12-04T12:15:05.4820263Z 2025-12-04T12:15:05.4820461Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.4820852Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _ 2025-12-04T12:15:05.4820979Z Traceback (most recent call last): 2025-12-04T12:15:05.4821419Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.4821657Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.4822146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.4822408Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.4822921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.4823129Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.4823642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.4823793Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.4824341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.4824725Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.4825262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.4825414Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.4825892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.4826029Z return self._compile_to_module() 2025-12-04T12:15:05.4826512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.4826679Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.4827240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.4827373Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.4827884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.4828116Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.4828730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.4828869Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.4829375Z File "/tmp/tmpkbg3c47x/ry/crymrpytjiqolcjbegw2trazvtmrlfdw4y3cdd2mpwscx5v3qu57.py", line 74, in <module> 2025-12-04T12:15:05.4829852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.4829962Z kernel.precompile( 
2025-12-04T12:15:05.4830519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.4830652Z self._precompile_worker() 2025-12-04T12:15:05.4831246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4831424Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4832033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4832232Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4832695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4832991Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4833436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4833781Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4834010Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.4834674Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4834770Z ^ 2025-12-04T12:15:05.4835230Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4835236Z 2025-12-04T12:15:05.4835960Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4835969Z 2025-12-04T12:15:05.4835974Z 2025-12-04T12:15:05.4836193Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4837014Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T12:15:05.4837022Z 2025-12-04T12:15:05.4837292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4837519Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4837643Z frames [('total', 1)] 2025-12-04T12:15:05.4837764Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.4838245Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4838470Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4838574Z graph_break [] 2025-12-04T12:15:05.4838978Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _ 2025-12-04T12:15:05.4839134Z Traceback (most recent call last): 2025-12-04T12:15:05.4839561Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.4839810Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.4840298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.4840593Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.4841106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.4841300Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.4841827Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.4841979Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.4842525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.4842850Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.4843370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.4843535Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.4844015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.4844151Z return self._compile_to_module() 2025-12-04T12:15:05.4844635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.4844834Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.4845368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.4845514Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.4846012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.4846258Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.4846845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.4846986Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.4847484Z File "/tmp/tmp5w0ergsv/w4/cw4ykyvlsy2aspauo47l6mizex6p44cq6otdsbdeg7dqhi7wl6ku.py", line 74, in 2025-12-04T12:15:05.4847949Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.4848081Z kernel.precompile( 2025-12-04T12:15:05.4848642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.4848761Z self._precompile_worker() 2025-12-04T12:15:05.4849398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4849580Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4850192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4850394Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4850847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4851110Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4851586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4851936Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4852169Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.4852819Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4852962Z ^ 2025-12-04T12:15:05.4853420Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4853426Z 2025-12-04T12:15:05.4854150Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4854158Z 2025-12-04T12:15:05.4854163Z 2025-12-04T12:15:05.4854385Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4855081Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T12:15:05.4855100Z 2025-12-04T12:15:05.4855373Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4855598Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4855725Z frames [('total', 1)] 2025-12-04T12:15:05.4855845Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.4856420Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4856663Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4856813Z graph_break [] 2025-12-04T12:15:05.4857049Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4857159Z frames [('total', 1)] 2025-12-04T12:15:05.4857283Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.4857516Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4857983Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4858083Z graph_break [] 2025-12-04T12:15:05.4858248Z =================================== FAILURES =================================== 2025-12-04T12:15:05.4858635Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _ 2025-12-04T12:15:05.4858761Z Traceback (most recent call last): 2025-12-04T12:15:05.4859203Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.4859436Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.4859947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.4860199Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.4860739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.4860949Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.4861458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.4861621Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.4862151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.4862471Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.4863005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.4863230Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.4863724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.4863847Z return self._compile_to_module() 2025-12-04T12:15:05.4864332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.4864543Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.4865062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.4865191Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.4865696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.4865934Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.4866533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.4866660Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.4867161Z File "/tmp/tmpos7maqea/bw/cbwqdwvki3qydy47fp6sn2g4opjc43xlg5yiy5lzvdxmwgnalbhq.py", line 74, in <module> 2025-12-04T12:15:05.4867636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.4867752Z kernel.precompile( 2025-12-04T12:15:05.4868318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.4868436Z self._precompile_worker() 2025-12-04T12:15:05.4869029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.4869253Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.4869853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4870053Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4870520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4870770Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.4871391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4871728Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4871955Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.4872623Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4872713Z ^ 2025-12-04T12:15:05.4873181Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4873268Z 2025-12-04T12:15:05.4873980Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4873990Z 2025-12-04T12:15:05.4873995Z 2025-12-04T12:15:05.4874212Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4874917Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T12:15:05.4874923Z 2025-12-04T12:15:05.4875197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4875480Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4875587Z frames [('total', 1)] 2025-12-04T12:15:05.4875705Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.4876186Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4876408Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4876583Z graph_break [] 
2025-12-04T12:15:05.4876801Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4876905Z frames [('total', 1)] 2025-12-04T12:15:05.4877038Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.4877257Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4877715Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4877832Z graph_break [] 2025-12-04T12:15:05.4878051Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.4878168Z frames [('total', 1)] 2025-12-04T12:15:05.4878285Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.4878508Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.4878979Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.4879082Z graph_break [] 2025-12-04T12:15:05.4879728Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4955a88ef6b89264.xml - 2025-12-04T12:15:05.4879915Z =========================== short test summary info ============================ 2025-12-04T12:15:05.4880753Z FAILED [0.4229s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.4881464Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4881555Z ^ 2025-12-04T12:15:05.4882013Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4882019Z 2025-12-04T12:15:05.4882741Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.4882749Z 2025-12-04T12:15:05.4882754Z 2025-12-04T12:15:05.4882973Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.4883681Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T12:15:05.4883689Z 2025-12-04T12:15:05.4883960Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.4884159Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.4884364Z ================== 1 failed, 187 deselected, 2 rerun in 4.28s ================== 2025-12-04T12:15:05.4884499Z Got exit code 1 2025-12-04T12:15:05.4884624Z Retrying single test... 
2025-12-04T12:15:05.4885098Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2fae1650dec37ec0.xml 2025-12-04T12:15:05.4885265Z ============================= test session starts ============================== 2025-12-04T12:15:05.4885628Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.4885738Z cachedir: .pytest_cache 2025-12-04T12:15:05.4886267Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.4886397Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.4886540Z configfile: pytest.ini 2025-12-04T12:15:05.4887148Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.4887373Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.4888147Z stepcurrent: skipping 27 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T12:15:05.4888308Z Running 1 items in this shard 2025-12-04T12:15:05.4888313Z 2025-12-04T12:15:05.4889735Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.4890841Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4891275Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.4891732Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.4892250Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.4892711Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.4893329Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.4893873Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.4894471Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.4895055Z 
E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.4895613Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.4896065Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.4896667Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.4897158Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.4897663Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.4898126Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.4898773Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.4899296Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.4899853Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T12:15:05.4900392Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.4900984Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4901517Z E1204 11:56:40.398000 117148 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.4902128Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4902676Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.4903249Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4903799Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.4904312Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.4904794Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.4905285Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.4905766Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.4906363Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4906940Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.4907515Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4908012Z E1204 11:56:40.398000 117148 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.4908447Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.4908952Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.4909390Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.4909874Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.4910415Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.4910927Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.4911453Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.4912043Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4912630Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.4913272Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4913785Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T12:15:05.4914245Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 
2025-12-04T12:15:05.4914819Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.4915300Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.4915868Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.4916400Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.4916929Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.4917635Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.4918351Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.4918715Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.4920830Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 
2025-12-04T12:15:05.4921401Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.4922465Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4923098Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4923992Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4924718Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4925599Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4926391Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4926998Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4928183Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 
2025-12-04T12:15:05.4928554Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.4929455Z E1204 11:56:40.398000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4929637Z ('RERUN', {'yellow': True}) [3.3874s] [100%] 2025-12-04T12:15:05.4931062Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.4932176Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4932608Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.4933067Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.4933586Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.4934049Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.4934638Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.4935183Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.4935792Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.4936448Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.4937012Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.4937472Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.4937997Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.4938486Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.4938986Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.4939439Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.4940103Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.4940630Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.4941194Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T12:15:05.4941731Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.4942325Z 
E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4942859Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.4943470Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4944015Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.4944587Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4945137Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.4945647Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.4946134Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.4946629Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.4947112Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.4947712Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4948288Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.4948867Z E1204 11:56:40.851000 117148 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4949357Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.4949793Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.4950297Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.4950737Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.4951218Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.4951761Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.4952268Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.4952784Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.4953373Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4953948Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.4954597Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4955106Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = 
tmp24 * tmp31 2025-12-04T12:15:05.4955564Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T12:15:05.4956131Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.4956612Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.4957185Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.4957720Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.4958247Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.4958950Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.4959665Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.4960029Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.4962130Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): 
[['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.4962692Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.4963734Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.4964377Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.4965275Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.4965997Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.4966874Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.4967659Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.4968266Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.4969405Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4969774Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.4970678Z E1204 11:56:40.851000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.4970854Z ('RERUN', {'yellow': True}) [0.4139s] [100%] 2025-12-04T12:15:05.4972455Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.4973556Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.4973984Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.4974439Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.4974952Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.4975413Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.4976033Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: 
tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.4976641Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.4977235Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.4977821Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.4978376Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.4978830Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.4979353Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.4979839Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.4980371Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.4980820Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.4981478Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.4982002Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.4982608Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 
2025-12-04T12:15:05.4983121Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.4983705Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4984251Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.4984877Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4985428Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.4986003Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4986554Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.4987064Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.4987550Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.4988048Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.4988531Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.4989158Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4989696Z E1204 
11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.4990272Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4990759Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.4991198Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.4991702Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.4992139Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.4992624Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.4993165Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.4993675Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.4994196Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.4994784Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.4995362Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.4996041Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = 
triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.4996524Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T12:15:05.4996984Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T12:15:05.4997561Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.4998032Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.4998622Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.4999154Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.4999687Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.5000396Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.5001113Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.5001478Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5003580Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 
'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5004142Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5005185Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5005826Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5006719Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5007439Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5008320Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5009101Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5009708Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5010839Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5011211Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5012139Z E1204 11:56:41.257000 117148 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5012257Z FAILED [0.4047s] [100%] 2025-12-04T12:15:05.5012264Z 2025-12-04T12:15:05.5012410Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.5012809Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _ 2025-12-04T12:15:05.5012938Z Traceback (most recent call last): 2025-12-04T12:15:05.5013364Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5013619Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5014107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5014360Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5014884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5015080Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5015606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5015792Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5016390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.5016733Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5017259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5017427Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5017907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.5018030Z return self._compile_to_module() 2025-12-04T12:15:05.5018528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5018697Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5019217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5019361Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5019894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5020140Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.5020725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.5020856Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.5021385Z File "/tmp/tmp6r4qtrxw/pa/cpa6pnodeunjl3urwjgabufcb3mkfpptvgmgglw2mdae6d7fherb.py", line 74, in 2025-12-04T12:15:05.5021847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.5021977Z kernel.precompile( 
2025-12-04T12:15:05.5022558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.5022680Z self._precompile_worker() 2025-12-04T12:15:05.5023290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.5023471Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.5024095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5024307Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5024757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5025030Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5025478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5025814Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5026057Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5026714Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5026822Z ^ 2025-12-04T12:15:05.5027280Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5027286Z 2025-12-04T12:15:05.5027998Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5028066Z 2025-12-04T12:15:05.5028071Z 2025-12-04T12:15:05.5028288Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5028986Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T12:15:05.5028992Z 2025-12-04T12:15:05.5029281Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5029510Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5029636Z frames [('total', 1)] 2025-12-04T12:15:05.5029758Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5030225Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.5030469Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5030573Z graph_break [] 2025-12-04T12:15:05.5030963Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _ 2025-12-04T12:15:05.5031108Z Traceback (most recent call last): 2025-12-04T12:15:05.5031540Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5031777Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5032349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5032603Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5033133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5033330Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5033841Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5034010Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5035099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.5035444Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5035971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5036122Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5036663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.5036788Z return self._compile_to_module() 2025-12-04T12:15:05.5037290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5037456Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5037980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5038133Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5038631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5038867Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.5039477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.5039608Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.5040124Z File "/tmp/tmptz337sga/7z/c7zlikqiugzxlfqtxc6m6urum2mjncirlkshav32ot6wlx7nfe72.py", line 74, in 2025-12-04T12:15:05.5040591Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.5040819Z kernel.precompile( 2025-12-04T12:15:05.5041392Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.5041511Z self._precompile_worker() 2025-12-04T12:15:05.5042126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.5042305Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.5042901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5043119Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5043572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5043815Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5044275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5044610Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5044849Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5045531Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5045624Z ^ 2025-12-04T12:15:05.5046099Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5046105Z 2025-12-04T12:15:05.5046810Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5046815Z 2025-12-04T12:15:05.5046823Z 2025-12-04T12:15:05.5047051Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5047767Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T12:15:05.5047774Z 2025-12-04T12:15:05.5048061Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5048286Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5048391Z frames [('total', 1)] 2025-12-04T12:15:05.5048559Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5049028Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.5049252Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5049364Z graph_break [] 2025-12-04T12:15:05.5049584Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5049706Z frames [('total', 1)] 2025-12-04T12:15:05.5049825Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5050048Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5050529Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.5050629Z graph_break [] 2025-12-04T12:15:05.5050779Z =================================== FAILURES =================================== 2025-12-04T12:15:05.5051184Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda _ 2025-12-04T12:15:05.5051310Z Traceback (most recent call last): 2025-12-04T12:15:05.5051749Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5051981Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5052502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5052764Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5053279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5053475Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5053998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5054148Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5054692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.5055011Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5055529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5055694Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5056175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.5056408Z return self._compile_to_module() 2025-12-04T12:15:05.5056939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5057107Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5057644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5057777Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5058275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5058527Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.5059148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.5059293Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.5059791Z File "/tmp/tmpgno1frxj/54/c545r7kzyhgpvzb52bfaz3yevfp3bk4rzevqy3tgwq7t43llz2n3.py", line 74, in 2025-12-04T12:15:05.5060256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.5060417Z kernel.precompile( 2025-12-04T12:15:05.5060975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.5061106Z self._precompile_worker() 2025-12-04T12:15:05.5061699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.5061881Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.5062486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5062685Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5063137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5063395Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.5063844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5064192Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5064419Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5065068Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5065202Z ^ 2025-12-04T12:15:05.5065659Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5065665Z 2025-12-04T12:15:05.5066392Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5066398Z 2025-12-04T12:15:05.5066405Z 2025-12-04T12:15:05.5066622Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5067325Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T12:15:05.5067331Z 2025-12-04T12:15:05.5067601Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5067825Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5067943Z frames [('total', 1)] 2025-12-04T12:15:05.5068065Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5068533Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.5068799Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5068902Z graph_break [] 
2025-12-04T12:15:05.5069131Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5069238Z frames [('total', 1)] 2025-12-04T12:15:05.5069356Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5069585Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5070046Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.5070149Z graph_break [] 2025-12-04T12:15:05.5070381Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5070516Z frames [('total', 1)] 2025-12-04T12:15:05.5070632Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5070868Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5071528Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.5071646Z graph_break [] 2025-12-04T12:15:05.5072399Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2fae1650dec37ec0.xml - 2025-12-04T12:15:05.5072574Z =========================== short test summary info ============================ 2025-12-04T12:15:05.5073421Z FAILED [0.4047s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5074072Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5074176Z ^ 2025-12-04T12:15:05.5074637Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5074644Z 2025-12-04T12:15:05.5075352Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5075372Z 2025-12-04T12:15:05.5075377Z 2025-12-04T12:15:05.5075593Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5076282Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T12:15:05.5076336Z 2025-12-04T12:15:05.5076615Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5076801Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.5077020Z ================== 1 failed, 187 deselected, 2 rerun in 4.25s ================== 2025-12-04T12:15:05.5077122Z Got exit code 1 2025-12-04T12:15:05.5077736Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda 2025-12-04T12:15:05.5078160Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.5078630Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0893388d06071d35.xml 2025-12-04T12:15:05.5078795Z ============================= test session starts ============================== 2025-12-04T12:15:05.5079162Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.5079275Z cachedir: .pytest_cache 2025-12-04T12:15:05.5079814Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.5079942Z rootdir: /var/lib/jenkins/workspace 
2025-12-04T12:15:05.5080055Z configfile: pytest.ini 2025-12-04T12:15:05.5080710Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.5080941Z collecting ... collected 188 items / 28 deselected / 160 selected 2025-12-04T12:15:05.5081088Z stepcurrent: skipping 28 already run items. 2025-12-04T12:15:05.5081223Z Running 160 items in this shard 2025-12-04T12:15:05.5081228Z 2025-12-04T12:15:05.5082666Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.5083885Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5084320Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.5084815Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.5085337Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.5085797Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.5086351Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.5086891Z E1204 11:57:00.639000 117345 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.5092296Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.5092941Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.5093506Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.5093965Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.5094564Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5095052Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5095517Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5095964Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.5096573Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T12:15:05.5097225Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.5097934Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5098623Z E1204 11:57:00.639000 117345 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5099209Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.5099766Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.5100278Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.5100771Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.5101210Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.5101741Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.5102187Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.5102651Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.5103219Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.5103688Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.5104202Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.5104795Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 
2025-12-04T12:15:05.5105375Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.5106033Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.5106518Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.5106975Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.5107546Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.5108022Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.5108611Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.5109148Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.5109673Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.5110382Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.5111105Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.5111477Z E1204 11:57:00.639000 117345 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5113886Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5114445Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5115524Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5116170Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5117097Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5117791Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5118682Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5119465Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5120069Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5121231Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5121595Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5122525Z E1204 11:57:00.639000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5122671Z ('RERUN', {'yellow': True}) [3.6018s] [ 0%] 2025-12-04T12:15:05.5124101Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.5125259Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5125692Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.5126147Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.5126699Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.5127162Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.5127709Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.5128252Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.5128846Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.5129477Z E1204 11:57:01.283000 117345 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.5130034Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.5130485Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.5131035Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5131514Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5131971Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5132421Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.5132914Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T12:15:05.5133559Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.5134257Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5134944Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5135516Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.5136069Z E1204 
11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.5136650Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.5137133Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.5137571Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.5138061Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.5138497Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.5138963Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.5139490Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.5140001Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.5140521Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.5141116Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5141691Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.5142382Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, 
None].to(tl.float32) 2025-12-04T12:15:05.5142869Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.5143323Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.5143891Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.5144359Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.5144943Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.5145475Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.5146005Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.5146708Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.5147414Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.5147791Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5150164Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 
'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5150751Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5151791Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5152436Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5153360Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5154054Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5154935Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5155714Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5156357Z E1204 11:57:01.283000 117345 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5157509Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5157918Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5158810Z E1204 11:57:01.283000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5158956Z ('RERUN', {'yellow': True}) [0.6032s] [ 0%] 2025-12-04T12:15:05.5160396Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.5161554Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5161986Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.5162428Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.5162957Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 
2025-12-04T12:15:05.5163454Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.5164000Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.5164539Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.5165137Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.5165719Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.5166275Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.5166732Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.5167245Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5167761Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5168221Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5168664Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.5169162Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T12:15:05.5169808Z E1204 11:57:01.890000 117345 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.5170546Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5171455Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5172071Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.5172635Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.5173138Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.5173634Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.5174063Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.5174541Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.5174992Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.5175455Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.5175990Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.5176578Z E1204 11:57:01.890000 117345 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.5177106Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.5177693Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5178270Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.5178921Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.5179404Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.5179862Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.5180438Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.5180954Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.5181543Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.5182079Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.5182606Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.5183307Z E1204 11:57:01.890000 117345 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.5184064Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.5184446Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5186821Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5187405Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5188455Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5189095Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5189990Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5190711Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5191599Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5192387Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5192999Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5194146Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5194530Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5195449Z E1204 11:57:01.890000 117345 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5195572Z FAILED [0.6044s] [ 0%] 2025-12-04T12:15:05.5195582Z 2025-12-04T12:15:05.5195728Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.5196119Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _ 2025-12-04T12:15:05.5196259Z Traceback (most recent call last): 2025-12-04T12:15:05.5196682Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5196931Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5197455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5197707Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5198235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5198428Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5198984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5199131Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5199669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.5200005Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5200526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5200686Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5201166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.5201289Z return self._compile_to_module() 2025-12-04T12:15:05.5201785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5201950Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5202466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5202609Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5203137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5203387Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.5203979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.5204107Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.5204631Z File "/tmp/tmp1bn6eiho/ok/coklw6jhzyywldqoqlnuc7qwmoiowja3jt4to6qo2t4hp7gnezio.py", line 137, in 2025-12-04T12:15:05.5205096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.5205212Z kernel.precompile( 2025-12-04T12:15:05.5205781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.5205904Z self._precompile_worker() 2025-12-04T12:15:05.5206516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.5206701Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.5207329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5207544Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5207997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5208260Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5208704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5209040Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5209285Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5210026Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5210132Z ^ 2025-12-04T12:15:05.5210596Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5210602Z 2025-12-04T12:15:05.5211313Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5211350Z 2025-12-04T12:15:05.5211356Z 2025-12-04T12:15:05.5211593Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5212300Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T12:15:05.5212308Z 2025-12-04T12:15:05.5212595Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5212824Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5212932Z frames [('total', 1)] 2025-12-04T12:15:05.5213069Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5213538Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5213787Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5213892Z graph_break [] 2025-12-04T12:15:05.5214283Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _ 2025-12-04T12:15:05.5214422Z Traceback (most recent call last): 2025-12-04T12:15:05.5214844Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5215113Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5215621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5215871Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5216466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5216665Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5217177Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5217335Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5217870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.5218206Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5218729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5218876Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5219411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.5219532Z return self._compile_to_module() 2025-12-04T12:15:05.5220014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5220194Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5220709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5220851Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5221351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5221612Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.5222205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.5222333Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.5222846Z File "/tmp/tmp840scpu5/zs/czsvjiyapphd5wxcf5yncv246rn65nyv2s2d6t3fzqsnes5hem2f.py", line 137, in 2025-12-04T12:15:05.5223342Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.5223455Z kernel.precompile( 2025-12-04T12:15:05.5224020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.5224138Z self._precompile_worker() 2025-12-04T12:15:05.5224738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.5224932Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.5225531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5225742Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5226194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5226439Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5226898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5227231Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5227518Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5228229Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5228321Z ^ 2025-12-04T12:15:05.5228800Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5228806Z 2025-12-04T12:15:05.5229515Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5229524Z 2025-12-04T12:15:05.5229529Z 2025-12-04T12:15:05.5229756Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5230458Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T12:15:05.5230466Z 2025-12-04T12:15:05.5230732Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5230970Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5231075Z frames [('total', 1)] 2025-12-04T12:15:05.5231206Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5231704Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5231931Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5232046Z graph_break [] 2025-12-04T12:15:05.5232268Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5232372Z frames [('total', 1)] 2025-12-04T12:15:05.5232501Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5232717Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5233190Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5233321Z graph_break [] 2025-12-04T12:15:05.5233470Z =================================== FAILURES =================================== 2025-12-04T12:15:05.5233880Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _ 2025-12-04T12:15:05.5234005Z Traceback (most recent call last): 2025-12-04T12:15:05.5234428Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5234704Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5235195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5235456Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5235973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5236169Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5236689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5236841Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5237386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.5237709Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5238227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5238388Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5238867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.5239023Z return self._compile_to_module() 2025-12-04T12:15:05.5239522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5239689Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5240216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5240345Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5240843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5241087Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.5241674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.5241815Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.5242321Z File "/tmp/tmpscng8l2m/7e/c7eckh5miwnuwcfxqtk2rg3a7uh4hwkjx2zmjj4tq7p24eezw6nj.py", line 137, in 2025-12-04T12:15:05.5242781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.5242906Z kernel.precompile( 2025-12-04T12:15:05.5243495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.5243615Z self._precompile_worker() 2025-12-04T12:15:05.5244225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.5244406Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.5245009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5245210Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5245686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5245945Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.5246391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5246737Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5247081Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5247790Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5247895Z ^ 2025-12-04T12:15:05.5248352Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5248360Z 2025-12-04T12:15:05.5249088Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5249093Z 2025-12-04T12:15:05.5249097Z 2025-12-04T12:15:05.5249319Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5250014Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T12:15:05.5250037Z 2025-12-04T12:15:05.5250307Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5250530Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5250653Z frames [('total', 1)] 2025-12-04T12:15:05.5250768Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5251266Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5251501Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5251601Z graph_break 
[] 2025-12-04T12:15:05.5251830Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5251936Z frames [('total', 1)] 2025-12-04T12:15:05.5252054Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5252286Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5252750Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5252851Z graph_break [] 2025-12-04T12:15:05.5253079Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5253183Z frames [('total', 1)] 2025-12-04T12:15:05.5253299Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5253530Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5253987Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5254101Z graph_break [] 2025-12-04T12:15:05.5254772Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0893388d06071d35.xml - 2025-12-04T12:15:05.5254950Z =========================== short test summary info ============================ 2025-12-04T12:15:05.5255807Z FAILED [0.6044s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5256585Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5256695Z ^ 2025-12-04T12:15:05.5257193Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5257199Z 2025-12-04T12:15:05.5257912Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5257932Z 2025-12-04T12:15:05.5257937Z 2025-12-04T12:15:05.5258153Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5258881Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T12:15:05.5258887Z 2025-12-04T12:15:05.5259169Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5259350Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.5259568Z ================== 1 failed, 28 deselected, 2 rerun in 4.85s =================== 2025-12-04T12:15:05.5259671Z Got exit code 1 2025-12-04T12:15:05.5259782Z Retrying single test... 
2025-12-04T12:15:05.5260266Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b62e3abe6013e6ef.xml 2025-12-04T12:15:05.5260434Z ============================= test session starts ============================== 2025-12-04T12:15:05.5260788Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.5260914Z cachedir: .pytest_cache 2025-12-04T12:15:05.5261434Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.5261572Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.5261681Z configfile: pytest.ini 2025-12-04T12:15:05.5262271Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.5262542Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.5263328Z stepcurrent: skipping 28 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T12:15:05.5263446Z Running 1 items in this shard 2025-12-04T12:15:05.5263452Z 2025-12-04T12:15:05.5264887Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.5266043Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5266506Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.5266985Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.5267523Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.5267988Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.5268522Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.5269073Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.5269693Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 
2025-12-04T12:15:05.5270296Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.5270857Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.5271583Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.5272102Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5272577Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5273054Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5273507Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.5274008Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T12:15:05.5274663Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.5275358Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5276058Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5276663Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = 
tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.5277226Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.5277734Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.5278207Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.5278653Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.5279135Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.5279588Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.5280057Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.5280624Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.5281108Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.5281614Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.5282216Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5282795Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.5283479Z E1204 11:57:20.400000 117585 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.5283985Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.5284433Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.5285066Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.5285510Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.5286099Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.5286639Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.5287154Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.5287881Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.5288591Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.5288971Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5291351Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 
'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5291936Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5292979Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5293626Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5294543Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5295224Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5296119Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5296958Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5297622Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5298780Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5299198Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5300089Z E1204 11:57:20.400000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5300226Z ('RERUN', {'yellow': True}) [3.6016s] [100%] 2025-12-04T12:15:05.5301667Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.5302818Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5303261Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.5303704Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.5304264Z E1204 11:57:21.047000 
117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.5304730Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.5305265Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.5305819Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.5306406Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.5306997Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.5307556Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.5307998Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.5308562Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5309035Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5309513Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5309956Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.5310435Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 
2025-12-04T12:15:05.5311124Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.5311818Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5312515Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5313073Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.5313629Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.5314138Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.5314608Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.5315055Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.5315528Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.5315980Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.5316444Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.5316962Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 
2025-12-04T12:15:05.5317483Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.5317988Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.5318591Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5319169Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.5319805Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.5320302Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.5320748Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.5321338Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.5321812Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.5322381Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.5322929Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.5323445Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.5324229Z E1204 11:57:21.047000 
117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.5324937Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.5325317Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5327703Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5328290Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5329332Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5329972Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5330865Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5331581Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5332478Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5333253Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5333877Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5335034Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5335418Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5336410Z E1204 11:57:21.047000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5336553Z ('RERUN', {'yellow': True}) [0.6071s] [100%] 2025-12-04T12:15:05.5337990Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.5339167Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5339618Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.5340067Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.5340636Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.5341096Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.5341629Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.5342188Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.5342772Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.5343365Z E1204 11:57:21.658000 117585 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.5343921Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.5344364Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.5344892Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5345397Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5345867Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5346312Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.5346793Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T12:15:05.5347452Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.5348138Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5348838Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5349392Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.5349952Z E1204 
11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.5350464Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.5350934Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.5351376Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.5351896Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.5352345Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.5352813Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.5353330Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.5353846Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.5354351Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.5354947Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5355531Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.5356171Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, 
None].to(tl.float32) 2025-12-04T12:15:05.5356666Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.5357111Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.5357699Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.5358140Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.5358747Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.5359293Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.5359806Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.5360521Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.5361225Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.5361653Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5364065Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 
'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5364617Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5365697Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5366346Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5367242Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5367953Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5368847Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5369620Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5370243Z E1204 11:57:21.658000 117585 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5371572Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5371958Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5372921Z E1204 11:57:21.658000 117585 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5373027Z FAILED [0.6075s] [100%] 2025-12-04T12:15:05.5373033Z 2025-12-04T12:15:05.5373196Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.5373591Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _ 2025-12-04T12:15:05.5373731Z Traceback (most recent call last): 2025-12-04T12:15:05.5374158Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5374392Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5374894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5375145Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5375659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5375868Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5376506Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5376674Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5377208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.5377533Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5378067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5378220Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5378721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.5378889Z return self._compile_to_module() 2025-12-04T12:15:05.5379378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5379559Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5380078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5380257Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5380770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5381003Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.5381603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.5381735Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.5382220Z File "/tmp/tmp3ow_8pip/nu/cnuadydymofs7cfakozh54wylilyzojo24q3sbdpr4da73bj2m6f.py", line 137, in 2025-12-04T12:15:05.5382701Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.5382815Z kernel.precompile( 2025-12-04T12:15:05.5383385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.5383512Z self._precompile_worker() 2025-12-04T12:15:05.5384107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.5384305Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.5384899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5385134Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5385598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5385848Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5386311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5386651Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5386879Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5387601Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5387694Z ^ 2025-12-04T12:15:05.5388169Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5388178Z 2025-12-04T12:15:05.5388891Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5388927Z 2025-12-04T12:15:05.5388933Z 2025-12-04T12:15:05.5389167Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5389867Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T12:15:05.5389876Z 2025-12-04T12:15:05.5390148Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5390394Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5390504Z frames [('total', 1)] 2025-12-04T12:15:05.5390623Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5391134Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5391361Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5391478Z graph_break [] 2025-12-04T12:15:05.5391873Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _ 2025-12-04T12:15:05.5392001Z Traceback (most recent call last): 2025-12-04T12:15:05.5392470Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5392701Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5393191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5393458Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5393972Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5394178Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5394689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5394836Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5395383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.5395704Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5396233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5396379Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5396891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.5397027Z return self._compile_to_module() 2025-12-04T12:15:05.5397511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5397688Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5398203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5398334Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5398847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5399079Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.5399661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 
2025-12-04T12:15:05.5399802Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.5400287Z File "/tmp/tmp81s24w_8/j7/cj7zein4hfa3cnabppz3mxtx4kpx22bzur7ibnf2tol37wabiyvh.py", line 137, in 2025-12-04T12:15:05.5400791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.5400906Z kernel.precompile( 2025-12-04T12:15:05.5401459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.5401593Z self._precompile_worker() 2025-12-04T12:15:05.5402189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.5402385Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.5402977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5403184Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5403677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5403927Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5404366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5404744Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5404971Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5405691Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5405783Z ^ 2025-12-04T12:15:05.5406239Z ValueError("type fp8e4nv not 
supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5406245Z 2025-12-04T12:15:05.5406970Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5406978Z 2025-12-04T12:15:05.5406982Z 2025-12-04T12:15:05.5407199Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5407908Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T12:15:05.5407916Z 2025-12-04T12:15:05.5408185Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5408421Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5408526Z frames [('total', 1)] 2025-12-04T12:15:05.5408679Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5409158Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5409380Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5409480Z graph_break [] 2025-12-04T12:15:05.5409714Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5409819Z frames [('total', 1)] 2025-12-04T12:15:05.5409935Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5410171Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5410631Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5410745Z graph_break [] 2025-12-04T12:15:05.5410893Z =================================== FAILURES =================================== 2025-12-04T12:15:05.5411286Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _ 2025-12-04T12:15:05.5411425Z Traceback (most recent call last): 2025-12-04T12:15:05.5411851Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5412100Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5412620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5412870Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5413394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5413587Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5414094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5414258Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5414820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.5415155Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5415680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5415828Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5416431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.5416557Z return self._compile_to_module() 2025-12-04T12:15:05.5417056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5417224Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5417743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5417899Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5418395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5418628Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.5419236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.5419366Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.5419882Z File "/tmp/tmp0ive0a57/am/cam4c5zjqj46cwwdvx4zna4nvoqzxflr655jvqum53fitaq3jjdf.py", line 137, in 2025-12-04T12:15:05.5420347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.5420514Z kernel.precompile( 2025-12-04T12:15:05.5421082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.5421201Z self._precompile_worker() 2025-12-04T12:15:05.5421807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.5421986Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.5422578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5422793Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5423243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5423489Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.5423945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5424285Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5424525Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5425265Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5425359Z ^ 2025-12-04T12:15:05.5425830Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5425836Z 2025-12-04T12:15:05.5426542Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5426551Z 2025-12-04T12:15:05.5426555Z 2025-12-04T12:15:05.5426785Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5427511Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T12:15:05.5427518Z 2025-12-04T12:15:05.5427804Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5428026Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5428163Z frames [('total', 1)] 2025-12-04T12:15:05.5428294Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5428754Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5428976Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5429089Z graph_break 
[] 2025-12-04T12:15:05.5429308Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5429431Z frames [('total', 1)] 2025-12-04T12:15:05.5429546Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5429766Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5430239Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5430340Z graph_break [] 2025-12-04T12:15:05.5430559Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5430678Z frames [('total', 1)] 2025-12-04T12:15:05.5430793Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5431010Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5431477Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5431575Z graph_break [] 2025-12-04T12:15:05.5432279Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b62e3abe6013e6ef.xml - 2025-12-04T12:15:05.5432459Z =========================== short test summary info ============================ 2025-12-04T12:15:05.5433298Z FAILED [0.6075s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5434017Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5434110Z ^ 2025-12-04T12:15:05.5434582Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5434587Z 2025-12-04T12:15:05.5435291Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5435300Z 2025-12-04T12:15:05.5435304Z 2025-12-04T12:15:05.5435539Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5436270Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T12:15:05.5436276Z 2025-12-04T12:15:05.5436543Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5436740Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.5436945Z ================== 1 failed, 187 deselected, 2 rerun in 4.86s ================== 2025-12-04T12:15:05.5437056Z Got exit code 1 2025-12-04T12:15:05.5437164Z Retrying single test... 
2025-12-04T12:15:05.5437637Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0fe50dbde6f69754.xml 2025-12-04T12:15:05.5437817Z ============================= test session starts ============================== 2025-12-04T12:15:05.5438196Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.5438310Z cachedir: .pytest_cache 2025-12-04T12:15:05.5438843Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.5438971Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.5439092Z configfile: pytest.ini 2025-12-04T12:15:05.5439716Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.5439956Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.5440750Z stepcurrent: skipping 28 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T12:15:05.5440870Z Running 1 items in this shard 2025-12-04T12:15:05.5440874Z 2025-12-04T12:15:05.5442968Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.5444125Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5444575Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.5445024Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.5445601Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.5446082Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.5446617Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.5447172Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.5447758Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 
2025-12-04T12:15:05.5448349Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.5448926Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.5449373Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.5449941Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5450417Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5450876Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5451337Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.5451824Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T12:15:05.5452597Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.5453294Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5453985Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5454657Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = 
tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.5455244Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.5455773Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.5456246Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.5456790Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.5457275Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.5457712Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.5458191Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.5458709Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.5459243Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.5459750Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.5460340Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5460933Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.5461574Z E1204 11:57:40.163000 117825 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.5462076Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.5462531Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.5463139Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.5463591Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.5464166Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.5464711Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.5465224Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.5465957Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.5466678Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.5467040Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5469481Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 
'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5470020Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5471364Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5472502Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5474310Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5475454Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5476358Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5477131Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5477738Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5478899Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5479319Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5480223Z E1204 11:57:40.163000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5480360Z ('RERUN', {'yellow': True}) [3.5989s] [100%] 2025-12-04T12:15:05.5481796Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.5482993Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5483426Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.5483885Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.5484450Z E1204 11:57:40.812000 
117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.5484923Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.5485457Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.5486013Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.5486600Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.5487183Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.5487758Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.5488199Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.5488761Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5489238Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5489702Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5490160Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.5490639Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 
2025-12-04T12:15:05.5491293Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.5491984Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5492671Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5493249Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.5493803Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.5494322Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.5494789Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.5495224Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.5495751Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.5496193Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.5496749Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.5497311Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 
2025-12-04T12:15:05.5497785Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.5498307Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.5498898Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5499498Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.5500137Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.5500636Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.5501419Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.5501995Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.5502500Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.5503082Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.5503632Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.5504148Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.5504857Z E1204 11:57:40.812000 
117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.5505580Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.5505955Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5508378Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5508918Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5510012Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5510645Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5511555Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5512276Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5513170Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5513948Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5514553Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5515717Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5516082Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5517026Z E1204 11:57:40.812000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5517162Z ('RERUN', {'yellow': True}) [0.6077s] [100%] 2025-12-04T12:15:05.5518611Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.5519755Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5520188Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.5520651Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.5521209Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.5521687Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.5522221Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.5522760Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.5523348Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.5523992Z E1204 11:57:41.421000 117825 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.5524567Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.5525010Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.5525583Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5526056Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5526513Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5526975Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.5527460Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T12:15:05.5528121Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.5528810Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5529497Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.5530032Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.5530625Z E1204 
11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.5531145Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.5531611Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.5532042Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.5532532Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.5532968Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.5533447Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.5533967Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.5534474Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.5534995Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.5535585Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5536171Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.5536883Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, 
None].to(tl.float32) 2025-12-04T12:15:05.5537422Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.5537873Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.5538450Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.5538941Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.5539516Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.5540065Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.5540586Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.5541292Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.5542006Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.5542372Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5544772Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 
'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5545351Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5546410Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5547036Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5547952Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5548665Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5549557Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5550337Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5550982Z E1204 11:57:41.421000 117825 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5552150Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5552519Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5553459Z E1204 11:57:41.421000 117825 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5553565Z FAILED [0.6071s] [100%] 2025-12-04T12:15:05.5553572Z 2025-12-04T12:15:05.5553719Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.5554130Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _ 2025-12-04T12:15:05.5554261Z Traceback (most recent call last): 2025-12-04T12:15:05.5554702Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5554944Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5555435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5555702Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5556215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5556424Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5556937Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5557126Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5557678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.5558001Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5558523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5558691Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5559174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.5559312Z return self._compile_to_module() 2025-12-04T12:15:05.5559797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5559965Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5560500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5560634Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5561184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5561419Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.5562007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.5562153Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.5562667Z File "/tmp/tmpe4lf01oz/ep/cepmldjvpqzthlxmqq3znbbpq4dlfr2pccils4j5vyxsk46xyz34.py", line 137, in <module> 2025-12-04T12:15:05.5563131Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.5563265Z kernel.precompile( 2025-12-04T12:15:05.5563856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.5563993Z self._precompile_worker() 2025-12-04T12:15:05.5564591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.5564773Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.5565417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5565617Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5566081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5566331Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5566775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5567123Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5567353Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5568077Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5568171Z ^ 2025-12-04T12:15:05.5568629Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5568635Z 2025-12-04T12:15:05.5569359Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5569410Z 2025-12-04T12:15:05.5569415Z 2025-12-04T12:15:05.5569637Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5570348Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T12:15:05.5570356Z 2025-12-04T12:15:05.5570628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5570855Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5571165Z frames [('total', 1)] 2025-12-04T12:15:05.5571288Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5571770Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5571994Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5572098Z graph_break [] 2025-12-04T12:15:05.5572508Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _ 2025-12-04T12:15:05.5572635Z Traceback (most recent call last): 2025-12-04T12:15:05.5573058Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5573401Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5573891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5574157Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5574667Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5574860Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5575390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5575540Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5576122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.5576533Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5577055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5577276Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5577756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.5577881Z return self._compile_to_module() 2025-12-04T12:15:05.5578379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5578546Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5579076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5579209Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5579709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5579956Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.5580546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 
2025-12-04T12:15:05.5580690Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.5581197Z File "/tmp/tmp6vhdluxb/af/cafmv2us7q56iwaosl5v5h7evchdxekfd3w3uckrh4vrz5p6mlss.py", line 137, in 2025-12-04T12:15:05.5581661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.5581836Z kernel.precompile( 2025-12-04T12:15:05.5582398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.5582517Z self._precompile_worker() 2025-12-04T12:15:05.5583129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.5583309Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.5583913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5584110Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5584561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5584823Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5585268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5585614Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5585877Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5586589Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5586696Z ^ 2025-12-04T12:15:05.5587154Z ValueError("type fp8e4nv not 
supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5587160Z 2025-12-04T12:15:05.5587881Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5587889Z 2025-12-04T12:15:05.5587895Z 2025-12-04T12:15:05.5588148Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5588850Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T12:15:05.5588858Z 2025-12-04T12:15:05.5589141Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5589364Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5589518Z frames [('total', 1)] 2025-12-04T12:15:05.5589637Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5590103Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5590340Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5590448Z graph_break [] 2025-12-04T12:15:05.5590667Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5590787Z frames [('total', 1)] 2025-12-04T12:15:05.5590906Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5591140Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5591605Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5591708Z graph_break [] 2025-12-04T12:15:05.5591868Z =================================== FAILURES =================================== 2025-12-04T12:15:05.5592267Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda _ 2025-12-04T12:15:05.5592394Z Traceback (most recent call last): 2025-12-04T12:15:05.5592832Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5593068Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5593622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5593869Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5594811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5595025Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5595533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5595699Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5596234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.5596557Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5597095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5597245Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5597779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.5597915Z return self._compile_to_module() 2025-12-04T12:15:05.5598398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5598576Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5599092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5599221Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5599729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5599965Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.5600595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.5600726Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.5601231Z File "/tmp/tmp74h9pahr/3r/c3rhqx7dwe25fhfgmap3vvnziwycxz4cgpkgvh7ubpuf7rc225wi.py", line 137, in 2025-12-04T12:15:05.5601708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.5601857Z kernel.precompile( 2025-12-04T12:15:05.5602413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.5602545Z self._precompile_worker() 2025-12-04T12:15:05.5603142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.5603338Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.5603930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5604130Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5604591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5605012Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.5605470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5605805Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5606031Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5606814Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5606908Z ^ 2025-12-04T12:15:05.5607366Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5607387Z 2025-12-04T12:15:05.5608099Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5608107Z 2025-12-04T12:15:05.5608112Z 2025-12-04T12:15:05.5608329Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5609038Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T12:15:05.5609048Z 2025-12-04T12:15:05.5609316Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5609551Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5609658Z frames [('total', 1)] 2025-12-04T12:15:05.5609776Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5610291Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5610514Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5610618Z graph_break 
[] 2025-12-04T12:15:05.5610855Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5610962Z frames [('total', 1)] 2025-12-04T12:15:05.5611093Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5611311Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5611770Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5611888Z graph_break [] 2025-12-04T12:15:05.5612141Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5612246Z frames [('total', 1)] 2025-12-04T12:15:05.5612382Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5612601Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5613069Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.5613203Z graph_break [] 2025-12-04T12:15:05.5613856Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0fe50dbde6f69754.xml - 2025-12-04T12:15:05.5614043Z =========================== short test summary info ============================ 2025-12-04T12:15:05.5614873Z FAILED [0.6071s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5615600Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.5615691Z ^ 2025-12-04T12:15:05.5616150Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5616156Z 2025-12-04T12:15:05.5616959Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5616966Z 2025-12-04T12:15:05.5616971Z 2025-12-04T12:15:05.5617190Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5617897Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T12:15:05.5617969Z 2025-12-04T12:15:05.5618240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5618423Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.5618644Z ================== 1 failed, 187 deselected, 2 rerun in 4.86s ================== 2025-12-04T12:15:05.5618745Z Got exit code 1 2025-12-04T12:15:05.5619381Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda 2025-12-04T12:15:05.5619798Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.5620268Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1381d94cd6abec18.xml 2025-12-04T12:15:05.5620465Z ============================= test session starts ============================== 2025-12-04T12:15:05.5620823Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.5620953Z cachedir: .pytest_cache 2025-12-04T12:15:05.5621474Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.5621638Z rootdir: /var/lib/jenkins/workspace 
2025-12-04T12:15:05.5621764Z configfile: pytest.ini 2025-12-04T12:15:05.5622357Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.5622589Z collecting ... collected 188 items / 29 deselected / 159 selected 2025-12-04T12:15:05.5622749Z stepcurrent: skipping 29 already run items. 2025-12-04T12:15:05.5622867Z Running 159 items in this shard 2025-12-04T12:15:05.5622872Z 2025-12-04T12:15:05.5624264Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.5625363Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5625870Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.5626329Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.5626790Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.5627348Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.5627891Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.5628496Z E1204 11:58:00.018000 118065 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.5628989Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.5629545Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.5630012Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.5630445Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.5631101Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5631690Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5632299Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5632895Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.5633429Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.5633972Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5634463Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5634992Z E1204 11:58:00.018000 118065 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5635459Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.5636270Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.5636817Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.5637406Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5638171Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.5638776Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.5639208Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.5639872Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.5640489Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.5641182Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.5641884Z E1204 11:58:00.018000 118065 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.5642385Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.5642863Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.5643335Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.5643978Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.5644537Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.5645103Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.5645680Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.5646212Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.5646754Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5647247Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5647745Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5648242Z E1204 11:58:00.018000 118065 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.5649061Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.5649604Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.5650100Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.5650579Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.5651109Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.5651592Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.5652096Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.5652668Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.5653178Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.5653696Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.5654312Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5654896Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 
= triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.5655483Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.5655992Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.5656536Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.5657129Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.5657627Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.5658201Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.5658762Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.5659394Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.5659984Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.5660529Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.5660893Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5663265Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 
'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5663811Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5664884Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5665514Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5666418Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5667144Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5668037Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5668804Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5669425Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5670515Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5670881Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5672032Z E1204 11:58:00.018000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5672168Z ('RERUN', {'yellow': True}) [3.4490s] [ 0%] 2025-12-04T12:15:05.5673517Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.5674608Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5675067Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.5675525Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.5675989Z E1204 11:58:00.495000 118065 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.5676625Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.5677171Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.5677772Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.5678264Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.5678820Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.5679330Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.5679766Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.5680380Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5681009Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5681633Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5682215Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.5682747Z E1204 11:58:00.495000 118065 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.5683291Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5683780Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5684273Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5684738Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.5685593Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.5686132Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.5686726Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5687461Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.5688059Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.5688472Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.5689129Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 
2025-12-04T12:15:05.5689772Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.5690456Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.5691158Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.5691650Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.5692128Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.5692634Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.5693283Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.5693804Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.5694393Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.5694975Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.5695505Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.5696043Z E1204 11:58:00.495000 118065 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5696593Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5697085Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5697552Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.5698383Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.5698948Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.5699443Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.5699923Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.5700427Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.5700900Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.5701394Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.5701932Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.5702443Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 
2025-12-04T12:15:05.5702993Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.5703599Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5704183Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.5704769Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.5705277Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.5705777Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.5706367Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.5706819Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.5707422Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.5707973Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.5708598Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.5709194Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 
2025-12-04T12:15:05.5709746Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.5710118Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5712367Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5712951Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5713999Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5714629Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5715532Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5716215Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5717140Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5717913Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5718533Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5719677Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5720058Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5720950Z E1204 11:58:00.495000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5721117Z ('RERUN', {'yellow': True}) [0.4378s] [ 0%] 2025-12-04T12:15:05.5722468Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.5723554Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5724001Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.5724451Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.5724911Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.5725458Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.5725999Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.5726632Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.5727122Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.5727687Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = 
tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.5728136Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.5728564Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.5729179Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5729773Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5730396Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5731010Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.5731543Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.5732089Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5732580Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5733106Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5733576Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.5734386Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, 
eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.5734956Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.5735545Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5736333Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.5736950Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.5737367Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.5738016Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.5738640Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.5739330Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.5740089Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.5740587Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.5741063Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 
2025-12-04T12:15:05.5741540Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.5742188Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.5742714Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.5743282Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.5743861Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.5744440Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.5744972Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5745461Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5745956Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5746426Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.5747291Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.5747826Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = 
tmp9.to(tl.float32) 2025-12-04T12:15:05.5748353Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.5748827Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.5749326Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.5749802Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.5750300Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.5750844Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.5751354Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.5751882Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.5752485Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5753093Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.5753684Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.5754198Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.5754662Z E1204 11:58:00.929000 118065 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.5755258Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.5755713Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.5756302Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.5756844Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.5757493Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.5758083Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.5758628Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.5759000Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5761282Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 
'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5761859Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5762897Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5763544Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5764441Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5765122Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5766022Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5766788Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5767447Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5768542Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5768922Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5769813Z E1204 11:58:00.929000 118065 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5769921Z FAILED [0.4316s] [ 0%] 2025-12-04T12:15:05.5769927Z 2025-12-04T12:15:05.5770087Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.5770493Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.5770631Z Traceback (most recent call last): 2025-12-04T12:15:05.5771309Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5771549Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5772057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5772307Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5772821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5773033Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5773595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5773756Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5774296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in 
fx_codegen_and_compile 2025-12-04T12:15:05.5774616Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5775199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5775350Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5775846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.5775970Z return self._compile_to_module() 2025-12-04T12:15:05.5776515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5776697Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5777212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5777346Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5777857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5778094Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.5778697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.5778829Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.5779329Z File "/tmp/tmp52lu5q0o/hm/chmvvir2m2bfeysrqounc4phn2vjd2go73w5dudbgovks33bt2rm.py", line 65, in <module> 2025-12-04T12:15:05.5779862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.5779976Z kernel.precompile( 2025-12-04T12:15:05.5780544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 
2025-12-04T12:15:05.5780663Z self._precompile_worker() 2025-12-04T12:15:05.5781260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.5781459Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.5782056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5782269Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5782723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5782971Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5783427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5783796Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5784023Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5784691Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5784782Z ^ 2025-12-04T12:15:05.5785247Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5785253Z 2025-12-04T12:15:05.5785964Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5786004Z 2025-12-04T12:15:05.5786009Z 2025-12-04T12:15:05.5786238Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5786948Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T12:15:05.5786953Z 2025-12-04T12:15:05.5787221Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5787490Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5787597Z frames [('total', 1)] 2025-12-04T12:15:05.5787716Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5788196Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.5788421Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5788536Z graph_break [] 2025-12-04T12:15:05.5788937Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.5789062Z Traceback (most recent call last): 2025-12-04T12:15:05.5789506Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5789740Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5790242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5790490Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5790999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5791204Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5791745Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5791891Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5792437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.5792757Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5793285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5793435Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5793913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.5794053Z return self._compile_to_module() 2025-12-04T12:15:05.5794538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5794719Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5795232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5795392Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5795899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5796132Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.5796721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.5796862Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.5797358Z File "/tmp/tmp7u5s5bo3/tx/ctxhqvm6vjk4z3jw337ib5ariseyq2rg7cjxzy3h6fn5exx6hodp.py", line 65, in 2025-12-04T12:15:05.5797836Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.5797980Z kernel.precompile( 2025-12-04T12:15:05.5798537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.5798668Z self._precompile_worker() 2025-12-04T12:15:05.5799259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.5799484Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.5800076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5800275Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5800738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5800988Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5801431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5801782Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5802008Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5802676Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5802774Z ^ 2025-12-04T12:15:05.5803233Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5803239Z 2025-12-04T12:15:05.5803967Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5804009Z 2025-12-04T12:15:05.5804016Z 2025-12-04T12:15:05.5804236Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5804959Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T12:15:05.5804965Z 2025-12-04T12:15:05.5805234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5805473Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5805579Z frames [('total', 1)] 2025-12-04T12:15:05.5805698Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5806175Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.5806401Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5806502Z graph_break [] 2025-12-04T12:15:05.5806737Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5806841Z frames [('total', 1)] 2025-12-04T12:15:05.5806969Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5807219Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5807684Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.5807800Z graph_break [] 2025-12-04T12:15:05.5807948Z =================================== FAILURES =================================== 2025-12-04T12:15:05.5808350Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.5808487Z Traceback (most recent call last): 2025-12-04T12:15:05.5808912Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5809158Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5809680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5809946Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5810471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5810709Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5811223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5811387Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5811922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.5812259Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5812783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5813454Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5813955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.5814081Z return self._compile_to_module() 2025-12-04T12:15:05.5814589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5814756Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5815277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5815425Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5815985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5816217Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.5816894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.5817024Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.5817534Z File "/tmp/tmps8718jlj/6w/c6whfj672da2swy4iwrciepo4f2th75hpztdfso67ti34msth4rb.py", line 65, in 2025-12-04T12:15:05.5818005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.5818119Z kernel.precompile( 2025-12-04T12:15:05.5818693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.5818816Z self._precompile_worker() 2025-12-04T12:15:05.5819430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.5819613Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.5820243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5820455Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5820910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5821161Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.5821623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5821959Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5822204Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5822890Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5822984Z ^ 2025-12-04T12:15:05.5823463Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5823470Z 2025-12-04T12:15:05.5824182Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5824221Z 2025-12-04T12:15:05.5824226Z 2025-12-04T12:15:05.5824462Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5825170Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T12:15:05.5825178Z 2025-12-04T12:15:05.5825469Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5825708Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5825865Z frames [('total', 1)] 2025-12-04T12:15:05.5826054Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5826526Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.5826754Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5826872Z graph_break [] 
2025-12-04T12:15:05.5827090Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5827208Z frames [('total', 1)] 2025-12-04T12:15:05.5827326Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5827543Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5828063Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.5828164Z graph_break [] 2025-12-04T12:15:05.5828382Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.5828499Z frames [('total', 1)] 2025-12-04T12:15:05.5828616Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.5828834Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.5829301Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.5829402Z graph_break [] 2025-12-04T12:15:05.5830067Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1381d94cd6abec18.xml - 2025-12-04T12:15:05.5830242Z =========================== short test summary info ============================ 2025-12-04T12:15:05.5831098Z FAILED [0.4316s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.5831797Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5831888Z ^ 2025-12-04T12:15:05.5832359Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5832368Z 2025-12-04T12:15:05.5833076Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.5833082Z 2025-12-04T12:15:05.5833087Z 2025-12-04T12:15:05.5833316Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.5834022Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T12:15:05.5834063Z 2025-12-04T12:15:05.5834333Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.5834531Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.5834735Z ================== 1 failed, 29 deselected, 2 rerun in 4.36s =================== 2025-12-04T12:15:05.5834848Z Got exit code 1 2025-12-04T12:15:05.5834992Z Retrying single test... 
2025-12-04T12:15:05.5835467Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2b9aebe063e8f7ef.xml 2025-12-04T12:15:05.5835649Z ============================= test session starts ============================== 2025-12-04T12:15:05.5836000Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.5836115Z cachedir: .pytest_cache 2025-12-04T12:15:05.5836654Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.5836780Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.5836901Z configfile: pytest.ini 2025-12-04T12:15:05.5837497Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.5837720Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.5838523Z stepcurrent: skipping 29 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T12:15:05.5838640Z Running 1 items in this shard 2025-12-04T12:15:05.5838645Z 2025-12-04T12:15:05.5840002Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.5841134Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5841574Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.5842038Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.5842499Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.5843043Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.5843590Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.5844223Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.5844714Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.5845277Z E1204 
11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.5845738Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.5846166Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.5846813Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5847401Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5848009Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5848639Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.5849170Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.5849709Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5850204Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5850685Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5851166Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.5851981Z E1204 11:58:19.936000 118262 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.5852521Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.5853110Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5853893Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.5854501Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.5854900Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.5855572Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.5856191Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.5857244Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.5858018Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.5858502Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.5858997Z 
E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.5859469Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.5860116Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.5860675Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.5861243Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.5861821Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.5862400Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.5862937Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5863427Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5863925Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5864391Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.5865209Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.5865752Z E1204 
11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.5866249Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.5866723Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.5867333Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.5867794Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.5868308Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.5868843Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.5869352Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.5869872Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.5870470Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5871247Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.5871916Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.5872437Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * 
tmp23 2025-12-04T12:15:05.5872900Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.5873490Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.5873947Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.5874563Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.5875120Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.5875747Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.5876376Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.5876922Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.5877286Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5879558Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): 
[['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5880093Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5881198Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5881836Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5882736Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5883416Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5884309Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5885083Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5885738Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5886834Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5887202Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5888103Z E1204 11:58:19.936000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5888272Z ('RERUN', {'yellow': True}) [3.4101s] [100%] 2025-12-04T12:15:05.5889642Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.5890752Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5891201Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.5891655Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.5892120Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.5892671Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.5893211Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.5893808Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.5898990Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.5899704Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.5900180Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.5900619Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.5901239Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5901832Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5902445Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5903041Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.5903577Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.5904172Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5904664Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 
2025-12-04T12:15:05.5905148Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5905630Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.5906441Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.5907020Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.5907612Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5908336Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.5908989Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.5909389Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.5910060Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.5910682Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.5911370Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.5912074Z E1204 
11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.5912554Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.5913082Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.5913559Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.5914207Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.5914743Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.5915291Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.5915888Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.5916416Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.5916960Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5917479Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5917977Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5918442Z E1204 11:58:20.412000 118262 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.5919259Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.5919808Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.5920362Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.5920840Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.5921342Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.5921834Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.5922342Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.5922876Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.5923386Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.5923909Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.5924501Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5925094Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 
= triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.5925677Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.5926187Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.5926692Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.5927269Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.5927737Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.5928311Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.5928866Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.5929488Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.5930083Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.5930662Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.5931025Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5933310Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 
'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5933845Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5934908Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5935565Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5936552Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5937242Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5938135Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5938912Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5939524Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5940624Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5941032Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5941944Z E1204 11:58:20.412000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5942081Z ('RERUN', {'yellow': True}) [0.4382s] [100%] 2025-12-04T12:15:05.5943448Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.5944533Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5945012Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.5945465Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.5945929Z E1204 11:58:20.851000 118262 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.5946470Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.5947007Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.5947633Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.5948129Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.5948684Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.5949147Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.5949608Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.5950219Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5950807Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5951413Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.5952006Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.5952533Z E1204 11:58:20.851000 118262 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.5953068Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5953557Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5954067Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5954549Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.5955364Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.5955901Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.5956486Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5957218Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.5957828Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.5958257Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.5958928Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 
2025-12-04T12:15:05.5959547Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.5960229Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.5960960Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.5961439Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.5961930Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.5962403Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.5963079Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.5963601Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.5964155Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.5964737Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.5965266Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.5965801Z E1204 11:58:20.851000 118262 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.5966287Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.5966775Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.5967275Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.5968490Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.5969039Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.5969534Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.5970020Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.5970522Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.5971150Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.5971663Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.5972303Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.5972988Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 
2025-12-04T12:15:05.5973513Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.5974106Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.5974701Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.5975348Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.5975863Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.5976390Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.5977039Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.5977497Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.5978069Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.5978624Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.5979253Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.5979839Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 
2025-12-04T12:15:05.5980387Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.5980744Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.5983020Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.5983606Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.5984664Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.5985295Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.5986227Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.5986908Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T12:15:05.5987804Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.5988577Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.5989228Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.5990320Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.5990716Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.5991627Z E1204 11:58:20.851000 118262 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.5991733Z FAILED [0.4366s] [100%] 2025-12-04T12:15:05.5991742Z 2025-12-04T12:15:05.5991899Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.5992302Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.5992430Z Traceback (most recent call last): 2025-12-04T12:15:05.5992871Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.5993106Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.5993610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.5993860Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.5994374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.5994585Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.5995131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.5995283Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.5995834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.5996153Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.5996682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.5996831Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.5997312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.5997445Z return self._compile_to_module() 2025-12-04T12:15:05.5997930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.5998108Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.5998618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.5998781Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.5999291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.5999524Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.6000111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.6000250Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.6000751Z File "/tmp/tmp04b30vau/b5/cb5uctpdchzyz4l3v67hhpm7kst32xrpmiutmqx7ct3b3ewapdmd.py", line 65, in 2025-12-04T12:15:05.6001224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.6001374Z kernel.precompile( 2025-12-04T12:15:05.6001934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.6002066Z self._precompile_worker() 2025-12-04T12:15:05.6002664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.6002891Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.6003487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6003685Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6004150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6004403Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6004849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6005199Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6005429Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6006093Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6006188Z ^ 2025-12-04T12:15:05.6006648Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6006658Z 2025-12-04T12:15:05.6007381Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6007421Z 2025-12-04T12:15:05.6007426Z 2025-12-04T12:15:05.6007647Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6008374Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T12:15:05.6008379Z 2025-12-04T12:15:05.6008652Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6008899Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6009007Z frames [('total', 1)] 2025-12-04T12:15:05.6009127Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6009607Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.6009835Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6009935Z graph_break [] 2025-12-04T12:15:05.6010354Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.6010480Z Traceback (most recent call last): 2025-12-04T12:15:05.6010948Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.6011184Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.6011671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.6011936Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.6012451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.6012644Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.6013173Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.6013348Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.6013896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.6014219Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.6014739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.6014930Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.6015410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.6015547Z return self._compile_to_module() 2025-12-04T12:15:05.6016031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.6016203Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.6016807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.6016937Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.6017453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.6017683Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.6018275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.6018416Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.6018909Z File "/tmp/tmp0ofaig06/ba/cbajx6g5cj6ofn6vfid3h3la7c4yxxeb3l7voh32dnrorf2hallf.py", line 65, in 2025-12-04T12:15:05.6019421Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.6019554Z kernel.precompile( 2025-12-04T12:15:05.6020103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.6020236Z self._precompile_worker() 2025-12-04T12:15:05.6020831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.6021012Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.6021617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6021821Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6022283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6022529Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6022980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6023357Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6023586Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6024232Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6024335Z ^ 2025-12-04T12:15:05.6024797Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6024803Z 2025-12-04T12:15:05.6025523Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6025532Z 2025-12-04T12:15:05.6025537Z 2025-12-04T12:15:05.6025788Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6026521Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T12:15:05.6026527Z 2025-12-04T12:15:05.6026796Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6027052Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6027178Z frames [('total', 1)] 2025-12-04T12:15:05.6027294Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6027765Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.6027998Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6028100Z graph_break [] 2025-12-04T12:15:05.6028335Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6028442Z frames [('total', 1)] 2025-12-04T12:15:05.6028558Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6028789Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6029249Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.6029351Z graph_break [] 2025-12-04T12:15:05.6029513Z =================================== FAILURES =================================== 2025-12-04T12:15:05.6029911Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.6030049Z Traceback (most recent call last): 2025-12-04T12:15:05.6030470Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.6030735Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.6031237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.6031487Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.6032010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.6032204Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.6032715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.6032875Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.6033406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.6033729Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.6034264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.6034411Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.6034937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.6035061Z return self._compile_to_module() 2025-12-04T12:15:05.6035544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.6035726Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.6036239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.6036382Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.6036880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.6037144Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.6037741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.6037871Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.6038362Z File "/tmp/tmpbiarnudx/d6/cd6z47mesl7pokl4t3s4opldztddi63mkv53jbtnhrypqh7tb67y.py", line 65, in 2025-12-04T12:15:05.6038868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.6038979Z kernel.precompile( 2025-12-04T12:15:05.6039542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.6039658Z self._precompile_worker() 2025-12-04T12:15:05.6040259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.6040455Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.6041047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6041255Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6041699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6041944Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.6042395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6042727Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6042985Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6043650Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6043737Z ^ 2025-12-04T12:15:05.6044211Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6044217Z 2025-12-04T12:15:05.6044926Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6044934Z 2025-12-04T12:15:05.6044939Z 2025-12-04T12:15:05.6045167Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6045874Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T12:15:05.6045882Z 2025-12-04T12:15:05.6046148Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6046382Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6046485Z frames [('total', 1)] 2025-12-04T12:15:05.6046610Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6047108Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.6047329Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6047444Z graph_break [] 
2025-12-04T12:15:05.6047661Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6047762Z frames [('total', 1)] 2025-12-04T12:15:05.6047891Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6048105Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6048569Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.6048678Z graph_break [] 2025-12-04T12:15:05.6048924Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6049040Z frames [('total', 1)] 2025-12-04T12:15:05.6049155Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6049374Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6049841Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.6049970Z graph_break [] 2025-12-04T12:15:05.6050627Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2b9aebe063e8f7ef.xml - 2025-12-04T12:15:05.6050812Z =========================== short test summary info ============================ 2025-12-04T12:15:05.6051662Z FAILED [0.4366s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6052318Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6052409Z ^ 2025-12-04T12:15:05.6052867Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6052887Z 2025-12-04T12:15:05.6053594Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6053600Z 2025-12-04T12:15:05.6053604Z 2025-12-04T12:15:05.6053819Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6054536Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T12:15:05.6055078Z 2025-12-04T12:15:05.6055356Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6055552Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.6055758Z ================== 1 failed, 187 deselected, 2 rerun in 4.33s ================== 2025-12-04T12:15:05.6055860Z Got exit code 1 2025-12-04T12:15:05.6055985Z Retrying single test... 
2025-12-04T12:15:05.6056575Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-649ba93d0ac5919c.xml 2025-12-04T12:15:05.6056741Z ============================= test session starts ============================== 2025-12-04T12:15:05.6057112Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.6057222Z cachedir: .pytest_cache 2025-12-04T12:15:05.6057757Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.6057885Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.6057993Z configfile: pytest.ini 2025-12-04T12:15:05.6058646Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.6058868Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.6059676Z stepcurrent: skipping 29 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T12:15:05.6059795Z Running 1 items in this shard 2025-12-04T12:15:05.6059801Z 2025-12-04T12:15:05.6061141Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.6062285Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6062720Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.6063229Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.6063687Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.6064230Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.6064778Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.6065356Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.6065861Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.6066414Z E1204 
11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.6066876Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.6067308Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.6067941Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6068536Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6069147Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6069736Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6070282Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6070812Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6071527Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6072013Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6072565Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.6073392Z E1204 11:58:39.633000 118459 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6073935Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.6074528Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6075299Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.6075926Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.6076329Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.6077110Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.6077729Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.6078405Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.6079124Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.6079602Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.6080095Z 
E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.6080572Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.6081225Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.6081794Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.6082339Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.6082941Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6083477Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6084017Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6084510Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6084992Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6085478Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.6086331Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6086878Z E1204 
11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.6087378Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.6087842Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.6088396Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.6088854Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.6089365Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.6089903Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.6090430Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.6090967Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.6091564Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6092161Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.6092750Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.6093261Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * 
tmp23 2025-12-04T12:15:05.6093730Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.6094311Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.6094820Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.6095398Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.6095959Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.6096651Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.6097229Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.6097788Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.6098149Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.6100484Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): 
[['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.6101022Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.6102119Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6102753Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6103658Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6104365Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6105251Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6106039Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6106650Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.6107748Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6108116Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.6109022Z E1204 11:58:39.633000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6109201Z ('RERUN', {'yellow': True}) [3.4173s] [100%] 2025-12-04T12:15:05.6110550Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.6111642Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6112081Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.6112549Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.6113014Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.6113600Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.6114145Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.6114732Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.6115241Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.6115793Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.6116307Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.6116739Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.6117339Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6117976Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6118579Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6119165Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6119704Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6120244Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6120733Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 
2025-12-04T12:15:05.6121217Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6121695Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.6122507Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6123085Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.6123677Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6124397Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.6125011Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.6125413Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.6126087Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.6126707Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.6127429Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.6128134Z E1204 
11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.6128616Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.6129105Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.6129616Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.6130269Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.6130791Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.6131373Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.6131965Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6132500Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6133043Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6133530Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6134010Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6134486Z E1204 11:58:40.102000 118459 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.6135303Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6135845Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.6136436Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.6136919Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.6137420Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.6137878Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.6138389Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.6138927Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.6139442Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.6139961Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.6140599Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6141191Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 
= triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.6141781Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.6142289Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.6142794Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.6143371Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.6143840Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.6144410Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.6144995Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.6145624Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.6146204Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.6146763Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.6147126Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.6149382Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 
'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.6149962Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.6151021Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6151648Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6152552Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6153231Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6154159Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6154932Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6155548Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.6156654Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6157056Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.6157964Z E1204 11:58:40.102000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6158099Z ('RERUN', {'yellow': True}) [0.4300s] [100%] 2025-12-04T12:15:05.6159480Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.6160565Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6161001Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.6161465Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.6161923Z E1204 11:58:40.534000 118459 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.6162470Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.6163008Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.6163631Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.6164142Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.6164699Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.6165165Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.6165598Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.6166195Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6166801Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6167417Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6168060Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6168593Z E1204 11:58:40.534000 118459 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6169140Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6169629Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6170112Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6170622Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.6171625Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6172164Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.6172842Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6173563Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.6174189Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.6174591Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.6175255Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 
2025-12-04T12:15:05.6175878Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.6176637Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.6177397Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.6177881Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.6178372Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.6178844Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.6179493Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.6180018Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.6180570Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.6181170Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6181741Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6182290Z E1204 11:58:40.534000 118459 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6182780Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6183258Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6183740Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.6184612Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6185160Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.6185660Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.6186175Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.6186678Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.6187133Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.6187650Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.6188193Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.6188702Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 
2025-12-04T12:15:05.6189228Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.6189823Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6190417Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.6191045Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.6191557Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.6192022Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.6192606Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.6193081Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.6193662Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.6194219Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.6194879Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.6195469Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 
2025-12-04T12:15:05.6196020Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.6196383Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.6198699Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.6199260Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.6200318Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6200947Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6201859Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6202535Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6203430Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6204208Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6204850Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.6205961Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6206329Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.6207240Z E1204 11:58:40.534000 118459 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6207413Z FAILED [0.4309s] [100%] 2025-12-04T12:15:05.6207419Z 2025-12-04T12:15:05.6207584Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.6207991Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.6208119Z Traceback (most recent call last): 2025-12-04T12:15:05.6208589Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.6208827Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.6209316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.6209581Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.6210092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.6210297Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.6210808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.6210983Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.6211531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.6211853Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.6212385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.6212573Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.6213051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.6213187Z return self._compile_to_module() 2025-12-04T12:15:05.6213670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.6213838Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.6214369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.6214503Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.6215010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.6215242Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.6215830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.6215973Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.6216541Z File "/tmp/tmpfgf7f6rd/ni/cnifx2r3rbe5jbjqoggcoruaazjzi2rqlgur3m6o7zrwlznlmmgt.py", line 65, in <module> 2025-12-04T12:15:05.6217078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.6217193Z kernel.precompile( 2025-12-04T12:15:05.6217748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.6217886Z self._precompile_worker() 2025-12-04T12:15:05.6218482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.6218665Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.6219273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6219473Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6219938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6220186Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6220630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6220979Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6221237Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6221900Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6221993Z ^ 2025-12-04T12:15:05.6222450Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6222456Z 2025-12-04T12:15:05.6223182Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6223190Z 2025-12-04T12:15:05.6223196Z 2025-12-04T12:15:05.6223444Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6224168Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T12:15:05.6224174Z 2025-12-04T12:15:05.6224441Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6224710Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6224816Z frames [('total', 1)] 2025-12-04T12:15:05.6224935Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6225412Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.6225635Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6225738Z graph_break [] 2025-12-04T12:15:05.6226157Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.6226283Z Traceback (most recent call last): 2025-12-04T12:15:05.6226704Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.6226956Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.6227444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.6227707Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.6228217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.6228409Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.6228929Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.6229113Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.6229661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.6229984Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.6230504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.6230670Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.6231151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.6231287Z return self._compile_to_module() 2025-12-04T12:15:05.6231772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.6231939Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.6232471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.6232601Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.6233127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.6233374Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.6233961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.6234103Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.6234600Z File "/tmp/tmpmync84u0/kk/ckkub6tjslzx3ruzlurj5vywnkw4brxz65lt67c5f6jmknaelh5c.py", line 65, in 2025-12-04T12:15:05.6235059Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.6235187Z kernel.precompile( 2025-12-04T12:15:05.6235772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.6235902Z self._precompile_worker() 2025-12-04T12:15:05.6236499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.6236680Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.6237319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6237517Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6237970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6238232Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6238676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6239021Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6239252Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6239899Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6240006Z ^ 2025-12-04T12:15:05.6240462Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6240467Z 2025-12-04T12:15:05.6241189Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6241225Z 2025-12-04T12:15:05.6241230Z 2025-12-04T12:15:05.6241451Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6242154Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T12:15:05.6242174Z 2025-12-04T12:15:05.6242441Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6242663Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6242784Z frames [('total', 1)] 2025-12-04T12:15:05.6242903Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6243368Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.6243603Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6243705Z graph_break [] 2025-12-04T12:15:05.6243938Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6244045Z frames [('total', 1)] 2025-12-04T12:15:05.6244162Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6244398Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6244894Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.6244995Z graph_break [] 2025-12-04T12:15:05.6245159Z =================================== FAILURES =================================== 2025-12-04T12:15:05.6245561Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.6245697Z Traceback (most recent call last): 2025-12-04T12:15:05.6246127Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.6246363Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.6246898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.6247148Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.6247667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.6247876Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.6248419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.6248587Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.6249119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.6249441Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.6249980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.6250128Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.6250626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.6250750Z return self._compile_to_module() 2025-12-04T12:15:05.6251232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.6251413Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.6251927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.6252059Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.6252565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.6252827Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.6253428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.6253558Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.6254032Z File "/tmp/tmp8jd5_ty5/r5/cr5wejk7pc2vqewa74ajy7amhts2g4s63a6vitf6ahiuvth5izp7.py", line 65, in 2025-12-04T12:15:05.6254509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.6254623Z kernel.precompile( 2025-12-04T12:15:05.6255191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.6255311Z self._precompile_worker() 2025-12-04T12:15:05.6255907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.6256102Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.6256796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6257050Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6257523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6257773Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.6258232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6258569Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6258801Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6259502Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6259597Z ^ 2025-12-04T12:15:05.6260075Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6260081Z 2025-12-04T12:15:05.6261365Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6261438Z 2025-12-04T12:15:05.6261443Z 2025-12-04T12:15:05.6261667Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6262392Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T12:15:05.6262401Z 2025-12-04T12:15:05.6262671Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6262917Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6263025Z frames [('total', 1)] 2025-12-04T12:15:05.6263145Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6263631Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.6263855Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6263977Z graph_break [] 
2025-12-04T12:15:05.6264201Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6264309Z frames [('total', 1)] 2025-12-04T12:15:05.6264442Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6264662Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6265127Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.6265279Z graph_break [] 2025-12-04T12:15:05.6265503Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6265625Z frames [('total', 1)] 2025-12-04T12:15:05.6265743Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6265970Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6266447Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.6266550Z graph_break [] 2025-12-04T12:15:05.6267201Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-649ba93d0ac5919c.xml - 2025-12-04T12:15:05.6267397Z =========================== short test summary info ============================ 2025-12-04T12:15:05.6268254Z FAILED [0.4309s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6268921Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6269013Z ^ 2025-12-04T12:15:05.6269504Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6269510Z 2025-12-04T12:15:05.6270234Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6270242Z 2025-12-04T12:15:05.6270246Z 2025-12-04T12:15:05.6270467Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6271373Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T12:15:05.6271383Z 2025-12-04T12:15:05.6271727Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6271923Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.6272132Z ================== 1 failed, 187 deselected, 2 rerun in 4.32s ================== 2025-12-04T12:15:05.6272237Z Got exit code 1 2025-12-04T12:15:05.6272882Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda 2025-12-04T12:15:05.6273343Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.6273955Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df60bd1ca7e6baab.xml 2025-12-04T12:15:05.6274137Z ============================= test session starts ============================== 2025-12-04T12:15:05.6274496Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.6274627Z cachedir: .pytest_cache 2025-12-04T12:15:05.6275143Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.6275272Z rootdir: /var/lib/jenkins/workspace 
2025-12-04T12:15:05.6275398Z configfile: pytest.ini 2025-12-04T12:15:05.6275990Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.6276230Z collecting ... collected 188 items / 30 deselected / 158 selected 2025-12-04T12:15:05.6276375Z stepcurrent: skipping 30 already run items. 2025-12-04T12:15:05.6276491Z Running 158 items in this shard 2025-12-04T12:15:05.6276497Z 2025-12-04T12:15:05.6277962Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.6279295Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6279742Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.6280199Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.6280663Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.6281216Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.6281760Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.6282415Z E1204 
11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.6283003Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.6283577Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.6284034Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.6284796Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.6285342Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.6285896Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.6286501Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6287077Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6287603Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6288109Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6288591Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6289072Z E1204 11:58:59.481000 118656 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.6289576Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.6290344Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6291053Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6291786Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6292330Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.6292820Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.6293287Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.6293781Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.6294235Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.6294739Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.6295270Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.6295812Z E1204 11:58:59.481000 118656 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.6296420Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.6297026Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6297620Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.6298234Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.6298756Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.6299225Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.6299810Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.6300315Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.6300899Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.6301451Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.6302160Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.6302756Z E1204 11:58:59.481000 118656 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.6303274Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.6303984Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.6304703Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.6307388Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.6307946Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.6308999Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6309682Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6310585Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6311281Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6312164Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6313008Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6313630Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.6314880Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6315298Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.6316195Z E1204 11:58:59.481000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6316341Z ('RERUN', {'yellow': True}) [3.5129s] [ 0%] 2025-12-04T12:15:05.6317779Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.6319036Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6319500Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.6319957Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.6320431Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.6320965Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.6321521Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.6322108Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.6322692Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.6323267Z E1204 11:59:00.031000 118656 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.6323756Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.6324406Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.6324938Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.6325497Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.6326077Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6326645Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6327186Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6327677Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6328208Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6328673Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.6329176Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.6329959Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6330650Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6331349Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6331876Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.6332366Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.6332879Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.6333376Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.6333844Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.6334328Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.6334876Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.6335364Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.6335890Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.6336573Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 
tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6337201Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.6337781Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.6338285Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.6338753Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.6339347Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.6339850Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.6340443Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.6340983Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.6341732Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.6342329Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.6342845Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.6343569Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.6343937Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.6346563Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.6347149Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.6348205Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6348833Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6349735Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6350415Z E1204 11:59:00.031000 118656 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6351349Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6352131Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6352745Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.6354047Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6354417Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.6355320Z E1204 11:59:00.031000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6355486Z ('RERUN', {'yellow': True}) [0.5125s] [ 0%] 2025-12-04T12:15:05.6356916Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.6358175Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6358610Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.6359081Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.6359547Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.6360096Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.6360681Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.6361266Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.6361872Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.6362429Z E1204 11:59:00.547000 118656 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.6362894Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.6363529Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.6364061Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.6364628Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.6365264Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6365811Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6366341Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6366845Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6367331Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6367828Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.6368353Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.6369122Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6369858Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6370543Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6371288Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.6371796Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.6372250Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.6372761Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.6373217Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.6373702Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.6374363Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.6374854Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.6375397Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.6375993Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 
tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6376636Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.6377213Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.6377714Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.6378200Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.6378839Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.6379314Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.6379896Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.6380442Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.6381230Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.6381812Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.6382348Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.6383060Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.6383468Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.6386121Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.6386681Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.6387721Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6388394Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6389302Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6389978Z E1204 11:59:00.547000 118656 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6390871Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6391641Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6392264Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.6393546Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6393930Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.6395304Z E1204 11:59:00.547000 118656 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6395414Z FAILED [0.5114s] [ 0%] 2025-12-04T12:15:05.6395421Z 2025-12-04T12:15:05.6395622Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.6396020Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T12:15:05.6396150Z Traceback (most recent call last): 2025-12-04T12:15:05.6396594Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.6396828Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.6397369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.6397618Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.6398131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.6398342Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.6398856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.6399021Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.6399556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.6399881Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.6400416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.6400569Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.6401065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.6401221Z return self._compile_to_module() 2025-12-04T12:15:05.6401709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.6401885Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.6402403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.6402534Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.6403218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.6403457Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.6404060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.6404188Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.6404686Z File "/tmp/tmp4vf3m1r3/7q/c7qv3x5zpj2odey5ro52ulltd6qbs62sm2jkootpe7ii7f2dn25g.py", line 137, in 2025-12-04T12:15:05.6405165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.6405278Z kernel.precompile( 2025-12-04T12:15:05.6405893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.6406014Z self._precompile_worker() 2025-12-04T12:15:05.6406609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.6406802Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.6407397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6407598Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6408067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6408353Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6408813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6409151Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6409378Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6410234Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6410329Z ^ 2025-12-04T12:15:05.6410799Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6410808Z 2025-12-04T12:15:05.6411520Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6411527Z 2025-12-04T12:15:05.6411532Z 2025-12-04T12:15:05.6411751Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6412474Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T12:15:05.6412483Z 2025-12-04T12:15:05.6412752Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6412986Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6413092Z frames [('total', 1)] 2025-12-04T12:15:05.6413211Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6413687Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6413973Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6414090Z graph_break [] 2025-12-04T12:15:05.6414489Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T12:15:05.6414616Z Traceback (most recent call last): 2025-12-04T12:15:05.6415050Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.6415285Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.6415774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.6416038Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.6416626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.6416843Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.6417358Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.6417506Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.6418096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.6418421Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.6418957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.6419107Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.6419584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.6419722Z return self._compile_to_module() 2025-12-04T12:15:05.6420240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.6420404Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.6420935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.6421065Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.6421572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.6421839Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.6422423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.6422564Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.6423072Z File "/tmp/tmpexhm0fx9/i4/ci4wjlqde6fk4tmtwjf4zm6powrm3orm22qjeh5bifomvaihricx.py", line 137, in 2025-12-04T12:15:05.6423546Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.6423659Z kernel.precompile( 2025-12-04T12:15:05.6424216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.6424350Z self._precompile_worker() 2025-12-04T12:15:05.6424946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.6425128Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.6425735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6425934Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6426430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6426677Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6427119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6427467Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6427693Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6428513Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6428604Z ^ 2025-12-04T12:15:05.6429065Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6429072Z 2025-12-04T12:15:05.6429796Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6429802Z 2025-12-04T12:15:05.6429806Z 2025-12-04T12:15:05.6430024Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6430774Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T12:15:05.6430783Z 2025-12-04T12:15:05.6431055Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6431295Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6431404Z frames [('total', 1)] 2025-12-04T12:15:05.6431522Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6432007Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6432264Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6432367Z graph_break [] 2025-12-04T12:15:05.6432607Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6432712Z frames [('total', 1)] 2025-12-04T12:15:05.6432832Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6433065Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6433557Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6433670Z graph_break [] 2025-12-04T12:15:05.6433817Z =================================== FAILURES =================================== 2025-12-04T12:15:05.6434216Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T12:15:05.6434357Z Traceback (most recent call last): 2025-12-04T12:15:05.6434782Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.6435016Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.6435527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.6435777Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.6436304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.6436498Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.6437008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.6437169Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.6437740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.6438074Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.6438595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.6438745Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.6439235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.6439370Z return self._compile_to_module() 2025-12-04T12:15:05.6439867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.6440033Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.6440552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.6440701Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.6441202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.6441437Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.6442068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.6442198Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.6442721Z File "/tmp/tmp68wrdup6/4e/c4eun5lbz42mqcz5denprvz2e7vakonqhzkfvfckcxg4ttfvxhob.py", line 137, in 2025-12-04T12:15:05.6443184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.6443298Z kernel.precompile( 2025-12-04T12:15:05.6443872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.6443994Z self._precompile_worker() 2025-12-04T12:15:05.6444642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.6444825Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.6445421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6445667Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6446120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6446368Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.6446827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6447168Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6447413Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6448221Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6448314Z ^ 2025-12-04T12:15:05.6448786Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6448794Z 2025-12-04T12:15:05.6449504Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6449510Z 2025-12-04T12:15:05.6449515Z 2025-12-04T12:15:05.6449749Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6450495Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T12:15:05.6450500Z 2025-12-04T12:15:05.6450785Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6451013Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6451121Z frames [('total', 1)] 2025-12-04T12:15:05.6451255Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6451726Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6451950Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:15:05.6452070Z graph_break [] 2025-12-04T12:15:05.6452293Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6452416Z frames [('total', 1)] 2025-12-04T12:15:05.6452537Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6452757Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6453237Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6453341Z graph_break [] 2025-12-04T12:15:05.6453597Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6453718Z frames [('total', 1)] 2025-12-04T12:15:05.6453835Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6454055Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6454521Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6454620Z graph_break [] 2025-12-04T12:15:05.6455291Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df60bd1ca7e6baab.xml - 2025-12-04T12:15:05.6455472Z =========================== short test summary info ============================ 2025-12-04T12:15:05.6456423Z FAILED [0.5114s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6457248Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6457390Z ^ 2025-12-04T12:15:05.6457864Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6457869Z 2025-12-04T12:15:05.6458576Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6458584Z 2025-12-04T12:15:05.6458588Z 2025-12-04T12:15:05.6458822Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6459529Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T12:15:05.6459535Z 2025-12-04T12:15:05.6459806Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6460002Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.6460205Z ================== 1 failed, 30 deselected, 2 rerun in 4.58s =================== 2025-12-04T12:15:05.6460319Z Got exit code 1 2025-12-04T12:15:05.6460429Z Retrying single test... 
2025-12-04T12:15:05.6460900Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-927fdf8f8ff6280c.xml 2025-12-04T12:15:05.6461077Z ============================= test session starts ============================== 2025-12-04T12:15:05.6461468Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.6461581Z cachedir: .pytest_cache 2025-12-04T12:15:05.6462114Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.6462243Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.6462366Z configfile: pytest.ini 2025-12-04T12:15:05.6462960Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.6463186Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.6463989Z stepcurrent: skipping 30 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T12:15:05.6464109Z Running 1 items in this shard 2025-12-04T12:15:05.6464114Z 2025-12-04T12:15:05.6465601Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.6466845Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6467290Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.6467744Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.6468211Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.6468788Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.6469331Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.6469926Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.6470569Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = 
tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.6471360Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.6471827Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.6472468Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.6473007Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.6473554Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.6474136Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6474677Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6475309Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6475814Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6476299Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6476769Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.6477289Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.6478051Z E1204 
11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6478762Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6479504Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6480045Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.6480537Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.6480990Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.6481495Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.6481997Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.6482497Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.6483032Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.6483521Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.6484101Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.6484696Z E1204 
11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6485290Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.6485847Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.6486348Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.6486824Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.6487399Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.6487869Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.6488477Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.6489031Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.6489737Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.6490314Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.6490841Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.6491552Z 
E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.6491933Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.6494658Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.6495216Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.6496348Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6497006Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6497938Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:05.6498618Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6499514Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6500282Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6500901Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.6502145Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6502568Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.6503459Z E1204 11:59:19.133000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6503608Z ('RERUN', {'yellow': True}) [3.5163s] [100%] 2025-12-04T12:15:05.6505035Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.6506280Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6506726Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.6507209Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.6507684Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.6508222Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.6508777Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.6509358Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.6509996Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.6510566Z E1204 11:59:19.684000 118875 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.6511013Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.6511690Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.6512214Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.6512763Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.6513359Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6513887Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6514429Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6514921Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6515400Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6515879Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.6516416Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.6517190Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6517875Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6518579Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6519104Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.6519592Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.6520057Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.6520580Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.6521046Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.6521531Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.6522062Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.6522560Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.6523117Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.6523726Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 
tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6524305Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.6524897Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.6525404Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.6525866Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.6526456Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.6526911Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.6527487Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.6528042Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.6528746Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.6529333Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.6529883Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.6530607Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.6530970Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.6533645Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.6534184Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.6535240Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6535869Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6536862Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6537567Z E1204 11:59:19.684000 118875 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6538457Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6539272Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6539882Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.6541153Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6541525Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.6542417Z E1204 11:59:19.684000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6542566Z ('RERUN', {'yellow': True}) [0.5129s] [100%] 2025-12-04T12:15:05.6544001Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.6545293Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6545731Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.6546197Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.6546661Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.6547202Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.6547762Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.6548374Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.6548972Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.6549533Z E1204 11:59:20.202000 118875 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.6550002Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.6550643Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.6551199Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.6551764Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.6552345Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6552918Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6553449Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6553940Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6554438Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6554906Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.6555424Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.6556187Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6556875Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6557608Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6558136Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.6558637Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.6559092Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.6559601Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.6560057Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.6560544Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.6561096Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.6561635Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.6562170Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.6562767Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 
tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6563349Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.6563923Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.6564451Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.6564929Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.6565504Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.6565989Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.6566580Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.6567120Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.6567840Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.6568418Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.6568932Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.6569655Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.6570018Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.6572885Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.6573420Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.6574487Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6575199Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6576106Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6576866Z E1204 11:59:20.202000 118875 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6577761Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6578598Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6579214Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.6580476Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6580885Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.6581790Z E1204 11:59:20.202000 118875 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6581899Z FAILED [0.5139s] [100%] 2025-12-04T12:15:05.6581906Z 2025-12-04T12:15:05.6582070Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.6582472Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T12:15:05.6582600Z Traceback (most recent call last): 2025-12-04T12:15:05.6583040Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.6583275Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.6583778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.6584026Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.6584598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.6584808Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.6585320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.6585468Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.6586015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.6586338Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.6586870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.6587023Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.6587506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.6587647Z return self._compile_to_module() 2025-12-04T12:15:05.6588134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.6588346Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.6588865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.6588999Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.6589507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.6589742Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.6590323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.6590467Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.6591006Z File "/tmp/tmprvz50br1/ij/cijjsrboulgkoazpletm4enpys4jzah2fb6wlkk3jfo756sifzzx.py", line 137, in 2025-12-04T12:15:05.6591486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.6591600Z kernel.precompile( 2025-12-04T12:15:05.6592155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.6592319Z self._precompile_worker() 2025-12-04T12:15:05.6592911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.6593105Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.6593699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6593899Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6594365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6594612Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6595052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6595401Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6595627Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6596443Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6596570Z ^ 2025-12-04T12:15:05.6597029Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6597049Z 2025-12-04T12:15:05.6597759Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6597769Z 2025-12-04T12:15:05.6597774Z 2025-12-04T12:15:05.6597991Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6598715Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T12:15:05.6598721Z 2025-12-04T12:15:05.6598992Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6599228Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6599337Z frames [('total', 1)] 2025-12-04T12:15:05.6599455Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6599937Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6600160Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6600260Z graph_break [] 2025-12-04T12:15:05.6600701Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T12:15:05.6600827Z Traceback (most recent call last): 2025-12-04T12:15:05.6601264Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.6601496Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.6601985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.6602246Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.6602790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.6602997Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.6603509Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.6603656Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.6604200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.6604554Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.6605071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.6605231Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.6605712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.6605850Z return self._compile_to_module() 2025-12-04T12:15:05.6606332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.6606497Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.6607023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.6607158Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.6607664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.6607896Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.6608481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.6608673Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.6609180Z File "/tmp/tmplm6oh1qc/ly/clygrn34bxe55c4lxlsrh6vy73lawtnmx34zbauu5y2wbgfti2ss.py", line 137, in 2025-12-04T12:15:05.6609645Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.6609773Z kernel.precompile( 2025-12-04T12:15:05.6610323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.6610456Z self._precompile_worker() 2025-12-04T12:15:05.6611055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.6611233Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.6611845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6612047Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6612515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6612814Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6613259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6613612Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6613846Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6614655Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6614764Z ^ 2025-12-04T12:15:05.6615223Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6615268Z 2025-12-04T12:15:05.6616005Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6616011Z 2025-12-04T12:15:05.6616016Z 2025-12-04T12:15:05.6616234Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6617052Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T12:15:05.6617059Z 2025-12-04T12:15:05.6617331Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6617554Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6617676Z frames [('total', 1)] 2025-12-04T12:15:05.6617795Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6618271Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6621377Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6621483Z graph_break [] 2025-12-04T12:15:05.6621711Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6621831Z frames [('total', 1)] 2025-12-04T12:15:05.6621954Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6622188Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6622652Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6622754Z graph_break [] 2025-12-04T12:15:05.6622932Z =================================== FAILURES =================================== 2025-12-04T12:15:05.6623331Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T12:15:05.6623459Z Traceback (most recent call last): 2025-12-04T12:15:05.6623903Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.6624175Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.6624668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.6624939Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.6625457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.6625667Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.6626181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.6626329Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.6626880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.6627263Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.6627800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.6627954Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.6628437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.6628575Z return self._compile_to_module() 2025-12-04T12:15:05.6629064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.6629232Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.6629799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.6629932Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.6630446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.6630679Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.6631268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.6631445Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.6631956Z File "/tmp/tmpf2u9p3qs/gk/cgkwof2xrehpz5xuibdydb6xxjntk2ldzm2llo7mjfg3vmfvszhi.py", line 137, in 2025-12-04T12:15:05.6632436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.6632551Z kernel.precompile( 2025-12-04T12:15:05.6633108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.6633245Z self._precompile_worker() 2025-12-04T12:15:05.6633924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.6634107Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.6634720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6634922Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6635385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6635631Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.6636079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6636430Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6636662Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6637491Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6637587Z ^ 2025-12-04T12:15:05.6638042Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6638048Z 2025-12-04T12:15:05.6638773Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6638779Z 2025-12-04T12:15:05.6638784Z 2025-12-04T12:15:05.6639006Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6639728Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T12:15:05.6639768Z 2025-12-04T12:15:05.6640039Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6640271Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6640379Z frames [('total', 1)] 2025-12-04T12:15:05.6640499Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6640977Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6641198Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:15:05.6641299Z graph_break [] 2025-12-04T12:15:05.6641527Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6641633Z frames [('total', 1)] 2025-12-04T12:15:05.6641780Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6642012Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6642473Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6642586Z graph_break [] 2025-12-04T12:15:05.6642803Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6642940Z frames [('total', 1)] 2025-12-04T12:15:05.6643067Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6643285Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6643740Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6643853Z graph_break [] 2025-12-04T12:15:05.6644507Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-927fdf8f8ff6280c.xml - 2025-12-04T12:15:05.6644697Z =========================== short test summary info ============================ 2025-12-04T12:15:05.6645592Z FAILED [0.5139s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6646398Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6646504Z ^ 2025-12-04T12:15:05.6646959Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6646965Z 2025-12-04T12:15:05.6647687Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6647695Z 2025-12-04T12:15:05.6647700Z 2025-12-04T12:15:05.6647919Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6648641Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T12:15:05.6648647Z 2025-12-04T12:15:05.6648920Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6649103Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.6649322Z ================== 1 failed, 187 deselected, 2 rerun in 4.59s ================== 2025-12-04T12:15:05.6649424Z Got exit code 1 2025-12-04T12:15:05.6649532Z Retrying single test... 
2025-12-04T12:15:05.6650014Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f380c761dc75570.xml 2025-12-04T12:15:05.6650182Z ============================= test session starts ============================== 2025-12-04T12:15:05.6650547Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.6650660Z cachedir: .pytest_cache 2025-12-04T12:15:05.6651213Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.6651356Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.6651467Z configfile: pytest.ini 2025-12-04T12:15:05.6652057Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.6652293Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.6653080Z stepcurrent: skipping 30 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T12:15:05.6653244Z Running 1 items in this shard 2025-12-04T12:15:05.6653250Z 2025-12-04T12:15:05.6654689Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.6656013Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6656576Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.6657032Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.6657511Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.6658097Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.6658653Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.6659243Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.6659845Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = 
tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.6660401Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.6660852Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.6661506Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.6662028Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.6662588Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.6663165Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6663695Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6664234Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6664756Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6665248Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6665711Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.6666210Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.6667012Z E1204 
11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6667707Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6668413Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6668965Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.6669465Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.6669921Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.6670417Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.6670881Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.6671607Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.6672160Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.6672652Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.6673175Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.6673782Z E1204 
11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6674362Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.6674939Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.6675441Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.6675907Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.6676496Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.6676953Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.6677539Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.6678147Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.6678865Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.6679442Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.6679958Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.6680731Z 
E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.6681100Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.6683734Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.6684316Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.6685423Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6686053Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6686951Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:05.6687628Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6688516Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6689883Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6690499Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.6691761Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6692129Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.6693086Z E1204 11:59:38.742000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6693226Z ('RERUN', {'yellow': True}) [3.5349s] [100%] 2025-12-04T12:15:05.6694655Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.6695940Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6696443Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.6696913Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.6697414Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.6697964Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.6698507Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.6699090Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.6699693Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.6700372Z E1204 11:59:39.304000 119094 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.6700839Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.6701469Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.6702142Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.6702697Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.6703280Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6703825Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6704354Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6704857Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6705334Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6705801Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.6711232Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.6712165Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6712885Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6713574Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6714106Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.6714684Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.6715148Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.6715662Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.6716158Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.6716642Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.6717191Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.6717686Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.6718220Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.6718869Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 
tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6719461Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.6720040Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.6720539Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.6721023Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.6721603Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.6722078Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.6722661Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.6723201Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.6723929Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.6724511Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.6725096Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.6725806Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.6726171Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.6728856Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.6729442Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.6730495Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6731564Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6732482Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6733226Z E1204 11:59:39.304000 119094 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6734127Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6734897Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6735526Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.6736857Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6737231Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.6738139Z E1204 11:59:39.304000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6738275Z ('RERUN', {'yellow': True}) [0.5152s] [100%] 2025-12-04T12:15:05.6739730Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.6741024Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6741474Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.6741922Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.6742380Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.6742963Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.6743509Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.6744104Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.6744715Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.6745276Z E1204 11:59:39.827000 119094 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.6745725Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.6746362Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.6746937Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.6747480Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.6748072Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6748600Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6749124Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6749628Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6750108Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6750585Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.6751085Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.6751848Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6752553Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6753269Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.6753816Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.6754306Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.6754771Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.6755259Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.6755714Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.6756240Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.6756781Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.6757283Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.6757839Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.6758432Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 
tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6759030Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.6759594Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.6760139Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.6760602Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.6761181Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.6761650Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.6762220Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.6762772Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.6763482Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.6764074Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.6764591Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.6765300Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.6765675Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.6768340Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.6768901Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.6769978Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6770626Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6771762Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6772463Z E1204 11:59:39.827000 119094 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6773350Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6774144Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6774858Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.6776109Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6776559Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.6777523Z E1204 11:59:39.827000 119094 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6777647Z FAILED [0.5182s] [100%] 2025-12-04T12:15:05.6777657Z 2025-12-04T12:15:05.6777806Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.6778208Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T12:15:05.6778353Z Traceback (most recent call last): 2025-12-04T12:15:05.6778782Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.6779034Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.6779525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.6779777Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.6780307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.6780575Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.6781098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.6781251Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.6781785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.6782123Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.6782646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.6782796Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.6783340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.6783467Z return self._compile_to_module() 2025-12-04T12:15:05.6783969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.6784133Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.6784697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.6784843Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.6785341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.6785587Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.6786175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.6786304Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.6786851Z File "/tmp/tmp7yaj2ngs/hn/chnd5zeoc6rawi6vgrzw53ix6p54jmik4pp2r6spf4q2h6x272gd.py", line 137, in 2025-12-04T12:15:05.6787314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.6787428Z kernel.precompile( 2025-12-04T12:15:05.6787995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.6788116Z self._precompile_worker() 2025-12-04T12:15:05.6788724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.6788904Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.6789498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6789710Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6790165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6790423Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6790867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6791200Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6791439Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6792245Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6792353Z ^ 2025-12-04T12:15:05.6792813Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6792822Z 2025-12-04T12:15:05.6793576Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6793586Z 2025-12-04T12:15:05.6793590Z 2025-12-04T12:15:05.6793821Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6794524Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T12:15:05.6794530Z 2025-12-04T12:15:05.6794812Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6795039Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6795184Z frames [('total', 1)] 2025-12-04T12:15:05.6795322Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6795797Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6796040Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6796142Z graph_break [] 2025-12-04T12:15:05.6796542Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T12:15:05.6796718Z Traceback (most recent call last): 2025-12-04T12:15:05.6797143Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.6797377Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.6797887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.6798139Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.6798666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.6798907Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.6799416Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.6799580Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.6800117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.6800448Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.6800971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.6801121Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.6801616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.6801740Z return self._compile_to_module() 2025-12-04T12:15:05.6802225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.6802402Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.6802919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.6803061Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.6803558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.6803790Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.6804392Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.6804522Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.6805085Z File "/tmp/tmpfnxbtl3o/ts/ctsuz7uvnj6sfg3btq44tonxqbdbfymagb3i25w7q2d57gsazfst.py", line 137, in 2025-12-04T12:15:05.6805553Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.6805668Z kernel.precompile( 2025-12-04T12:15:05.6806234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.6806352Z self._precompile_worker() 2025-12-04T12:15:05.6806945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.6807141Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.6807773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6807988Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6808447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6808695Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6809192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6809530Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6809769Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6810577Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6810673Z ^ 2025-12-04T12:15:05.6811146Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6811220Z 2025-12-04T12:15:05.6811934Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6811943Z 2025-12-04T12:15:05.6811947Z 2025-12-04T12:15:05.6812179Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6812887Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T12:15:05.6812893Z 2025-12-04T12:15:05.6813165Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6813404Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6813515Z frames [('total', 1)] 2025-12-04T12:15:05.6813650Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6814125Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6814349Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6814467Z graph_break [] 2025-12-04T12:15:05.6814691Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6814798Z frames [('total', 1)] 2025-12-04T12:15:05.6814932Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6815155Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6815635Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6815735Z graph_break [] 2025-12-04T12:15:05.6815882Z =================================== FAILURES =================================== 2025-12-04T12:15:05.6816368Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda _ 2025-12-04T12:15:05.6816500Z Traceback (most recent call last): 2025-12-04T12:15:05.6816975Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.6817226Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.6817718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.6817984Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.6818500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.6818698Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.6819256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.6819409Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.6819959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.6820282Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.6820834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.6820999Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.6821481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.6821606Z return self._compile_to_module() 2025-12-04T12:15:05.6822100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.6822267Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.6822797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.6822970Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.6823467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.6823719Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.6824305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.6824447Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.6824959Z File "/tmp/tmp3q37ahr4/bs/cbsbpp7pjvsphuatpn2c2cok6zsa7zbbrfej5uieczdhghgxnl6w.py", line 137, in 2025-12-04T12:15:05.6825426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.6825556Z kernel.precompile( 2025-12-04T12:15:05.6826114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.6826232Z self._precompile_worker() 2025-12-04T12:15:05.6826840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.6827023Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.6827627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6827829Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6828282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6828541Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.6828986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6829373Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6829603Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6830413Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6830516Z ^ 2025-12-04T12:15:05.6830973Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6830979Z 2025-12-04T12:15:05.6831736Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6831743Z 2025-12-04T12:15:05.6831748Z 2025-12-04T12:15:05.6831971Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6832679Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T12:15:05.6832700Z 2025-12-04T12:15:05.6833009Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6833232Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6833353Z frames [('total', 1)] 2025-12-04T12:15:05.6833474Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6833940Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6834178Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:15:05.6834279Z graph_break [] 2025-12-04T12:15:05.6834498Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6834660Z frames [('total', 1)] 2025-12-04T12:15:05.6834777Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6835009Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6835471Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6835575Z graph_break [] 2025-12-04T12:15:05.6835806Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.6835911Z frames [('total', 1)] 2025-12-04T12:15:05.6836026Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.6836254Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.6836707Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.6836821Z graph_break [] 2025-12-04T12:15:05.6837469Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f380c761dc75570.xml - 2025-12-04T12:15:05.6837651Z =========================== short test summary info ============================ 2025-12-04T12:15:05.6838509Z FAILED [0.5182s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.6839315Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6839420Z ^ 2025-12-04T12:15:05.6839878Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6839884Z 2025-12-04T12:15:05.6840593Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.6840614Z 2025-12-04T12:15:05.6840618Z 2025-12-04T12:15:05.6840874Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.6842131Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T12:15:05.6842141Z 2025-12-04T12:15:05.6842433Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.6842621Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.6842843Z ================== 1 failed, 187 deselected, 2 rerun in 4.61s ================== 2025-12-04T12:15:05.6842948Z Got exit code 1 2025-12-04T12:15:05.6843626Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda 2025-12-04T12:15:05.6844054Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.6844533Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db3aa4c2f1c0f2c1.xml 2025-12-04T12:15:05.6844698Z ============================= test session starts ============================== 2025-12-04T12:15:05.6845104Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.6845218Z cachedir: .pytest_cache 2025-12-04T12:15:05.6845749Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.6845874Z rootdir: /var/lib/jenkins/workspace 
2025-12-04T12:15:05.6845983Z configfile: pytest.ini 2025-12-04T12:15:05.6846771Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.6847005Z collecting ... collected 188 items / 31 deselected / 157 selected 2025-12-04T12:15:05.6847227Z stepcurrent: skipping 31 already run items. 2025-12-04T12:15:05.6847361Z Running 157 items in this shard 2025-12-04T12:15:05.6847367Z 2025-12-04T12:15:05.6848738Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.6849850Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6850304Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:05.6850777Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.6851242Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.6851780Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.6852341Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.6852930Z E1204 11:59:58.278000 119313 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.6853538Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.6854135Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.6854590Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.6855040Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.6855643Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6856241Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6856973Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6857572Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6858112Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6858704Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6859207Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6859688Z E1204 11:59:58.278000 119313 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6860167Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.6860955Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6861521Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.6862124Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6862843Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.6863456Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.6863858Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.6864482Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.6865079Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.6865725Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.6866444Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.6866924Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.6867411Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.6867928Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.6868564Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.6869100Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.6869646Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.6870275Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6870809Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6871528Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6872036Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6872604Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6873084Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 
2025-12-04T12:15:05.6873877Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6874424Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.6874977Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.6875444Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.6875962Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.6876421Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.6876931Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.6877473Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.6877971Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.6878508Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.6879104Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6879695Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.6880256Z E1204 11:59:58.278000 
119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:05.6880752Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.6881292Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.6881870Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.6882340Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.6882916Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.6883454Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.6884115Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:05.6884700Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.6885257Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:05.6885669Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.6888042Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 
'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.6888611Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.6889674Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6890306Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6891210Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6891909Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6892801Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6893592Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6894202Z E1204 
11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.6895346Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6895718Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.6896701Z E1204 11:59:58.278000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6896839Z ('RERUN', {'yellow': True}) [3.4364s] [ 0%] 2025-12-04T12:15:05.6898226Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.6899331Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6899779Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:05.6900279Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.6900743Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.6901277Z E1204 
11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.6901837Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.6902465Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.6903071Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.6903635Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.6904099Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.6904533Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.6905136Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6905745Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6906353Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6906946Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6907478Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 
2025-12-04T12:15:05.6908002Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6908510Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6909041Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6909523Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.6910315Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6910841Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.6911444Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6912219Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.6912845Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.6913362Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.6913992Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.6914582Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.6915231Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.6915987Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.6916467Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.6916957Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.6917432Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.6918064Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.6918604Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.6919154Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.6919745Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6920276Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6920815Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6921303Z E1204 11:59:58.752000 119313 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6921784Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6922260Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.6923085Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6923627Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.6924122Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.6924582Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.6925128Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.6925592Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.6926105Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.6926644Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.6927179Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.6927713Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 
2025-12-04T12:15:05.6928312Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6928901Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.6929499Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:05.6929999Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.6930478Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.6931053Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.6931530Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.6932104Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.6932660Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.6933254Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:05.6933830Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.6934391Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 
2025-12-04T12:15:05.6934751Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.6937232Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.6937776Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.6938855Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6939487Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6940397Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6941107Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6941988Z E1204 11:59:58.752000 119313 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6942778Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6943420Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.6944523Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6944892Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.6945803Z E1204 11:59:58.752000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6945940Z ('RERUN', {'yellow': True}) [0.4364s] [ 0%] 2025-12-04T12:15:05.6947301Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.6948400Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6948849Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:05.6949316Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.6949776Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.6950354Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.6950901Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.6951484Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.6952080Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.6952638Z E1204 11:59:59.197000 119313 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.6953128Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.6953563Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.6954160Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6954787Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6955392Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.6955992Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6956527Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6957091Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6957592Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6958073Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6958551Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.6959334Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6959866Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.6960457Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6961175Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.6961792Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.6962195Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.6962827Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.6963472Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.6964122Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.6964842Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.6965320Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.6965809Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp7 = tmp5[:, None] 2025-12-04T12:15:05.6966316Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.6966968Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.6967490Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.6968069Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.6968658Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.6969187Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.6969728Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.6970249Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.6970726Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.6971433Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.6972220Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.6972767Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = 
tmp9.to(tl.float32) 2025-12-04T12:15:05.6973265Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.6973732Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.6974250Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.6974708Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.6975217Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.6975753Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.6976249Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.6976847Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.6977543Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.6978140Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.6978704Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:05.6979210Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.6979724Z E1204 11:59:59.197000 119313 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.6980310Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.6980783Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.6981405Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.6981957Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.6982547Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:05.6983127Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.6983730Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:05.6984089Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.6986439Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 
16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.6986979Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.6988031Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.6988661Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.6989566Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.6990251Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.6991164Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.6991962Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.6992572Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.6993706Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.6994074Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.6994985Z E1204 11:59:59.197000 119313 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.6995127Z FAILED [0.4424s] [ 0%] 2025-12-04T12:15:05.6995133Z 2025-12-04T12:15:05.6995280Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.6995703Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.6995832Z Traceback (most recent call last): 2025-12-04T12:15:05.6996266Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.6996506Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.6996996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.6997289Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.6997804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.6998013Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.6998522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.6998672Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.6999222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in 
fx_codegen_and_compile 2025-12-04T12:15:05.6999545Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.7000071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.7000241Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.7000727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.7000869Z return self._compile_to_module() 2025-12-04T12:15:05.7001357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.7001523Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.7002058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.7002192Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.7002704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.7002940Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.7003563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.7003709Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.7004219Z File "/tmp/tmp76glkdzi/po/cpoyyo3lz5ssavri25muw6nyqrtinsgmmqbsiyzihwijrvuhg4kw.py", line 65, in 2025-12-04T12:15:05.7004686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.7004815Z kernel.precompile( 2025-12-04T12:15:05.7005373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 
2025-12-04T12:15:05.7005511Z self._precompile_worker() 2025-12-04T12:15:05.7006139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.7006322Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.7006941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7007142Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7007640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7007888Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.7008332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7008684Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7008913Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7009570Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7009727Z ^ 2025-12-04T12:15:05.7010183Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7010189Z 2025-12-04T12:15:05.7010916Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7010923Z 2025-12-04T12:15:05.7010928Z 2025-12-04T12:15:05.7011147Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7011879Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T12:15:05.7011885Z 2025-12-04T12:15:05.7012159Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7012388Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7012517Z frames [('total', 1)] 2025-12-04T12:15:05.7012860Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7013561Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7014409Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7014878Z graph_break [] 2025-12-04T12:15:05.7015434Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.7016121Z Traceback (most recent call last): 2025-12-04T12:15:05.7016872Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.7017682Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.7018540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.7019481Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.7020401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.7021241Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.7022160Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.7022980Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.7023802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.7024808Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.7025829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.7026659Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.7027431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.7028180Z return self._compile_to_module() 2025-12-04T12:15:05.7028935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.7029745Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.7030572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.7031354Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.7032118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.7032999Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.7034012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.7034858Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.7035625Z File "/tmp/tmp46vfc6fa/vf/cvffpdm2qdkks4ztfr6e2tkrjm5ndpf4p2ef4jszkjrm4xkjivxj.py", line 65, in 2025-12-04T12:15:05.7036745Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.7037470Z kernel.precompile( 2025-12-04T12:15:05.7038204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.7039026Z self._precompile_worker() 2025-12-04T12:15:05.7039855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.7040766Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.7041677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7042620Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7043416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7044253Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.7045091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7046020Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7046733Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7047754Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7048645Z ^ 2025-12-04T12:15:05.7049287Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7049884Z 2025-12-04T12:15:05.7050614Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7051464Z 2025-12-04T12:15:05.7051468Z 2025-12-04T12:15:05.7051692Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7052783Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T12:15:05.7053656Z 2025-12-04T12:15:05.7053977Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7054639Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7055126Z frames [('total', 1)] 2025-12-04T12:15:05.7055436Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7056130Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7057073Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7057554Z graph_break [] 2025-12-04T12:15:05.7057944Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7058428Z frames [('total', 1)] 2025-12-04T12:15:05.7058726Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7059175Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7060011Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7060712Z graph_break [] 2025-12-04T12:15:05.7061034Z =================================== FAILURES =================================== 2025-12-04T12:15:05.7061804Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.7062472Z Traceback (most recent call last): 2025-12-04T12:15:05.7063151Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.7063961Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.7064837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.7065708Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.7066626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.7067490Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.7068343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.7069140Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.7069965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.7071177Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.7072181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.7072989Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.7073773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.7074531Z return self._compile_to_module() 2025-12-04T12:15:05.7075258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.7076160Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.7076991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.7077793Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.7078543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.7079415Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.7080378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.7081236Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.7082054Z File "/tmp/tmpuxr1bpil/jz/cjz4vkh4bhcgpe7ew3ykmveyaajtmh6bftewjpr4ecfje5u4d4ee.py", line 65, in 2025-12-04T12:15:05.7083177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.7083914Z kernel.precompile( 2025-12-04T12:15:05.7084655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.7085531Z self._precompile_worker() 2025-12-04T12:15:05.7086355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.7087283Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.7088182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7089128Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7089929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7090817Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.7091644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7092569Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7093280Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7094287Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7095176Z ^ 2025-12-04T12:15:05.7095767Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7096431Z 2025-12-04T12:15:05.7097165Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7098022Z 2025-12-04T12:15:05.7098026Z 2025-12-04T12:15:05.7098265Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7099328Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T12:15:05.7100195Z 2025-12-04T12:15:05.7100468Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7101108Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7101584Z frames [('total', 1)] 2025-12-04T12:15:05.7101874Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7102561Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7103400Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7103857Z graph_break [] 
2025-12-04T12:15:05.7104298Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7104779Z frames [('total', 1)] 2025-12-04T12:15:05.7105070Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7105513Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7106346Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7107071Z graph_break [] 2025-12-04T12:15:05.7107438Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7107909Z frames [('total', 1)] 2025-12-04T12:15:05.7108208Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7108633Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7109512Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7110226Z graph_break [] 2025-12-04T12:15:05.7111053Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db3aa4c2f1c0f2c1.xml - 2025-12-04T12:15:05.7112010Z =========================== short test summary info ============================ 2025-12-04T12:15:05.7113270Z FAILED [0.4424s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7114910Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7115800Z ^ 2025-12-04T12:15:05.7116381Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7116994Z 2025-12-04T12:15:05.7117714Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7118643Z 2025-12-04T12:15:05.7118648Z 2025-12-04T12:15:05.7118867Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7119948Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T12:15:05.7120792Z 2025-12-04T12:15:05.7121066Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7121670Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.7122200Z ================== 1 failed, 31 deselected, 2 rerun in 4.36s =================== 2025-12-04T12:15:05.7122651Z Got exit code 1 2025-12-04T12:15:05.7122918Z Retrying single test... 
2025-12-04T12:15:05.7123589Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7c20e7902388541e.xml 2025-12-04T12:15:05.7124378Z ============================= test session starts ============================== 2025-12-04T12:15:05.7125035Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.7125646Z cachedir: .pytest_cache 2025-12-04T12:15:05.7126360Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.7127148Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.7127497Z configfile: pytest.ini 2025-12-04T12:15:05.7128286Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.7129252Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.7130410Z stepcurrent: skipping 31 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T12:15:05.7131514Z Running 1 items in this shard 2025-12-04T12:15:05.7131745Z 2025-12-04T12:15:05.7133105Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.7135713Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7137524Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:05.7138564Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.7139630Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.7140812Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.7142036Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.7143305Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.7144608Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T12:15:05.7145888Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.7147088Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.7148117Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.7149283Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7150603Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7151946Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7153277Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.7154526Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.7155726Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.7156881Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.7157998Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.7159072Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.7160510Z E1204 12:00:17.980000 119510 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.7161958Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.7163210Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7164651Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.7166103Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.7167803Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.7168990Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.7170342Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.7171950Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.7173451Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.7174780Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.7175884Z E1204 12:00:17.980000 119510 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.7177141Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.7178380Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.7179689Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.7180903Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.7182170Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.7183410Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.7184621Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.7185782Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.7186900Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.7187978Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.7189385Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.7190841Z E1204 12:00:17.980000 119510 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.7192081Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.7193190Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.7194299Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.7195411Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.7196515Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.7197763Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.7198936Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.7200096Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.7201413Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7202728Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.7204000Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:05.7205210Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.7206354Z 
E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.7207537Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.7208699Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.7209872Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.7211129Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.7212407Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:05.7213726Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.7214972Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:05.7216027Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.7218986Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], 
(5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.7222004Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.7223736Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7225544Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7227264Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7228989Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.7230687Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7232523Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7234037Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.7235878Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7237507Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.7238913Z E1204 12:00:17.980000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7240081Z ('RERUN', {'yellow': True}) [3.4421s] [100%] 2025-12-04T12:15:05.7241691Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.7244258Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7245936Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:05.7246982Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.7248039Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.7249160Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.7250394Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.7251706Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.7253020Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.7254302Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.7255436Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.7256578Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.7257885Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7259224Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7260547Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7261921Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.7263183Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.7264392Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.7265546Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] roffset = r0_offset 2025-12-04T12:15:05.7266718Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.7267826Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.7269226Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.7270656Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.7272098Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7273551Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.7275025Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.7276178Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.7277336Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.7278686Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.7280068Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.7281635Z E1204 
12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.7282952Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.7284051Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.7285150Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.7286405Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.7287751Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.7288962Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.7290240Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.7291536Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.7292742Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.7293889Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.7295008Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.7296100Z E1204 12:00:18.485000 119510 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.7297618Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.7299061Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.7300229Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.7301332Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.7302446Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.7303537Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.7304644Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.7305823Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.7307007Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.7308172Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.7309417Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7310746Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 
triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.7312084Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:05.7313288Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.7314387Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.7315571Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.7316753Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.7317966Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.7319215Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.7320490Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:05.7321832Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.7323097Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:05.7324131Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.7326979Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 
'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.7330024Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.7331745Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7333554Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7335213Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7336975Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.7339241Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7341049Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, 
context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7342650Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.7344489Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7346095Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.7347527Z E1204 12:00:18.485000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7348701Z ('RERUN', {'yellow': True}) [0.4662s] [100%] 2025-12-04T12:15:05.7350337Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.7353116Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7354793Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:05.7355814Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.7356877Z E1204 12:00:18.941000 119510 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.7358091Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.7359316Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.7360573Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.7361889Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.7363171Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.7364321Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.7365341Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.7366528Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7367857Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7369186Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7370513Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.7371941Z E1204 12:00:18.941000 119510 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.7373277Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.7374443Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.7375574Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.7376720Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.7378194Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.7379638Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.7380906Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7382343Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.7383856Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.7385014Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.7386188Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.7387597Z 
E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.7388975Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.7390478Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.7391808Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.7392913Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.7394010Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.7395277Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.7396925Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.7398149Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.7399423Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.7400669Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.7401879Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < 
r0_numel 2025-12-04T12:15:05.7403116Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.7404237Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.7405316Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.7406724Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.7408177Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.7409391Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.7410491Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.7411602Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.7412746Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.7413868Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.7415034Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.7416218Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.7417453Z E1204 12:00:18.941000 119510 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.7418766Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7420096Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.7421366Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:05.7422567Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.7423682Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.7424864Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.7426045Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.7427228Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.7428487Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.7429765Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:05.7431067Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.7432378Z E1204 12:00:18.941000 119510 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:05.7433433Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.7436316Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.7439333Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.7441056Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7442892Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7444563Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7446283Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.7448055Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7449845Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7451358Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.7453202Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7454804Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.7456211Z E1204 12:00:18.941000 119510 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7457444Z FAILED [0.4526s] [100%] 2025-12-04T12:15:05.7457625Z 2025-12-04T12:15:05.7457772Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.7458480Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.7459163Z Traceback (most recent call last): 2025-12-04T12:15:05.7459819Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.7460618Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.7461482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.7462407Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.7463299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.7464152Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.7464991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.7465792Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.7466599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.7467597Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.7468629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.7469444Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.7470201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.7471141Z return self._compile_to_module() 2025-12-04T12:15:05.7471971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.7472752Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.7473575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.7474367Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.7475133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.7475991Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.7477007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.7477861Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.7478594Z File "/tmp/tmpgrziu__4/6u/c6uemlisc3hsoq7726bsp3fmh5ylwaujqxfeqxscwcgdf5lwfxby.py", line 65, in 2025-12-04T12:15:05.7479675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.7480400Z kernel.precompile( 2025-12-04T12:15:05.7481149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.7481954Z self._precompile_worker() 2025-12-04T12:15:05.7483186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.7484111Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.7485047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7485976Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7486777Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7487625Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.7488466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7489381Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7490092Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7491112Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7491986Z ^ 2025-12-04T12:15:05.7492667Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7493277Z 2025-12-04T12:15:05.7494000Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7494845Z 2025-12-04T12:15:05.7494850Z 2025-12-04T12:15:05.7495089Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7496172Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T12:15:05.7497090Z 2025-12-04T12:15:05.7497421Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7498067Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7498547Z frames [('total', 1)] 2025-12-04T12:15:05.7499069Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7499759Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7500649Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7501123Z graph_break [] 2025-12-04T12:15:05.7501682Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.7502367Z Traceback (most recent call last): 2025-12-04T12:15:05.7503038Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.7503825Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.7504686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.7505604Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.7506509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.7507345Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.7508183Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.7508982Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.7509809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.7510796Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.7511798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.7512615Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.7513378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.7514110Z return self._compile_to_module() 2025-12-04T12:15:05.7514850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.7515646Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.7516449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.7517245Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.7517998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.7518869Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.7519860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.7520720Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.7521486Z File "/tmp/tmpkckkfsyj/es/ces7zyrn5mfa4cjuiroujifwbbngjz4ts3ahgghgeak7bx3qq2at.py", line 65, in 2025-12-04T12:15:05.7522600Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.7523310Z kernel.precompile( 2025-12-04T12:15:05.7524061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.7524877Z self._precompile_worker() 2025-12-04T12:15:05.7525726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.7526651Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.7527567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7528510Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7529285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7530167Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.7530998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7531926Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7532617Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7533635Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7534561Z ^ 2025-12-04T12:15:05.7535137Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7535744Z 2025-12-04T12:15:05.7536534Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7537394Z 2025-12-04T12:15:05.7537399Z 2025-12-04T12:15:05.7537619Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7538695Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T12:15:05.7539544Z 2025-12-04T12:15:05.7539837Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7540461Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7540935Z frames [('total', 1)] 2025-12-04T12:15:05.7541249Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7541921Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7542754Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7543224Z graph_break [] 2025-12-04T12:15:05.7543607Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7544061Z frames [('total', 1)] 2025-12-04T12:15:05.7544360Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7544796Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7545607Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7546320Z graph_break [] 2025-12-04T12:15:05.7546631Z =================================== FAILURES =================================== 2025-12-04T12:15:05.7547390Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.7548056Z Traceback (most recent call last): 2025-12-04T12:15:05.7548727Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.7549528Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.7550021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.7550268Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.7550797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.7551032Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.7551553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.7551706Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.7552242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.7552635Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.7553155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.7553314Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.7553795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.7553918Z return self._compile_to_module() 2025-12-04T12:15:05.7561487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.7561843Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.7562396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.7562550Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.7563057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.7563311Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.7563900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.7564032Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.7564548Z File "/tmp/tmpj3600sez/lb/clbiv5vsuuc2fiqaak2ofncdtzl3q7rgj245pgjl43nlxhfgzwpt.py", line 65, in 2025-12-04T12:15:05.7565014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.7565134Z kernel.precompile( 2025-12-04T12:15:05.7565705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.7565831Z self._precompile_worker() 2025-12-04T12:15:05.7566439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.7566621Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.7567213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7567430Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7567885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7568152Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.7568642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7568982Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7569231Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7569888Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7569994Z ^ 2025-12-04T12:15:05.7570456Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7570464Z 2025-12-04T12:15:05.7571500Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7571511Z 2025-12-04T12:15:05.7571516Z 2025-12-04T12:15:05.7571753Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7572472Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T12:15:05.7572534Z 2025-12-04T12:15:05.7572822Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7573052Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7573159Z frames [('total', 1)] 2025-12-04T12:15:05.7573295Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7573768Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7574009Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7574187Z graph_break [] 
2025-12-04T12:15:05.7574416Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7574531Z frames [('total', 1)] 2025-12-04T12:15:05.7574647Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7574866Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7575346Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7575446Z graph_break [] 2025-12-04T12:15:05.7575664Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7575783Z frames [('total', 1)] 2025-12-04T12:15:05.7575900Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7576131Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7576663Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7576766Z graph_break [] 2025-12-04T12:15:05.7577434Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7c20e7902388541e.xml - 2025-12-04T12:15:05.7577609Z =========================== short test summary info ============================ 2025-12-04T12:15:05.7578496Z FAILED [0.4526s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7579141Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7579231Z ^ 2025-12-04T12:15:05.7579707Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7579713Z 2025-12-04T12:15:05.7580484Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7580493Z 2025-12-04T12:15:05.7580498Z 2025-12-04T12:15:05.7580731Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7581451Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T12:15:05.7581457Z 2025-12-04T12:15:05.7581736Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7581918Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.7582121Z ================== 1 failed, 187 deselected, 2 rerun in 4.41s ================== 2025-12-04T12:15:05.7582237Z Got exit code 1 2025-12-04T12:15:05.7582379Z Retrying single test... 
2025-12-04T12:15:05.7582846Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43cf13c151388d8e.xml 2025-12-04T12:15:05.7583032Z ============================= test session starts ============================== 2025-12-04T12:15:05.7583385Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.7583528Z cachedir: .pytest_cache 2025-12-04T12:15:05.7584059Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.7584186Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.7584307Z configfile: pytest.ini 2025-12-04T12:15:05.7584899Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.7585118Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.7585934Z stepcurrent: skipping 31 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T12:15:05.7586085Z Running 1 items in this shard 2025-12-04T12:15:05.7586091Z 2025-12-04T12:15:05.7587454Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.7588543Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7589002Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:05.7589455Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.7589924Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.7590473Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.7591011Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.7591604Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.7592187Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T12:15:05.7592737Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.7593240Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.7593673Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.7594281Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7594863Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7595500Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7596093Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.7596631Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.7597202Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.7597697Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.7598190Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.7598656Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.7599445Z E1204 12:00:37.970000 119707 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.7600014Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.7600607Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7601342Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.7601945Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.7602349Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.7602982Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.7603572Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.7604232Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.7604936Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.7605428Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.7605904Z E1204 12:00:37.970000 119707 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.7606410Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.7607058Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.7607583Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.7608140Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.7608767Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.7609297Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.7609838Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.7610326Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.7610850Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.7611315Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.7612102Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.7612645Z E1204 12:00:37.970000 119707 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.7613179Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.7613658Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.7614161Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.7614630Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.7615124Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.7615660Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.7616170Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.7616756Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.7617365Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7617947Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.7618508Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:05.7619018Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.7619519Z 
E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.7620109Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.7620572Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.7621145Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.7621696Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.7622328Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:05.7622924Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.7623469Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:05.7623895Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.7626253Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], 
(5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.7626831Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.7627876Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7628498Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7629407Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7630090Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.7630986Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7631758Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7632377Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.7633583Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7633952Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.7634860Z E1204 12:00:37.970000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7634995Z ('RERUN', {'yellow': True}) [3.4453s] [100%] 2025-12-04T12:15:05.7636389Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.7637474Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7637938Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:05.7638417Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.7638876Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.7639423Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.7639964Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.7640569Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.7641182Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.7641735Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.7642199Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.7642629Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.7643243Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7643825Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7644451Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7645032Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.7645562Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.7646099Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.7646593Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] roffset = r0_offset 2025-12-04T12:15:05.7647120Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.7647589Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.7648370Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.7648906Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.7649490Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7650247Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.7650855Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.7651267Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.7651918Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.7652511Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.7653168Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.7653880Z E1204 
12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.7654400Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.7654878Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.7655351Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.7655999Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.7656594Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.7657157Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.7657741Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.7658272Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.7658814Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.7659305Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.7659802Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.7660268Z E1204 12:00:38.463000 119707 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.7661130Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.7661666Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.7662162Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.7662640Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.7663171Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.7663646Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.7664148Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.7664684Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.7665225Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.7665749Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.7666358Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7666940Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 
triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.7667538Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:05.7668048Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.7668516Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.7669109Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.7669568Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.7670147Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.7670706Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.7671531Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:05.7672128Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.7672672Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:05.7673049Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.7675474Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 
'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.7676030Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.7677121Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7677759Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7678667Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7679389Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.7680284Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7681055Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, 
context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7681728Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.7682816Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7683192Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.7684081Z E1204 12:00:38.463000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7684216Z ('RERUN', {'yellow': True}) [0.4550s] [100%] 2025-12-04T12:15:05.7685581Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.7686666Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7687123Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:05.7687570Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.7688032Z E1204 12:00:38.919000 119707 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.7688619Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.7689161Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.7689757Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.7690338Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.7690906Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.7691390Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.7691826Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.7692437Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7693054Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7693669Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.7694244Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.7694774Z E1204 12:00:38.919000 119707 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.7695346Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.7695833Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.7696392Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.7696863Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.7697642Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.7698183Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.7698776Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7699504Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.7700107Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.7700520Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.7701142Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.7701777Z 
E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.7702438Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.7703144Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.7703634Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.7704111Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.7704613Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.7705267Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.7705791Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.7706380Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.7706960Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.7707501Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.7708033Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < 
r0_numel 2025-12-04T12:15:05.7709171Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.7709669Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.7710141Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.7710951Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.7711482Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.7711986Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.7712468Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.7712971Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.7713456Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.7713953Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.7714491Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.7715002Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.7715523Z E1204 12:00:38.919000 119707 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.7716172Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7716750Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.7717310Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:05.7717818Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.7718314Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.7718898Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.7719359Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.7719946Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.7720515Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.7721157Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:05.7721835Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.7722385Z E1204 12:00:38.919000 119707 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:05.7722798Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.7725148Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.7725697Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.7726744Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7727384Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7728283Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7728963Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.7729888Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7730657Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7731286Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.7732402Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7732781Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.7733678Z E1204 12:00:38.919000 119707 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7733816Z FAILED [0.4530s] [100%] 2025-12-04T12:15:05.7733823Z 2025-12-04T12:15:05.7733980Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.7734387Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.7734522Z Traceback (most recent call last): 2025-12-04T12:15:05.7734947Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.7735181Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.7735684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.7735967Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.7736565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.7736761Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.7737272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.7737433Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.7737969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.7738288Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.7738824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.7738974Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.7739471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.7739597Z return self._compile_to_module() 2025-12-04T12:15:05.7740086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.7740265Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.7740780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.7740926Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.7741421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.7741658Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.7742301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.7742430Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.7742933Z File "/tmp/tmp5ggmxwz2/ek/cekg2gokxegjc25gieptllhpsmseog5b543tjrgeetz65iuizaik.py", line 65, in 2025-12-04T12:15:05.7743411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.7743524Z kernel.precompile( 2025-12-04T12:15:05.7744090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.7744209Z self._precompile_worker() 2025-12-04T12:15:05.7744834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.7745029Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.7745631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7745848Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7746332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7746578Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.7747035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7747370Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7747599Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7748260Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7748383Z ^ 2025-12-04T12:15:05.7748855Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7748862Z 2025-12-04T12:15:05.7749570Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7749580Z 2025-12-04T12:15:05.7749585Z 2025-12-04T12:15:05.7749816Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7750534Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T12:15:05.7750540Z 2025-12-04T12:15:05.7750810Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7751048Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7751156Z frames [('total', 1)] 2025-12-04T12:15:05.7751289Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7751758Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7751983Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7752096Z graph_break [] 2025-12-04T12:15:05.7752516Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.7752643Z Traceback (most recent call last): 2025-12-04T12:15:05.7753081Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.7753612Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.7754124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.7754379Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.7754945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.7755158Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.7755675Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.7755825Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.7756379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.7756703Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.7757268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.7757423Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.7757907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.7758050Z return self._compile_to_module() 2025-12-04T12:15:05.7758585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.7758762Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.7759277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.7759409Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.7759922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.7760159Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.7760783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.7760932Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.7761431Z File "/tmp/tmpssk9n14d/i7/ci7ig3lx3xyndr2ivt262fdzrmbukp6ilf3mdund7p2x3k6uj7r5.py", line 65, in 2025-12-04T12:15:05.7761921Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.7762037Z kernel.precompile( 2025-12-04T12:15:05.7762593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.7762728Z self._precompile_worker() 2025-12-04T12:15:05.7763330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.7763527Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.7764130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7764331Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7764792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7765045Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.7765489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7765841Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7766070Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7766741Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7766836Z ^ 2025-12-04T12:15:05.7767328Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7767334Z 2025-12-04T12:15:05.7768060Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7768069Z 2025-12-04T12:15:05.7768074Z 2025-12-04T12:15:05.7768292Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7769025Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T12:15:05.7769030Z 2025-12-04T12:15:05.7769351Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7769598Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7769707Z frames [('total', 1)] 2025-12-04T12:15:05.7769828Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7770308Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7770564Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7770664Z graph_break [] 2025-12-04T12:15:05.7770898Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7771209Z frames [('total', 1)] 2025-12-04T12:15:05.7771344Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7771565Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7772027Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7772144Z graph_break [] 2025-12-04T12:15:05.7772296Z =================================== FAILURES =================================== 2025-12-04T12:15:05.7772798Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda _ 2025-12-04T12:15:05.7772938Z Traceback (most recent call last): 2025-12-04T12:15:05.7773365Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.7773615Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.7774105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.7774356Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.7774886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.7775088Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.7775604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.7775773Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.7776369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.7776713Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.7777236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.7777385Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.7777884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.7778009Z return self._compile_to_module() 2025-12-04T12:15:05.7778508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.7778678Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.7779254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.7779400Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.7779896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.7780128Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.7780725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.7780852Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.7781382Z File "/tmp/tmp938xof_7/g2/cg246gs4mive6nya6r23nldbjauykvbeqccxb75i2ywxwkjrs263.py", line 65, in 2025-12-04T12:15:05.7781846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.7781969Z kernel.precompile( 2025-12-04T12:15:05.7782536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.7782706Z self._precompile_worker() 2025-12-04T12:15:05.7783314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.7783496Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.7784087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7784304Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7784760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7785086Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.7785534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7785874Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7786120Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7786775Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7786868Z ^ 2025-12-04T12:15:05.7787343Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7787349Z 2025-12-04T12:15:05.7788062Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7788070Z 2025-12-04T12:15:05.7788075Z 2025-12-04T12:15:05.7788308Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7789027Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T12:15:05.7789035Z 2025-12-04T12:15:05.7789320Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7789546Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7789654Z frames [('total', 1)] 2025-12-04T12:15:05.7789791Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7790254Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7790482Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7790597Z graph_break [] 
2025-12-04T12:15:05.7790865Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7790987Z frames [('total', 1)] 2025-12-04T12:15:05.7791105Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7791323Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7791800Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7791905Z graph_break [] 2025-12-04T12:15:05.7792122Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7792242Z frames [('total', 1)] 2025-12-04T12:15:05.7792357Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7792588Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7793076Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7793179Z graph_break [] 2025-12-04T12:15:05.7793839Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43cf13c151388d8e.xml - 2025-12-04T12:15:05.7794015Z =========================== short test summary info ============================ 2025-12-04T12:15:05.7794908Z FAILED [0.4530s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7795568Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.7795660Z ^ 2025-12-04T12:15:05.7796133Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7796139Z 2025-12-04T12:15:05.7796849Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7796888Z 2025-12-04T12:15:05.7796893Z 2025-12-04T12:15:05.7797123Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7797844Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T12:15:05.7797850Z 2025-12-04T12:15:05.7798116Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7798317Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.7798522Z ================== 1 failed, 187 deselected, 2 rerun in 4.40s ================== 2025-12-04T12:15:05.7798640Z Got exit code 1 2025-12-04T12:15:05.7799276Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda 2025-12-04T12:15:05.7799697Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.7800177Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-27661fe34019a4f8.xml 2025-12-04T12:15:05.7800346Z ============================= test session starts ============================== 2025-12-04T12:15:05.7800711Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.7800822Z cachedir: .pytest_cache 2025-12-04T12:15:05.7801342Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.7801479Z rootdir: 
/var/lib/jenkins/workspace 2025-12-04T12:15:05.7801589Z configfile: pytest.ini 2025-12-04T12:15:05.7802180Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.7802455Z collecting ... collected 188 items / 32 deselected / 156 selected 2025-12-04T12:15:05.7802601Z stepcurrent: skipping 32 already run items. 2025-12-04T12:15:05.7802730Z Running 156 items in this shard 2025-12-04T12:15:05.7802737Z 2025-12-04T12:15:05.7804160Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.7805308Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.7805755Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.7806200Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.7806728Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.7807221Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.7807773Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.7808314Z E1204 12:00:57.803000 119904 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.7808901Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.7809535Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.7810091Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.7810551Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.7811065Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.7811533Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.7812006Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.7812458Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.7813125Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.7813656Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.7814198Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T12:15:05.7814718Z E1204 12:00:57.803000 119904 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.7815306Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7815899Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.7816554Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7817094Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.7817673Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.7818206Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.7818774Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.7819264Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.7819759Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.7820281Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.7820872Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7821425Z E1204 12:00:57.803000 119904 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.7822011Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.7822502Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.7822984Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.7823479Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.7823935Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.7824419Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.7824959Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.7825444Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.7825953Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.7826556Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7827135Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.7827789Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 
1)[:, None].to(tl.float32) 2025-12-04T12:15:05.7828272Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T12:15:05.7828718Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T12:15:05.7829361Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.7829805Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.7830398Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.7830935Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.7831451Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.7832211Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.7832930Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.7833315Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.7835458Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 
'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.7836012Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.7837173Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7837821Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7838715Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7839411Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.7840301Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7841072Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7841698Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: 
at 1:0: 2025-12-04T12:15:05.7842793Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.7843178Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.7844121Z E1204 12:00:57.803000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7844275Z ('RERUN', {'yellow': True}) [3.3658s] [ 0%] 2025-12-04T12:15:05.7845694Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.7846810Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.7847264Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.7847704Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.7848262Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.7848724Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 
2025-12-04T12:15:05.7849258Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.7849820Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.7850402Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.7851032Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.7851587Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.7852049Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.7852563Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.7853033Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.7853504Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.7853956Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.7854615Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.7855144Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.7855690Z E1204 12:00:58.245000 
119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T12:15:05.7856205Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.7856885Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7857492Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.7858072Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7858610Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.7859196Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.7859729Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.7860287Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.7860781Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.7861257Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.7861786Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.7862375Z E1204 12:00:58.245000 119904 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7862925Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.7863502Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.7864037Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.7864475Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.7864968Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.7865423Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.7865907Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.7866445Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.7866926Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.7867438Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.7868042Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7868620Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, 
float("-inf")) 2025-12-04T12:15:05.7869268Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.7869749Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T12:15:05.7870194Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T12:15:05.7870839Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.7871926Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.7872518Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.7873048Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.7873560Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.7874385Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.7875106Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.7875483Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.7877779Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: 
{'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.7878400Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.7879442Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7880087Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7880980Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7881662Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.7882566Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7883340Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, 
codegen_fns=codegen_fns, 2025-12-04T12:15:05.7883966Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.7885061Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.7885442Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.7886391Z E1204 12:00:58.245000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7886530Z ('RERUN', {'yellow': True}) [0.4038s] [ 0%] 2025-12-04T12:15:05.7887968Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.7889090Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.7889546Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.7889990Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.7890552Z E1204 12:00:58.651000 119904 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.7891015Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.7891547Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.7892103Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.7892688Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.7893311Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.7893868Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.7894324Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.7894840Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.7895312Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.7895782Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.7896229Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.7896952Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), 
r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.7897480Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.7898028Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T12:15:05.7898555Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.7899142Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7899729Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.7900307Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7900840Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.7901425Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.7901994Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.7902522Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.7903016Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.7903493Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 
2025-12-04T12:15:05.7904017Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.7904603Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7905151Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.7905731Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.7906273Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.7906711Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.7907201Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.7907658Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.7908136Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.7908675Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.7909152Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.7909659Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.7910255Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.7910828Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.7911475Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.7911957Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T12:15:05.7912400Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T12:15:05.7913026Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.7913464Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.7914047Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.7914581Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.7915094Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.7915843Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.7916557Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.7916966Z E1204 12:00:58.651000 119904 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.7919058Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.7919640Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.7920684Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7921332Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7922221Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7922911Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.7923813Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 
2025-12-04T12:15:05.7924587Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7925210Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.7926305Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.7926727Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.7927622Z E1204 12:00:58.651000 119904 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7927730Z FAILED [0.4035s] [ 0%] 2025-12-04T12:15:05.7927752Z 2025-12-04T12:15:05.7927901Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.7928283Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _ 2025-12-04T12:15:05.7928423Z Traceback (most recent call last): 2025-12-04T12:15:05.7928848Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.7929115Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.7929623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.7929875Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.7930400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.7930628Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.7931136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.7931296Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.7931828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.7932151Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.7932685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.7932872Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.7933378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.7933504Z return self._compile_to_module() 2025-12-04T12:15:05.7933989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.7934168Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.7934687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.7934835Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.7935336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.7935571Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.7936173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.7936364Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.7936878Z File "/tmp/tmp6qkidtfc/qd/cqdvosxiuun73ceezczqcvaxeq2b2mbglfras55yapfjpp5bt4sc.py", line 74, in 2025-12-04T12:15:05.7937361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.7937476Z kernel.precompile( 2025-12-04T12:15:05.7938047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.7938170Z self._precompile_worker() 2025-12-04T12:15:05.7938771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.7938972Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.7939613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7939830Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7940286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7940533Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.7940996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7941333Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7941605Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7942265Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.7942361Z ^ 2025-12-04T12:15:05.7942839Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7942876Z 2025-12-04T12:15:05.7943593Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7943599Z 2025-12-04T12:15:05.7943604Z 2025-12-04T12:15:05.7943840Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7944529Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T12:15:05.7944538Z 2025-12-04T12:15:05.7944810Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7945086Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7945195Z frames [('total', 1)] 2025-12-04T12:15:05.7945329Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7945797Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7946019Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7946138Z graph_break [] 2025-12-04T12:15:05.7946519Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _ 2025-12-04T12:15:05.7946645Z Traceback (most recent call last): 2025-12-04T12:15:05.7947085Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.7947321Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.7947828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.7948082Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.7948596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.7948809Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.7949318Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.7949464Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.7950011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.7950332Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.7950861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.7951042Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.7951521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.7951660Z return self._compile_to_module() 2025-12-04T12:15:05.7952145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.7952323Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.7952837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.7952966Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.7953504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.7953738Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.7954341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.7954470Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.7954992Z File "/tmp/tmp6_pgr861/bq/cbq2v2em5kgujuwm57jdnkrros4hjaz72skgv6ptkwvq772rugcf.py", line 74, in 2025-12-04T12:15:05.7955462Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.7955575Z kernel.precompile( 2025-12-04T12:15:05.7956125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.7956255Z self._precompile_worker() 2025-12-04T12:15:05.7956853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.7957081Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.7957675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7957876Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7958343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7958592Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.7959047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7959381Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7959611Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7960274Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.7960370Z ^ 2025-12-04T12:15:05.7960827Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7960847Z 2025-12-04T12:15:05.7961557Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7961563Z 2025-12-04T12:15:05.7961568Z 2025-12-04T12:15:05.7961785Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7962489Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T12:15:05.7962494Z 2025-12-04T12:15:05.7962766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7963001Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7963148Z frames [('total', 1)] 2025-12-04T12:15:05.7963267Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7963750Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7963976Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7964077Z graph_break [] 2025-12-04T12:15:05.7964309Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7964415Z frames [('total', 1)] 2025-12-04T12:15:05.7964541Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7964759Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7965251Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7965367Z graph_break [] 2025-12-04T12:15:05.7965516Z =================================== FAILURES =================================== 2025-12-04T12:15:05.7965898Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _ 2025-12-04T12:15:05.7966037Z Traceback (most recent call last): 2025-12-04T12:15:05.7966501Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.7966746Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.7967234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.7967482Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.7968010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.7968202Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.7968761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.7968909Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.7969441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.7969778Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.7970296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.7970443Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.7971136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.7971266Z return self._compile_to_module() 2025-12-04T12:15:05.7971767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.7971938Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.7972452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.7972601Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.7973097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.7973344Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.7973934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.7974061Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.7974573Z File "/tmp/tmplvqn329b/p5/cp5t2f53rpk7o5z6kw7x4uydjclzdisjyg3s3cy22fswwkis34a6.py", line 74, in 2025-12-04T12:15:05.7975115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.7975229Z kernel.precompile( 2025-12-04T12:15:05.7975800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.7975921Z self._precompile_worker() 2025-12-04T12:15:05.7976605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.7976787Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.7977381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.7977653Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.7978104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.7978367Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.7978808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.7979184Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.7979425Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7980075Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.7980166Z ^ 2025-12-04T12:15:05.7980636Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7980642Z 2025-12-04T12:15:05.7981351Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7981403Z 2025-12-04T12:15:05.7981410Z 2025-12-04T12:15:05.7981640Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7982322Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T12:15:05.7982331Z 2025-12-04T12:15:05.7982612Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7982833Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7982938Z frames [('total', 1)] 2025-12-04T12:15:05.7983067Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7983534Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7983756Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7983871Z graph_break [] 
2025-12-04T12:15:05.7984090Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7984209Z frames [('total', 1)] 2025-12-04T12:15:05.7984327Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7984549Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7985022Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7985124Z graph_break [] 2025-12-04T12:15:05.7985340Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.7985456Z frames [('total', 1)] 2025-12-04T12:15:05.7985571Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.7985806Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.7986266Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.7986369Z graph_break [] 2025-12-04T12:15:05.7987067Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-27661fe34019a4f8.xml - 2025-12-04T12:15:05.7987246Z =========================== short test summary info ============================ 2025-12-04T12:15:05.7988085Z FAILED [0.4035s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.7988745Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.7988837Z ^ 2025-12-04T12:15:05.7989339Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.7989346Z 2025-12-04T12:15:05.7990063Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.7990070Z 2025-12-04T12:15:05.7990074Z 2025-12-04T12:15:05.7990306Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.7991028Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T12:15:05.7991033Z 2025-12-04T12:15:05.7991303Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.7991498Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.7991698Z ================== 1 failed, 32 deselected, 2 rerun in 4.22s =================== 2025-12-04T12:15:05.7991818Z Got exit code 1 2025-12-04T12:15:05.7991927Z Retrying single test... 
2025-12-04T12:15:05.7992401Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-63ef36c446edecf7.xml 2025-12-04T12:15:05.7992638Z ============================= test session starts ============================== 2025-12-04T12:15:05.7992991Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.7993105Z cachedir: .pytest_cache 2025-12-04T12:15:05.7993638Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.7993766Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.7993889Z configfile: pytest.ini 2025-12-04T12:15:05.7994482Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.7994707Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.7995489Z stepcurrent: skipping 32 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T12:15:05.7995607Z Running 1 items in this shard 2025-12-04T12:15:05.7995612Z 2025-12-04T12:15:05.7997039Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.7998141Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.7998588Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.7999064Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.7999575Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.8000054Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8000590Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8001144Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8001777Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.8002365Z E1204 
12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.8002936Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8003410Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.8003940Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8004414Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8004870Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8005335Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.8006017Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.8006557Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.8007109Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T12:15:05.8007617Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8008221Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8008759Z E1204 12:01:17.704000 120101 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.8009364Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8009904Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.8010491Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8011031Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.8011542Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.8012049Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.8012560Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.8013055Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.8013648Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8014189Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.8014783Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8015295Z E1204 12:01:17.704000 120101 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.8015753Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.8016245Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.8016783Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.8017283Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.8017812Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.8018311Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.8018816Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.8019443Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8020034Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.8020677Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8021173Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T12:15:05.8021617Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 
2025-12-04T12:15:05.8022192Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.8022650Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.8023220Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.8023765Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.8024278Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.8025000Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.8025743Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.8026115Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8028209Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 
2025-12-04T12:15:05.8028775Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8029832Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8030495Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8031406Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8032090Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8032990Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8033790Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8034398Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8035506Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 
2025-12-04T12:15:05.8035873Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8036779Z E1204 12:01:17.704000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8036916Z ('RERUN', {'yellow': True}) [3.3944s] [100%] 2025-12-04T12:15:05.8038351Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.8039434Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8039875Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.8040354Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.8040875Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.8041356Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8041891Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8042442Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8043060Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.8043649Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.8044219Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8044775Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.8045301Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8045772Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8046236Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8046695Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.8047370Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.8047908Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.8048453Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T12:15:05.8048960Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8049555Z 
E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8050084Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.8050679Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8051218Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.8051782Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8052329Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.8052843Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.8053343Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.8053874Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.8054369Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.8054958Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8055495Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.8056122Z E1204 12:01:18.157000 120101 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8056682Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.8057139Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.8057629Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.8058109Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.8058604Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.8059132Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.8059628Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.8060135Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.8060755Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8061342Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.8061979Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8062471Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = 
tmp24 * tmp31 2025-12-04T12:15:05.8062913Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T12:15:05.8063479Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.8063934Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.8064505Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.8065047Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.8065560Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.8066264Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.8067011Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.8067380Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8069526Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): 
[['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.8070065Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8071317Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8072024Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8072938Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8073626Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8074521Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8075335Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8075943Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8077046Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8077416Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8078320Z E1204 12:01:18.157000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8078455Z ('RERUN', {'yellow': True}) [0.4155s] [100%] 2025-12-04T12:15:05.8079884Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.8080968Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8081397Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.8081899Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.8082414Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.8082886Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8083416Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: 
tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8083954Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8084584Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.8085173Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.8085740Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8086208Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.8086737Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8087208Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8087671Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8088173Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.8088817Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.8089358Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.8089903Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 
2025-12-04T12:15:05.8090408Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8091003Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8091539Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.8092131Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8092664Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.8093238Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8093785Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.8094294Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.8094824Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.8095306Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.8095789Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.8096452Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8096995Z E1204 
12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.8097616Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8098097Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.8098549Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.8099043Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.8099515Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.8100007Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.8100534Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.8101027Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.8101584Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.8102170Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8102760Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.8103397Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = 
triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8103890Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T12:15:05.8104335Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T12:15:05.8104907Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.8105371Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.8105947Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.8106499Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.8107015Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.8107722Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.8108476Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.8108844Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8110986Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 
'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.8111533Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8112590Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8113249Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8114150Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8114836Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8115768Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8116539Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8117153Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8118264Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8118635Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8119552Z E1204 12:01:18.585000 120101 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8119665Z FAILED [0.4246s] [100%] 2025-12-04T12:15:05.8119670Z 2025-12-04T12:15:05.8119819Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.8120217Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _ 2025-12-04T12:15:05.8120346Z Traceback (most recent call last): 2025-12-04T12:15:05.8120793Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.8121034Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.8121526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.8121831Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.8122348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.8122560Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.8123071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.8123219Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.8123766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.8124089Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.8124645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.8124816Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.8125297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.8125435Z return self._compile_to_module() 2025-12-04T12:15:05.8125954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.8126124Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.8126656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.8126791Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.8127300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.8127536Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.8128152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.8128294Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.8128794Z File "/tmp/tmpovscfvm8/2t/c2tvd4waojawcq2jd6vt4hx66yjsr73vag7aljgtsa2xwwxxq476.py", line 74, in 2025-12-04T12:15:05.8129271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.8129383Z kernel.precompile( 
2025-12-04T12:15:05.8129939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.8130075Z self._precompile_worker() 2025-12-04T12:15:05.8130669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.8130848Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.8131458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8131654Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8132115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8132361Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8132801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8133150Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8133376Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8134041Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8134134Z ^ 2025-12-04T12:15:05.8134623Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8134629Z 2025-12-04T12:15:05.8135351Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8135360Z 2025-12-04T12:15:05.8135365Z 2025-12-04T12:15:05.8135580Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8136350Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T12:15:05.8136357Z 2025-12-04T12:15:05.8136670Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8136897Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8137021Z frames [('total', 1)] 2025-12-04T12:15:05.8137142Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8137626Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.8137879Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8137979Z graph_break [] 2025-12-04T12:15:05.8138373Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _ 2025-12-04T12:15:05.8138499Z Traceback (most recent call last): 2025-12-04T12:15:05.8138924Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.8139171Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.8139659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.8139950Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.8140466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.8140659Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.8141188Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.8141335Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.8141868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.8142203Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.8142726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.8142889Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.8143371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.8143493Z return self._compile_to_module() 2025-12-04T12:15:05.8143994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.8144157Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.8144682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.8144818Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.8145313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.8145562Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.8146181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.8146325Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.8146825Z File "/tmp/tmpnhz4w3ix/ef/cefcji5ilwaclwro5i25bancohnqhnyhfzwxnybhn7chngw53xcd.py", line 74, in 2025-12-04T12:15:05.8147289Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.8147414Z kernel.precompile( 2025-12-04T12:15:05.8147966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.8148084Z self._precompile_worker() 2025-12-04T12:15:05.8148723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.8148902Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.8149512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8149710Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8150159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8150466Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8150909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8151255Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8151481Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8152132Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8152268Z ^ 2025-12-04T12:15:05.8152729Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8152735Z 2025-12-04T12:15:05.8153455Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8153463Z 2025-12-04T12:15:05.8153468Z 2025-12-04T12:15:05.8153685Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8154372Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T12:15:05.8154377Z 2025-12-04T12:15:05.8154657Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8154881Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8155002Z frames [('total', 1)] 2025-12-04T12:15:05.8155119Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8155584Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.8155821Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8155922Z graph_break [] 2025-12-04T12:15:05.8156140Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8156258Z frames [('total', 1)] 2025-12-04T12:15:05.8156374Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8156606Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8157065Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.8157168Z graph_break [] 2025-12-04T12:15:05.8157327Z =================================== FAILURES =================================== 2025-12-04T12:15:05.8157742Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _ 2025-12-04T12:15:05.8157868Z Traceback (most recent call last): 2025-12-04T12:15:05.8158305Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.8158540Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.8159040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.8159288Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.8159801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.8160037Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.8160546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.8160709Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.8161239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.8161591Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.8162122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.8162272Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.8162750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.8162886Z return self._compile_to_module() 2025-12-04T12:15:05.8163373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.8163579Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.8164098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.8164228Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.8164736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.8164968Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.8165565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.8165694Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.8166194Z File "/tmp/tmpnsm7xs1c/lp/clprq5564qirf3ierb2tnnvy6ickadup4jypktvrti6ssgf45oyd.py", line 74, in 2025-12-04T12:15:05.8166670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.8166785Z kernel.precompile( 2025-12-04T12:15:05.8167344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.8167478Z self._precompile_worker() 2025-12-04T12:15:05.8168078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.8168269Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.8168864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8169060Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8169525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8169770Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.8170258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8170598Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8170825Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8171692Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8171786Z ^ 2025-12-04T12:15:05.8172243Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8172264Z 2025-12-04T12:15:05.8173053Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8173062Z 2025-12-04T12:15:05.8173067Z 2025-12-04T12:15:05.8173285Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8173990Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T12:15:05.8174039Z 2025-12-04T12:15:05.8174306Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8174542Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8174648Z frames [('total', 1)] 2025-12-04T12:15:05.8174765Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8175248Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.8175475Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8175576Z graph_break [] 
2025-12-04T12:15:05.8175855Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8175962Z frames [('total', 1)] 2025-12-04T12:15:05.8176090Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8176372Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8176839Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.8176956Z graph_break [] 2025-12-04T12:15:05.8177174Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8177280Z frames [('total', 1)] 2025-12-04T12:15:05.8177414Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8177633Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8178111Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.8178213Z graph_break [] 2025-12-04T12:15:05.8178872Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-63ef36c446edecf7.xml - 2025-12-04T12:15:05.8179062Z =========================== short test summary info ============================ 2025-12-04T12:15:05.8179899Z FAILED [0.4246s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8180559Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8180654Z ^ 2025-12-04T12:15:05.8181115Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8181121Z 2025-12-04T12:15:05.8181849Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8181908Z 2025-12-04T12:15:05.8181914Z 2025-12-04T12:15:05.8182135Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8182840Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T12:15:05.8182848Z 2025-12-04T12:15:05.8183117Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8183301Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.8183524Z ================== 1 failed, 187 deselected, 2 rerun in 4.28s ================== 2025-12-04T12:15:05.8183628Z Got exit code 1 2025-12-04T12:15:05.8183752Z Retrying single test... 
2025-12-04T12:15:05.8184254Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-818cc5e6f257d295.xml 2025-12-04T12:15:05.8184428Z ============================= test session starts ============================== 2025-12-04T12:15:05.8184795Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.8184908Z cachedir: .pytest_cache 2025-12-04T12:15:05.8185464Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.8185610Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.8185722Z configfile: pytest.ini 2025-12-04T12:15:05.8186333Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.8186554Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.8187326Z stepcurrent: skipping 32 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T12:15:05.8187493Z Running 1 items in this shard 2025-12-04T12:15:05.8187502Z 2025-12-04T12:15:05.8188921Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.8190030Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8190466Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.8190921Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.8191443Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.8191907Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8192456Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8193000Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8193604Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.8194192Z E1204 
12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.8194782Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8195247Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.8195765Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8196245Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8196704Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8197195Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.8197857Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.8198384Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.8198971Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T12:15:05.8199477Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8200056Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8200606Z E1204 12:01:37.514000 120298 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.8201189Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8201780Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.8202354Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8202898Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.8203408Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.8203895Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.8204384Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.8204866Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.8205466Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8206008Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.8206585Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8207077Z E1204 12:01:37.514000 120298 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.8207515Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.8208053Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.8208494Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.8208978Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.8209521Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.8209999Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.8210550Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.8211146Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8211724Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.8212402Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8212882Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T12:15:05.8213336Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 
2025-12-04T12:15:05.8213912Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.8214385Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.8214967Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.8215502Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.8216031Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.8216802Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.8217523Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.8217892Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8219997Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 
2025-12-04T12:15:05.8220553Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8221647Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8222293Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8223187Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8223878Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8224793Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8225579Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8226217Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8227324Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 
2025-12-04T12:15:05.8227693Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8228589Z E1204 12:01:37.514000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8228775Z ('RERUN', {'yellow': True}) [3.3705s] [100%] 2025-12-04T12:15:05.8230194Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.8231297Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8231727Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.8232180Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.8232699Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.8233160Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8233707Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8234245Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8234840Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.8235422Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.8236006Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8236466Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.8236982Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8237466Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8237925Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8238399Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.8239059Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.8239582Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.8240167Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 2025-12-04T12:15:05.8240673Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8241254Z 
E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8241805Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.8242415Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8242959Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.8243535Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8244084Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.8245205Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.8245698Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.8246197Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.8246677Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.8247283Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8247822Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.8248400Z E1204 12:01:37.963000 120298 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8248892Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.8249328Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.8249899Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.8250345Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.8250829Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.8251370Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.8251847Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.8252479Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.8253071Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8253649Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.8254334Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8254815Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = 
tmp24 * tmp31 2025-12-04T12:15:05.8255274Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T12:15:05.8255846Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.8256397Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.8256984Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.8257531Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.8258179Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.8258886Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.8259611Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.8259980Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8262069Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): 
[['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.8262625Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8263744Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8264392Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8265285Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8265983Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8266897Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8267683Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8268319Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8269412Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8269793Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8270687Z E1204 12:01:37.963000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8270866Z ('RERUN', {'yellow': True}) [0.4110s] [100%] 2025-12-04T12:15:05.8272483Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0 2025-12-04T12:15:05.8273588Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8274022Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.8274469Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 15 2025-12-04T12:15:05.8274999Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 16 2025-12-04T12:15:05.8275461Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8276007Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: 
tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8276548Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8277153Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.8277824Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.8278391Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8278851Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.8279383Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8279871Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8280380Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8280830Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_0 = r0_index 2025-12-04T12:15:05.8281495Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_0), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.8282022Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp30 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.8282628Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp31 = tl.broadcast_to(tmp30, [1, 1]) 
2025-12-04T12:15:05.8283142Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8283727Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8284275Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.where(r0_mask, tmp2, 0) 2025-12-04T12:15:05.8284901Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = tl.broadcast_to(tmp2, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8285445Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tl.where(r0_mask, tmp5, 0) 2025-12-04T12:15:05.8286022Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tl.sum(tmp7, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8286557Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.full([1, 1], 15, tl.int32) 2025-12-04T12:15:05.8287391Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.8287890Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = (tmp8 / tmp10) 2025-12-04T12:15:05.8288390Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tmp2 - tmp11 2025-12-04T12:15:05.8288869Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = tmp12 * tmp12 2025-12-04T12:15:05.8289473Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.broadcast_to(tmp13, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8290013Z E1204 
12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.where(r0_mask, tmp14, 0) 2025-12-04T12:15:05.8290593Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.sum(tmp16, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8291091Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp1 - tmp11 2025-12-04T12:15:05.8291587Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = 15.0 2025-12-04T12:15:05.8292092Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = (tmp17 / tmp19) 2025-12-04T12:15:05.8292539Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 1e-05 2025-12-04T12:15:05.8293021Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp20 + tmp21 2025-12-04T12:15:05.8293566Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = libdevice.rsqrt(tmp22) 2025-12-04T12:15:05.8294106Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp18 * tmp23 2025-12-04T12:15:05.8294635Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = tl_math.abs(tmp24) 2025-12-04T12:15:05.8295226Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = tl.broadcast_to(tmp25, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8295805Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = tl.where(r0_mask, tmp26, float("-inf")) 2025-12-04T12:15:05.8296568Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = 
triton_helpers.max2(tmp28, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8297056Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp32 = tmp24 * tmp31 2025-12-04T12:15:05.8297524Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp33 = -448.0 2025-12-04T12:15:05.8298096Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp34 = triton_helpers.maximum(tmp32, tmp33) 2025-12-04T12:15:05.8298605Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp35 = 448.0 2025-12-04T12:15:05.8299200Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp36 = triton_helpers.minimum(tmp34, tmp35) 2025-12-04T12:15:05.8299738Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp37 = tmp36.to(tl.float8e4nv) 2025-12-04T12:15:05.8300268Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp38 = tmp29.to(tl.float32) 2025-12-04T12:15:05.8300980Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (tl.broadcast_to(r0_0, [XBLOCK, R0_BLOCK])), tmp37, r0_mask) 2025-12-04T12:15:05.8301699Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr4 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp38, None) 2025-12-04T12:15:05.8302064Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8304159Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr3': '*fp8e4nv', 'out_ptr4': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 
'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.8304715Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8305808Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8306452Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8307339Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8308058Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8308945Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8309725Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8310367Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8311449Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8311833Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8312756Z E1204 12:01:38.379000 120298 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8312879Z FAILED [0.4143s] [100%] 2025-12-04T12:15:05.8312885Z 2025-12-04T12:15:05.8313032Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.8313411Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _ 2025-12-04T12:15:05.8313553Z Traceback (most recent call last): 2025-12-04T12:15:05.8313979Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.8314229Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.8314721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.8314971Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.8315502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.8315695Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.8316223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.8316371Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.8316901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.8317238Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.8317760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.8317911Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.8318439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.8318565Z return self._compile_to_module() 2025-12-04T12:15:05.8319061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.8319229Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.8319743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.8319890Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.8320385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.8320661Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.8321252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.8321383Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.8321904Z File "/tmp/tmpbdqisg2q/sv/csvvymowwh2iuybyivtqdf7qhfhnalutevq7yvxg3yh52agrfyfp.py", line 74, in 2025-12-04T12:15:05.8322397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.8322510Z kernel.precompile( 
2025-12-04T12:15:05.8323082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.8323201Z self._precompile_worker() 2025-12-04T12:15:05.8323805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.8323990Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.8324584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8324832Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8325284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8325544Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8325989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8326326Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8326565Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8327221Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8327329Z ^ 2025-12-04T12:15:05.8327791Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8327797Z 2025-12-04T12:15:05.8328507Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8328517Z 2025-12-04T12:15:05.8328522Z 2025-12-04T12:15:05.8328753Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8329440Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T12:15:05.8329446Z 2025-12-04T12:15:05.8329731Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8329959Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8330065Z frames [('total', 1)] 2025-12-04T12:15:05.8330199Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8330696Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.8330933Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8331038Z graph_break [] 2025-12-04T12:15:05.8331418Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _ 2025-12-04T12:15:05.8331556Z Traceback (most recent call last): 2025-12-04T12:15:05.8331979Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.8332212Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.8332741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.8332991Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.8333518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.8333709Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.8334218Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.8334409Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.8334940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.8335273Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.8335790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.8335941Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.8336538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.8336662Z return self._compile_to_module() 2025-12-04T12:15:05.8337146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.8337329Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.8337846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.8337992Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.8338488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.8338720Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.8339325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.8339454Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.8339962Z File "/tmp/tmpmk786yzz/66/c66zje36jroboq56lns2fp755z7atndlwfgrbo7xtw4jylsgr5tj.py", line 74, in 2025-12-04T12:15:05.8340430Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.8340545Z kernel.precompile( 2025-12-04T12:15:05.8341109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.8341227Z self._precompile_worker() 2025-12-04T12:15:05.8341827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.8342020Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.8342617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8342875Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8343327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8343573Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8344029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8344365Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8344609Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8345292Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8345384Z ^ 2025-12-04T12:15:05.8345854Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8345864Z 2025-12-04T12:15:05.8346576Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8346629Z 2025-12-04T12:15:05.8346634Z 2025-12-04T12:15:05.8346864Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8347550Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T12:15:05.8347556Z 2025-12-04T12:15:05.8347827Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8348064Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8348172Z frames [('total', 1)] 2025-12-04T12:15:05.8348307Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8348809Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.8349032Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8349148Z graph_break [] 2025-12-04T12:15:05.8349368Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8349473Z frames [('total', 1)] 2025-12-04T12:15:05.8349607Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8349827Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8350301Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.8350401Z graph_break [] 2025-12-04T12:15:05.8350552Z =================================== FAILURES =================================== 2025-12-04T12:15:05.8350945Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda _ 2025-12-04T12:15:05.8351076Z Traceback (most recent call last): 2025-12-04T12:15:05.8351501Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.8351748Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.8352238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.8352500Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.8358006Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.8358278Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.8358838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.8358997Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.8359623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.8359963Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.8360493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.8360659Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.8361142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.8361265Z return self._compile_to_module() 2025-12-04T12:15:05.8361809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.8361979Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.8362498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.8362644Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.8363139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.8363429Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.8364018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.8364147Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.8364667Z File "/tmp/tmp4jptl5lp/gx/cgxw5jl3tp2wk3lcynnubkv2md7le22vidarqvy4v3kovdwcbw26.py", line 74, in 2025-12-04T12:15:05.8365133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.8365261Z kernel.precompile( 2025-12-04T12:15:05.8365860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.8365982Z self._precompile_worker() 2025-12-04T12:15:05.8366596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.8366783Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.8367378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8367592Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8368062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8368325Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.8368771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8369113Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8369359Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8370017Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8370126Z ^ 2025-12-04T12:15:05.8370584Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8370591Z 2025-12-04T12:15:05.8371534Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8371555Z 2025-12-04T12:15:05.8371560Z 2025-12-04T12:15:05.8371785Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8372572Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T12:15:05.8372579Z 2025-12-04T12:15:05.8372868Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8373096Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8373203Z frames [('total', 1)] 2025-12-04T12:15:05.8373339Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8373808Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.8374046Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8374192Z graph_break [] 
2025-12-04T12:15:05.8374415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8374538Z frames [('total', 1)] 2025-12-04T12:15:05.8374653Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8374874Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8375348Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.8375489Z graph_break [] 2025-12-04T12:15:05.8375721Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8375823Z frames [('total', 1)] 2025-12-04T12:15:05.8375940Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8376172Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8376708Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.8376808Z graph_break [] 2025-12-04T12:15:05.8377469Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-818cc5e6f257d295.xml - 2025-12-04T12:15:05.8377697Z =========================== short test summary info ============================ 2025-12-04T12:15:05.8378545Z FAILED [0.4143s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8379198Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_0(in_ptr0, in_ptr1, out_ptr3, out_ptr4, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8379289Z ^ 2025-12-04T12:15:05.8379764Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8379770Z 2025-12-04T12:15:05.8380479Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8380487Z 2025-12-04T12:15:05.8380491Z 2025-12-04T12:15:05.8380723Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8381416Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T12:15:05.8381423Z 2025-12-04T12:15:05.8381702Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8381884Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.8382086Z ================== 1 failed, 187 deselected, 2 rerun in 4.24s ================== 2025-12-04T12:15:05.8382206Z Got exit code 1 2025-12-04T12:15:05.8382816Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda 2025-12-04T12:15:05.8383224Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.8383749Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b552d5ebf2a766dc.xml 2025-12-04T12:15:05.8383918Z ============================= test session starts ============================== 2025-12-04T12:15:05.8384284Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.8384396Z cachedir: .pytest_cache 2025-12-04T12:15:05.8384916Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.8385059Z rootdir: /var/lib/jenkins/workspace 
2025-12-04T12:15:05.8385168Z configfile: pytest.ini 2025-12-04T12:15:05.8385804Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.8386034Z collecting ... collected 188 items / 33 deselected / 155 selected 2025-12-04T12:15:05.8386182Z stepcurrent: skipping 33 already run items. 2025-12-04T12:15:05.8386311Z Running 155 items in this shard 2025-12-04T12:15:05.8386316Z 2025-12-04T12:15:05.8388217Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.8389428Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8389862Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.8390338Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.8390874Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.8391337Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8392030Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8392576Z E1204 12:01:57.466000 120495 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8393166Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.8393766Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.8394329Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8394789Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.8395307Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8395795Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8396255Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8396710Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.8397263Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T12:15:05.8397912Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.8398628Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8399314Z E1204 12:01:57.466000 120495 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8399901Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.8400465Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.8400982Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8401465Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.8401932Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.8402410Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.8402859Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.8403320Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.8404369Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.8404838Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.8405345Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.8405950Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 
2025-12-04T12:15:05.8406525Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.8407177Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8407663Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.8408122Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.8408701Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.8409137Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.8409723Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.8410257Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.8410783Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.8411530Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.8412241Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.8412617Z E1204 12:01:57.466000 120495 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8415033Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.8415615Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8416734Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8417385Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8418281Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8419042Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8419924Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8420703Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8421313Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8422464Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8422842Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8423732Z E1204 12:01:57.466000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8423878Z ('RERUN', {'yellow': True}) [3.6064s] [ 0%] 2025-12-04T12:15:05.8425336Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.8426499Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8426931Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.8427378Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.8427908Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.8428408Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8428959Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8429497Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8430111Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.8430707Z E1204 12:01:58.115000 120495 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.8431263Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8431719Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.8432272Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8432741Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8433215Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8433661Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.8434160Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T12:15:05.8434807Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.8435518Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8436205Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8436732Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.8437292Z E1204 
12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.8437806Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8438289Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.8438750Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.8439230Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.8439682Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.8440147Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.8440676Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.8441143Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.8441684Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.8442289Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8442863Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.8443545Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, 
None].to(tl.float32) 2025-12-04T12:15:05.8444030Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.8444475Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.8445060Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.8445538Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.8446120Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.8446654Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.8447178Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.8447880Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.8448586Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.8448968Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8451374Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 
'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.8451981Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8453027Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8453672Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8454566Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8455293Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8456181Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8457050Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8457715Z E1204 12:01:58.115000 120495 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8458867Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8459249Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8460179Z E1204 12:01:58.115000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8460333Z ('RERUN', {'yellow': True}) [0.6085s] [ 0%] 2025-12-04T12:15:05.8461754Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.8462915Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8463352Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.8463798Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.8464333Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 
2025-12-04T12:15:05.8464795Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8465343Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8465888Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8466581Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.8467186Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.8467743Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8468200Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.8468721Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8469225Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8469700Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8470150Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.8470645Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T12:15:05.8471540Z E1204 12:01:58.729000 120495 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.8472247Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8472940Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8473552Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.8474118Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.8474627Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8475112Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.8475545Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.8476025Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.8476481Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.8476952Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.8477484Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.8477954Z E1204 12:01:58.729000 120495 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.8478458Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.8479058Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8479635Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.8480328Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8480816Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.8481261Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.8481848Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.8482285Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.8482910Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.8483448Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.8483958Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.8484713Z E1204 12:01:58.729000 120495 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.8485416Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.8485791Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8488182Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.8488763Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8489805Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8490451Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8491343Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8492040Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8492922Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8493691Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8494348Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8495495Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8495879Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8496872Z E1204 12:01:58.729000 120495 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8496995Z FAILED [0.6109s] [ 0%] 2025-12-04T12:15:05.8497002Z 2025-12-04T12:15:05.8497153Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.8497542Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _ 2025-12-04T12:15:05.8497683Z Traceback (most recent call last): 2025-12-04T12:15:05.8498139Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.8498384Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.8498875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.8499126Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.8499650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.8499845Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.8500403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.8500567Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.8501101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.8501436Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.8501954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.8502105Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.8502595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.8502721Z return self._compile_to_module() 2025-12-04T12:15:05.8503214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.8503385Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.8503907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.8504057Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.8504557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.8504791Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.8505391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.8505520Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.8506043Z File "/tmp/tmp0b8nceha/j7/cj7h6jumgz2ritrdd52emqour3wzuolivg46ljwykgtzzbmutrvg.py", line 137, in 2025-12-04T12:15:05.8506537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.8506651Z kernel.precompile( 2025-12-04T12:15:05.8507216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.8507340Z self._precompile_worker() 2025-12-04T12:15:05.8507946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.8508125Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.8508715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8508927Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8509407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8509658Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8510109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8510474Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8510710Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8511422Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8511513Z ^ 2025-12-04T12:15:05.8511986Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8511992Z 2025-12-04T12:15:05.8512707Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8512747Z 2025-12-04T12:15:05.8512752Z 2025-12-04T12:15:05.8512983Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8513674Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T12:15:05.8513682Z 2025-12-04T12:15:05.8513963Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8514187Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8514292Z frames [('total', 1)] 2025-12-04T12:15:05.8514425Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8514891Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8515112Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8515229Z graph_break [] 2025-12-04T12:15:05.8515617Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _ 2025-12-04T12:15:05.8515751Z Traceback (most recent call last): 2025-12-04T12:15:05.8516174Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.8516406Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.8516905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.8517153Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.8517663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.8517869Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.8518376Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.8518573Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.8519107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.8519426Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.8519957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.8520106Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.8520597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.8520719Z return self._compile_to_module() 2025-12-04T12:15:05.8521233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.8521414Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.8521929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.8522090Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.8522596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.8522828Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.8523426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.8523551Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.8524057Z File "/tmp/tmp7kkspfeo/te/cteizhh2wyjrb26zcfaw7rteg7eje6ljy5zq2k2b7yt4q4nxrnkq.py", line 137, in 2025-12-04T12:15:05.8524531Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.8524677Z kernel.precompile( 2025-12-04T12:15:05.8525243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.8525370Z self._precompile_worker() 2025-12-04T12:15:05.8525968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.8526161Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.8526757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8526970Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8527426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8527675Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8528135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8528468Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8528700Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8529421Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8529511Z ^ 2025-12-04T12:15:05.8529982Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8529988Z 2025-12-04T12:15:05.8530699Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8530707Z 2025-12-04T12:15:05.8530746Z 2025-12-04T12:15:05.8530978Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8531671Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T12:15:05.8531679Z 2025-12-04T12:15:05.8531951Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8532184Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8532289Z frames [('total', 1)] 2025-12-04T12:15:05.8532403Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8532912Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8533132Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8533247Z graph_break [] 2025-12-04T12:15:05.8533468Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8533570Z frames [('total', 1)] 2025-12-04T12:15:05.8533698Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8533947Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8534407Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8534520Z graph_break [] 2025-12-04T12:15:05.8534666Z =================================== FAILURES =================================== 2025-12-04T12:15:05.8535078Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _ 2025-12-04T12:15:05.8535204Z Traceback (most recent call last): 2025-12-04T12:15:05.8535633Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.8535911Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.8536478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.8536745Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.8537264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.8537464Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.8537987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.8538137Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.8538680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.8539021Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.8539544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.8539709Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.8540197Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.8540320Z return self._compile_to_module() 2025-12-04T12:15:05.8540823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.8540989Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.8541522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.8541660Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.8542157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.8542454Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.8543045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.8543177Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.8543704Z File "/tmp/tmpcvqnmz83/ne/cneh6ojjwd4az6gygqhbjjxqv2oaheohx5nbs5a2oiwt4kopttgx.py", line 137, in 2025-12-04T12:15:05.8544172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.8544305Z kernel.precompile( 2025-12-04T12:15:05.8544892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.8545014Z self._precompile_worker() 2025-12-04T12:15:05.8545634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.8545815Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.8546428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8546679Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8547133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8547393Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.8547839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8548178Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8548451Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8549168Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8549285Z ^ 2025-12-04T12:15:05.8549746Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8549751Z 2025-12-04T12:15:05.8550462Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8550482Z 2025-12-04T12:15:05.8550486Z 2025-12-04T12:15:05.8550705Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8551398Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T12:15:05.8551406Z 2025-12-04T12:15:05.8551698Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8551921Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8552040Z frames [('total', 1)] 2025-12-04T12:15:05.8552159Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8552624Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8552860Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8552959Z graph_break [] 
2025-12-04T12:15:05.8553176Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8553292Z frames [('total', 1)] 2025-12-04T12:15:05.8553408Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8553629Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8554138Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8554238Z graph_break [] 2025-12-04T12:15:05.8554473Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8554582Z frames [('total', 1)] 2025-12-04T12:15:05.8554697Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8554935Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8555391Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8555490Z graph_break [] 2025-12-04T12:15:05.8556159Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b552d5ebf2a766dc.xml - 2025-12-04T12:15:05.8556365Z =========================== short test summary info ============================ 2025-12-04T12:15:05.8557210Z FAILED [0.6109s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8557918Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8558038Z ^ 2025-12-04T12:15:05.8558513Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8558519Z 2025-12-04T12:15:05.8559224Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8559230Z 2025-12-04T12:15:05.8559235Z 2025-12-04T12:15:05.8559466Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8560166Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T12:15:05.8560201Z 2025-12-04T12:15:05.8560484Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8560671Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.8560874Z ================== 1 failed, 33 deselected, 2 rerun in 4.87s =================== 2025-12-04T12:15:05.8560990Z Got exit code 1 2025-12-04T12:15:05.8561100Z Retrying single test... 
2025-12-04T12:15:05.8561572Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08c28ac73e77007a.xml 2025-12-04T12:15:05.8561750Z ============================= test session starts ============================== 2025-12-04T12:15:05.8562104Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.8562229Z cachedir: .pytest_cache 2025-12-04T12:15:05.8562753Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.8562880Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.8563006Z configfile: pytest.ini 2025-12-04T12:15:05.8563601Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.8563837Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.8564612Z stepcurrent: skipping 33 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T12:15:05.8564729Z Running 1 items in this shard 2025-12-04T12:15:05.8564734Z 2025-12-04T12:15:05.8566192Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.8567349Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8567797Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.8568243Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.8568806Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.8569270Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8569807Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8570363Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8571177Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 
2025-12-04T12:15:05.8571781Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.8572342Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8572784Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.8573386Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8573860Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8574337Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8574784Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.8575268Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T12:15:05.8575933Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.8576695Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8577402Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8577931Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = 
tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.8578492Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.8579005Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8579476Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.8579975Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.8580453Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.8580903Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.8581369Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.8582371Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.8582920Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.8583430Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.8584036Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8584661Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.8585301Z E1204 12:02:17.157000 120735 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8585800Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.8586246Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.8586840Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.8587314Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.8587884Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.8588432Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.8588943Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.8589659Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.8590366Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.8590732Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8593127Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 
'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.8593712Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8594892Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8595525Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8596431Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8597176Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8598079Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8598888Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8599512Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8600675Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8601095Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8601993Z E1204 12:02:17.157000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8602131Z ('RERUN', {'yellow': True}) [3.6041s] [100%] 2025-12-04T12:15:05.8603564Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.8604709Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8605152Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.8605605Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.8606146Z E1204 12:02:17.807000 
120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.8606609Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8607139Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8607696Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8608320Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.8608920Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.8609483Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8609924Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.8610456Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8610973Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8611451Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8611900Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.8612418Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 
2025-12-04T12:15:05.8613080Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.8613771Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8614475Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8615042Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.8615593Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.8616118Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8616666Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.8617112Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.8617595Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.8618029Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.8618512Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.8619033Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 
2025-12-04T12:15:05.8619517Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.8620021Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.8620624Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8621206Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.8621898Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8622392Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.8622839Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.8623424Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.8623867Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.8624759Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.8625319Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.8625836Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.8626595Z E1204 12:02:17.807000 
120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.8627305Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.8627671Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8630075Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.8630694Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8631742Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8632378Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8633284Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8633970Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8634876Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8635646Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8636310Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8637472Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8637835Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8638779Z E1204 12:02:17.807000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8638919Z ('RERUN', {'yellow': True}) [0.6096s] [100%] 2025-12-04T12:15:05.8640359Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.8641542Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8641993Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.8642443Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.8643002Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.8643478Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8644018Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8644573Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8645157Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.8645756Z E1204 12:02:18.427000 120735 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.8646326Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8646771Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.8647303Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8647776Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8648253Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8648702Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.8649190Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T12:15:05.8649905Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.8650598Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8651302Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8651831Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.8652412Z E1204 
12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.8652939Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8653415Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.8653898Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.8654374Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.8654809Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.8655289Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.8655811Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.8656413Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.8656921Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.8657510Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8658098Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.8658733Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, 
None].to(tl.float32) 2025-12-04T12:15:05.8659234Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.8659681Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.8660266Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.8660710Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.8661281Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.8661832Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.8662349Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.8663111Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.8663824Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.8664194Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8666608Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 
'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.8667158Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8668231Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8668860Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8669768Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8670563Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8671748Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8672520Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8673141Z E1204 12:02:18.427000 120735 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8674298Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8674667Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8675573Z E1204 12:02:18.427000 120735 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8675678Z FAILED [0.6176s] [100%] 2025-12-04T12:15:05.8675685Z 2025-12-04T12:15:05.8675844Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.8676230Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _ 2025-12-04T12:15:05.8676357Z Traceback (most recent call last): 2025-12-04T12:15:05.8676794Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.8677145Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.8677656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.8677910Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.8678423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.8678634Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.8679143Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.8679309Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.8679895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.8680224Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.8680764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.8680969Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.8681448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.8681587Z return self._compile_to_module() 2025-12-04T12:15:05.8682070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.8682248Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.8682770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.8682901Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.8683466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.8683700Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.8684305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.8684435Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.8684948Z File "/tmp/tmp6l90mpjy/y7/cy7w4pppwhstmgbhedapboukma6vmkcfiktelziyofw2uon22cbx.py", line 137, in 2025-12-04T12:15:05.8685430Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.8685542Z kernel.precompile( 2025-12-04T12:15:05.8686108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.8686243Z self._precompile_worker() 2025-12-04T12:15:05.8686841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.8687033Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.8687628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8687824Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8688288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8688533Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8688990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8689323Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8689593Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8690318Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8690411Z ^ 2025-12-04T12:15:05.8690870Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8690887Z 2025-12-04T12:15:05.8691600Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8691607Z 2025-12-04T12:15:05.8691612Z 2025-12-04T12:15:05.8691830Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8692570Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T12:15:05.8692581Z 2025-12-04T12:15:05.8692852Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8693091Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8693232Z frames [('total', 1)] 2025-12-04T12:15:05.8693351Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8693831Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8694054Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8694155Z graph_break [] 2025-12-04T12:15:05.8694556Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _ 2025-12-04T12:15:05.8694685Z Traceback (most recent call last): 2025-12-04T12:15:05.8695125Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.8695418Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.8695908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.8696172Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.8696774Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.8696984Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.8697493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.8697640Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.8698190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.8698515Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.8699033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.8699200Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.8699679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.8699817Z return self._compile_to_module() 2025-12-04T12:15:05.8700301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.8700465Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.8700995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.8701127Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.8701680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.8701915Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.8702502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 
2025-12-04T12:15:05.8702647Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.8703125Z File "/tmp/tmp7m_z2z_n/bf/cbfyztuyfzxs6gg5z7c564uo3zygh2evvep6wqyycfvbmim5tzre.py", line 137, in 2025-12-04T12:15:05.8703607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.8703721Z kernel.precompile( 2025-12-04T12:15:05.8704315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.8704452Z self._precompile_worker() 2025-12-04T12:15:05.8705050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.8705230Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.8705873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8706074Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8706542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8706786Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8707232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8707582Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8707843Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8708574Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8708667Z ^ 2025-12-04T12:15:05.8709126Z ValueError("type fp8e4nv not 
supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8709132Z 2025-12-04T12:15:05.8709850Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8709856Z 2025-12-04T12:15:05.8709861Z 2025-12-04T12:15:05.8710082Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8710791Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T12:15:05.8710799Z 2025-12-04T12:15:05.8711074Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8711297Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8711419Z frames [('total', 1)] 2025-12-04T12:15:05.8711538Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8712010Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8712233Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8712335Z graph_break [] 2025-12-04T12:15:05.8712576Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8712683Z frames [('total', 1)] 2025-12-04T12:15:05.8712803Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8713039Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8713557Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8713674Z graph_break [] 2025-12-04T12:15:05.8713823Z =================================== FAILURES =================================== 2025-12-04T12:15:05.8714215Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _ 2025-12-04T12:15:05.8714357Z Traceback (most recent call last): 2025-12-04T12:15:05.8714785Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.8715018Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.8715523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.8715808Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.8716340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.8716535Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.8717047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.8717249Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.8717786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.8718121Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.8718644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.8718796Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.8719293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.8719453Z return self._compile_to_module() 2025-12-04T12:15:05.8719935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.8720121Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.8720640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.8720786Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.8721284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.8721518Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.8722125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.8722258Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.8722783Z File "/tmp/tmpv9l2bewc/xf/cxfljslkrx2pu46ribcqseu4ictygoxwuddw2h7xousjqc5wg66z.py", line 137, in 2025-12-04T12:15:05.8723249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.8723368Z kernel.precompile( 2025-12-04T12:15:05.8723936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.8724062Z self._precompile_worker() 2025-12-04T12:15:05.8724661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.8724856Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.8725454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8725710Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8726566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8726820Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.8727281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8727614Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8727860Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8728631Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8728724Z ^ 2025-12-04T12:15:05.8729194Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8729205Z 2025-12-04T12:15:05.8729915Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8729951Z 2025-12-04T12:15:05.8729956Z 2025-12-04T12:15:05.8730183Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8730877Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T12:15:05.8730885Z 2025-12-04T12:15:05.8731151Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8731389Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8731496Z frames [('total', 1)] 2025-12-04T12:15:05.8731629Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8732280Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8732506Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8732626Z graph_break [] 
2025-12-04T12:15:05.8732847Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8732953Z frames [('total', 1)] 2025-12-04T12:15:05.8733089Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8733306Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8733780Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8733879Z graph_break [] 2025-12-04T12:15:05.8734097Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8734216Z frames [('total', 1)] 2025-12-04T12:15:05.8734336Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8734557Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8735026Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8735129Z graph_break [] 2025-12-04T12:15:05.8735793Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08c28ac73e77007a.xml - 2025-12-04T12:15:05.8735968Z =========================== short test summary info ============================ 2025-12-04T12:15:05.8736887Z FAILED [0.6176s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8737611Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8737707Z ^ 2025-12-04T12:15:05.8738234Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8738242Z 2025-12-04T12:15:05.8738952Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8738961Z 2025-12-04T12:15:05.8738965Z 2025-12-04T12:15:05.8739183Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8739885Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T12:15:05.8739891Z 2025-12-04T12:15:05.8740194Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8740389Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.8740597Z ================== 1 failed, 187 deselected, 2 rerun in 4.87s ================== 2025-12-04T12:15:05.8740700Z Got exit code 1 2025-12-04T12:15:05.8740825Z Retrying single test... 
2025-12-04T12:15:05.8741301Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df1b42bf8f6cd06e.xml 2025-12-04T12:15:05.8741532Z ============================= test session starts ============================== 2025-12-04T12:15:05.8741886Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.8741999Z cachedir: .pytest_cache 2025-12-04T12:15:05.8742531Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.8742657Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.8742770Z configfile: pytest.ini 2025-12-04T12:15:05.8743373Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.8743636Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.8744433Z stepcurrent: skipping 33 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T12:15:05.8744556Z Running 1 items in this shard 2025-12-04T12:15:05.8744561Z 2025-12-04T12:15:05.8745986Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.8747158Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8747595Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.8748056Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.8748576Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.8749049Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8749585Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8750127Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8750763Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 
2025-12-04T12:15:05.8751346Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.8751916Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8752357Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.8752877Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8753393Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8753860Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8754315Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.8754831Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T12:15:05.8755488Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.8756179Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8756867Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8757445Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = 
tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.8757996Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.8758520Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8758990Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.8759422Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.8759911Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.8760349Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.8760826Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.8761346Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.8761815Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.8762331Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.8762925Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8763511Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.8764184Z E1204 12:02:36.927000 120975 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8764675Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.8765131Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.8765698Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.8766181Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.8766757Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.8767306Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.8767817Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.8768550Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.8769267Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.8769632Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8772194Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 
'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.8772826Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8773885Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8774516Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8775417Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8776096Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8777042Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8777877Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8778487Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8779651Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8780017Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8780963Z E1204 12:02:36.927000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8781104Z ('RERUN', {'yellow': True}) [3.6095s] [100%] 2025-12-04T12:15:05.8782527Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.8783731Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8784162Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.8784621Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.8785184Z E1204 12:02:37.569000 
120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.8785658Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8786197Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8786738Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8787333Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.8787914Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.8788489Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8788930Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.8789447Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8789933Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8790391Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8790855Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.8791335Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 
2025-12-04T12:15:05.8792012Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.8792715Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8793400Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8793937Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.8794535Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.8795063Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8795530Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.8795994Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.8796487Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.8796927Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.8797401Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.8797921Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 
2025-12-04T12:15:05.8798428Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.8798947Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.8799540Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8800130Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.8800768Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, None].to(tl.float32) 2025-12-04T12:15:05.8801256Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.8801721Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.8802295Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.8802755Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.8803329Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.8803863Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.8804391Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.8805137Z E1204 12:02:37.569000 
120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.8805859Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.8806228Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8808683Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.8809252Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8810312Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8810942Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8811855Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8812572Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8813466Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8814251Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8814859Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8816025Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8816467Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8817375Z E1204 12:02:37.569000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8817512Z ('RERUN', {'yellow': True}) [0.6020s] [100%] 2025-12-04T12:15:05.8818934Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.8820146Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8820582Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.8821045Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 150 2025-12-04T12:15:05.8821563Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] R0_BLOCK: tl.constexpr = 256 2025-12-04T12:15:05.8822068Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8822605Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8823148Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8823773Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.8824355Z E1204 12:02:38.175000 120975 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.8824924Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8825369Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_offset = 0 2025-12-04T12:15:05.8825891Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8826412Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8826877Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8827341Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.8827820Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 15 2025-12-04T12:15:05.8828463Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, other=0.0).to(tl.float32) 2025-12-04T12:15:05.8829164Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8829849Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.8830388Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.8830942Z E1204 
12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tl.broadcast_to(tmp16, [1, 1]) 2025-12-04T12:15:05.8831459Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8831928Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.8832355Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 15.0 2025-12-04T12:15:05.8832880Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.8833317Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.8833797Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.8834316Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.8834785Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.8835331Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.8835931Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8836518Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = tl.where(r0_mask, tmp12, float("-inf")) 2025-12-04T12:15:05.8837189Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = triton_helpers.max2(tmp14, 1)[:, 
None].to(tl.float32) 2025-12-04T12:15:05.8837669Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tmp10 * tmp17 2025-12-04T12:15:05.8838122Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = -448.0 2025-12-04T12:15:05.8838696Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.maximum(tmp18, tmp19) 2025-12-04T12:15:05.8839184Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 448.0 2025-12-04T12:15:05.8839757Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = triton_helpers.minimum(tmp20, tmp21) 2025-12-04T12:15:05.8840294Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp22.to(tl.float8e4nv) 2025-12-04T12:15:05.8840820Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp15.to(tl.float32) 2025-12-04T12:15:05.8841530Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp23, r0_mask) 2025-12-04T12:15:05.8842251Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp24, None) 2025-12-04T12:15:05.8842617Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8844990Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 
'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 2, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.8845531Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8846633Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8847264Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8848166Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8848876Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8849755Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8850535Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8851174Z E1204 12:02:38.175000 120975 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8852334Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8852705Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8853638Z E1204 12:02:38.175000 120975 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8853745Z FAILED [0.6038s] [100%] 2025-12-04T12:15:05.8853754Z 2025-12-04T12:15:05.8853900Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.8854298Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _ 2025-12-04T12:15:05.8854425Z Traceback (most recent call last): 2025-12-04T12:15:05.8854847Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.8855095Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.8855589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.8855852Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.8856431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.8856632Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.8857162Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.8857311Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.8857860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.8858183Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.8858705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.8858873Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.8859393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.8859531Z return self._compile_to_module() 2025-12-04T12:15:05.8860016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.8860184Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.8860712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.8860842Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.8861335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.8861610Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.8862196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.8862340Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.8862829Z File "/tmp/tmpjwoagh_5/7b/c7bkwe7w4rxkl3zpge2awqaihytaac57lg4juzzcfrkd7e4jfgcn.py", line 137, in 2025-12-04T12:15:05.8863328Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.8863454Z kernel.precompile( 2025-12-04T12:15:05.8864008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.8864139Z self._precompile_worker() 2025-12-04T12:15:05.8864732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.8864919Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.8865525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8865760Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8866215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8866474Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8866917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8867265Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8867491Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8868201Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8868307Z ^ 2025-12-04T12:15:05.8868768Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8868775Z 2025-12-04T12:15:05.8869496Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8869505Z 2025-12-04T12:15:05.8869510Z 2025-12-04T12:15:05.8869728Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8870422Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T12:15:05.8870440Z 2025-12-04T12:15:05.8870708Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8871159Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8871284Z frames [('total', 1)] 2025-12-04T12:15:05.8871409Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8871949Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8872187Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8872295Z graph_break [] 2025-12-04T12:15:05.8872693Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _ 2025-12-04T12:15:05.8872818Z Traceback (most recent call last): 2025-12-04T12:15:05.8873247Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.8873494Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.8874033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.8874283Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.8874817Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.8875010Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.8875658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.8875808Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.8876340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.8876674Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.8877194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.8877357Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.8877883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.8878006Z return self._compile_to_module() 2025-12-04T12:15:05.8878503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.8878671Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.8879188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.8879331Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.8879827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.8880082Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.8880669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 
2025-12-04T12:15:05.8880801Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.8881329Z File "/tmp/tmp1uu6riyx/rp/crpvzmvk2nslzhw4ccpeejwn2sicjksy4yichzaagajbkggpjegs.py", line 137, in 2025-12-04T12:15:05.8881796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.8881928Z kernel.precompile( 2025-12-04T12:15:05.8882484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.8882601Z self._precompile_worker() 2025-12-04T12:15:05.8883210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.8883394Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.8883987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8884231Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8884685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8884945Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8885388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8885720Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8885959Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8886698Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8886805Z ^ 2025-12-04T12:15:05.8887266Z ValueError("type fp8e4nv not 
supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8887272Z 2025-12-04T12:15:05.8887981Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8888031Z 2025-12-04T12:15:05.8888036Z 2025-12-04T12:15:05.8888254Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8888947Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T12:15:05.8888952Z 2025-12-04T12:15:05.8889236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8889460Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8889622Z frames [('total', 1)] 2025-12-04T12:15:05.8889754Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8890222Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8890461Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8890567Z graph_break [] 2025-12-04T12:15:05.8890788Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8890909Z frames [('total', 1)] 2025-12-04T12:15:05.8891027Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8891249Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8891726Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8891826Z graph_break [] 2025-12-04T12:15:05.8891991Z =================================== FAILURES =================================== 2025-12-04T12:15:05.8892381Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda _ 2025-12-04T12:15:05.8892508Z Traceback (most recent call last): 2025-12-04T12:15:05.8892947Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.8893186Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.8893677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.8893945Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.8894460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.8894668Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.8895180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.8895336Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.8895914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.8896241Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.8896850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.8897005Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.8897486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.8897626Z return self._compile_to_module() 2025-12-04T12:15:05.8898153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.8898321Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.8898858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.8898991Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.8899531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.8899765Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.8900348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.8900489Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.8900994Z File "/tmp/tmpll822hcz/dm/cdm4qxpijru6fav7joz4qry3i522rghlxy7haujqu6yeqbv2yt6o.py", line 137, in 2025-12-04T12:15:05.8901475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.8901615Z kernel.precompile( 2025-12-04T12:15:05.8902171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.8902303Z self._precompile_worker() 2025-12-04T12:15:05.8902904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.8903095Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.8903686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8903884Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8904351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8904596Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.8905043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8905388Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8905618Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8906333Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8906423Z ^ 2025-12-04T12:15:05.8906882Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8906888Z 2025-12-04T12:15:05.8907611Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8907620Z 2025-12-04T12:15:05.8907625Z 2025-12-04T12:15:05.8907880Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8908589Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T12:15:05.8908597Z 2025-12-04T12:15:05.8908867Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8909103Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8909208Z frames [('total', 1)] 2025-12-04T12:15:05.8909325Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8909804Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8910057Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8910160Z graph_break [] 
2025-12-04T12:15:05.8910394Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8910504Z frames [('total', 1)] 2025-12-04T12:15:05.8910622Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8910856Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8911346Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8911458Z graph_break [] 2025-12-04T12:15:05.8911676Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.8911780Z frames [('total', 1)] 2025-12-04T12:15:05.8911910Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.8912129Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.8912584Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.8912697Z graph_break [] 2025-12-04T12:15:05.8913384Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df1b42bf8f6cd06e.xml - 2025-12-04T12:15:05.8913575Z =========================== short test summary info ============================ 2025-12-04T12:15:05.8914408Z FAILED [0.6038s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.8915118Z def triton_per_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr): 2025-12-04T12:15:05.8915222Z ^ 2025-12-04T12:15:05.8915682Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8915690Z 2025-12-04T12:15:05.8916407Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.8916417Z 2025-12-04T12:15:05.8916422Z 2025-12-04T12:15:05.8916638Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.8917344Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T12:15:05.8917352Z 2025-12-04T12:15:05.8917619Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.8917803Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.8918021Z ================== 1 failed, 187 deselected, 2 rerun in 4.86s ================== 2025-12-04T12:15:05.8918123Z Got exit code 1 2025-12-04T12:15:05.8918748Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda 2025-12-04T12:15:05.8919190Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.8919662Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-97d6c66aee44b097.xml 2025-12-04T12:15:05.8919842Z ============================= test session starts ============================== 2025-12-04T12:15:05.8920195Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.8920307Z cachedir: .pytest_cache 2025-12-04T12:15:05.8920838Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.8920963Z rootdir: /var/lib/jenkins/workspace 
2025-12-04T12:15:05.8921088Z configfile: pytest.ini 2025-12-04T12:15:05.8921709Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.8921943Z collecting ... collected 188 items / 34 deselected / 154 selected 2025-12-04T12:15:05.8922104Z stepcurrent: skipping 34 already run items. 2025-12-04T12:15:05.8922219Z Running 154 items in this shard 2025-12-04T12:15:05.8922224Z 2025-12-04T12:15:05.8923619Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.8924710Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.8925150Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.8925650Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.8926109Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8926660Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8927199Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8927781Z E1204 12:02:56.624000 121215 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.8928287Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.8928844Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8929305Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.8929738Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.8930348Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.8930934Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.8931543Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.8932170Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.8932705Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.8933244Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8933733Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8934213Z E1204 12:02:56.624000 121215 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8934724Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.8935544Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.8936083Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8936796Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8937517Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.8938137Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.8938542Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.8939244Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.8939862Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.8940555Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.8941260Z E1204 12:02:56.624000 121215 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.8941740Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.8942233Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.8942707Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.8943358Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.8943883Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.8944433Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.8945027Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.8945600Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.8946139Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8946628Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8947123Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8947588Z E1204 12:02:56.624000 121215 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.8948433Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.8948981Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.8949475Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.8949979Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.8950482Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.8950940Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.8951454Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.8951995Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.8952539Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.8953060Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.8953658Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8954253Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 
= triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.8954841Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.8955352Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.8955817Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.8956401Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.8956875Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.8957450Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.8958586Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.8959218Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.8959858Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.8960409Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.8960772Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.8963065Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 
'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.8963604Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.8964686Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.8965319Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.8966229Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.8966944Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.8967838Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.8968603Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.8969209Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.8970317Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.8970741Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.8971879Z E1204 12:02:56.624000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.8972017Z ('RERUN', {'yellow': True}) [3.4136s] [ 0%] 2025-12-04T12:15:05.8973373Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.8974560Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.8975015Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.8975468Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.8975929Z E1204 12:02:57.099000 121215 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.8976540Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.8977138Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.8977741Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.8978236Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.8978839Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.8979305Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.8979746Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.8980362Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.8981002Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.8981606Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.8982200Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.8982732Z E1204 12:02:57.099000 121215 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.8983272Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8983765Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8984255Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8984742Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.8985563Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.8986108Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.8986700Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.8987438Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.8988094Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.8988506Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.8989176Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 
2025-12-04T12:15:05.8989797Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.8990543Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.8991259Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.8991739Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.8992277Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.8992747Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.8993403Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.8993936Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.8994539Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.8995124Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.8995658Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.8996203Z E1204 12:02:57.099000 121215 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.8996696Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.8997193Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.8997666Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.8998482Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.8999029Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.8999529Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.9000009Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.9000826Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.9001342Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.9001854Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.9002396Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.9002904Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 
2025-12-04T12:15:05.9003425Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.9004059Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9004653Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.9005241Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.9005786Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.9006247Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.9006835Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.9007294Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.9007873Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.9008459Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.9009086Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.9009675Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 
2025-12-04T12:15:05.9010224Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.9010585Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.9012849Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.9013389Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.9014464Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9015134Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9016046Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9016795Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9017690Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9018505Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9019135Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.9020255Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9020625Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.9021536Z E1204 12:02:57.099000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9021671Z ('RERUN', {'yellow': True}) [0.4359s] [ 0%] 2025-12-04T12:15:05.9023056Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.9024143Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9024591Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.9025043Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.9025501Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.9026058Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.9026596Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.9027201Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.9027694Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.9028250Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = 
tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.9028709Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.9029202Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.9029815Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9030404Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9031012Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9031644Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9032176Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9032720Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9033211Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9033739Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9034205Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.9035021Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, 
eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9035559Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.9036176Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9036910Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.9037512Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.9037912Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.9038582Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.9039205Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.9039887Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.9040594Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.9041087Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.9041565Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 
2025-12-04T12:15:05.9042039Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.9042749Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.9043274Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.9043832Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.9044408Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9044973Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9045514Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9046007Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9046495Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9046990Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.9047809Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9048357Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = 
tmp9.to(tl.float32) 2025-12-04T12:15:05.9048853Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.9049365Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.9049865Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.9050337Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.9050833Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.9051368Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.9051880Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.9052403Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.9053005Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9053584Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.9054170Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.9054679Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.9055142Z E1204 12:02:57.542000 121215 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.9055764Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.9056222Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.9056893Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.9057447Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.9058073Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.9058698Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.9059250Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.9059611Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.9061900Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 
'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.9062477Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.9063520Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9064152Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9065064Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9065749Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9066647Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9067422Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9068044Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.9069138Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9069506Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.9070453Z E1204 12:02:57.542000 121215 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9070565Z FAILED [0.4407s] [ 0%] 2025-12-04T12:15:05.9070572Z 2025-12-04T12:15:05.9070736Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.9071322Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.9071450Z Traceback (most recent call last): 2025-12-04T12:15:05.9071889Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.9072218Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.9072723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.9072980Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.9073494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.9073749Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.9074259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.9074421Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.9074954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in 
fx_codegen_and_compile 2025-12-04T12:15:05.9075282Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.9075817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.9076011Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.9076491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.9076630Z return self._compile_to_module() 2025-12-04T12:15:05.9077115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.9077291Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.9077807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.9077938Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.9078449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.9078682Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.9079284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.9079429Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.9079928Z File "/tmp/tmpw7y32ewp/66/c66hsm3vmwnt3wy77ar7w7thboulp5wcyjt6mhqsautxju4d2lnc.py", line 65, in 2025-12-04T12:15:05.9080404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.9080519Z kernel.precompile( 2025-12-04T12:15:05.9081073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 
2025-12-04T12:15:05.9081210Z self._precompile_worker() 2025-12-04T12:15:05.9081811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.9082013Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.9082648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9082851Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9083322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9083571Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9084028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9084366Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9084718Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9085389Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9085488Z ^ 2025-12-04T12:15:05.9085946Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9085999Z 2025-12-04T12:15:05.9086711Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9086718Z 2025-12-04T12:15:05.9086723Z 2025-12-04T12:15:05.9086943Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9087668Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T12:15:05.9087673Z 2025-12-04T12:15:05.9087947Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9088240Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9088352Z frames [('total', 1)] 2025-12-04T12:15:05.9088476Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9088965Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9089195Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9089311Z graph_break [] 2025-12-04T12:15:05.9089711Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.9089838Z Traceback (most recent call last): 2025-12-04T12:15:05.9090274Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.9090508Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.9091001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.9091270Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.9091786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.9091998Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.9092511Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.9092660Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.9093217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.9093540Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.9094084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.9094239Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.9094755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.9094893Z return self._compile_to_module() 2025-12-04T12:15:05.9095379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.9095542Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.9096072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.9096201Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.9096815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.9097049Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.9097637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.9097779Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.9098257Z File "/tmp/tmp_wmyafcy/zc/czcx4l4pcssfzep4tga4tuzmlabsywmuqvcdzztv74d4wc3zuzud.py", line 65, in 2025-12-04T12:15:05.9098766Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.9098877Z kernel.precompile( 2025-12-04T12:15:05.9099431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.9099561Z self._precompile_worker() 2025-12-04T12:15:05.9100160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.9100339Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.9100978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9101176Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9101644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9101889Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9102330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9102678Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9102906Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9103570Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9103663Z ^ 2025-12-04T12:15:05.9104123Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9104129Z 2025-12-04T12:15:05.9104856Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9104863Z 2025-12-04T12:15:05.9104867Z 2025-12-04T12:15:05.9105085Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9105807Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T12:15:05.9105812Z 2025-12-04T12:15:05.9106086Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9106308Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9106432Z frames [('total', 1)] 2025-12-04T12:15:05.9106580Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9107059Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9107284Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9107384Z graph_break [] 2025-12-04T12:15:05.9107615Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9107722Z frames [('total', 1)] 2025-12-04T12:15:05.9107838Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9108070Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9108563Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9108678Z graph_break [] 2025-12-04T12:15:05.9108828Z =================================== FAILURES =================================== 2025-12-04T12:15:05.9109228Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.9109367Z Traceback (most recent call last): 2025-12-04T12:15:05.9109825Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.9110056Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.9110556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.9110808Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.9111332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.9111528Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.9112073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.9112235Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.9112768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.9113106Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.9113624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.9113774Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.9114270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.9114395Z return self._compile_to_module() 2025-12-04T12:15:05.9114879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.9115060Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.9115575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.9115720Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.9116216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.9116449Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.9117052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.9117179Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.9117709Z File "/tmp/tmpl4vojzx9/pz/cpzghq2dotxh6p6q74wpwp5f23tqx5ymabvlkqjlncl4jfmyqfhi.py", line 65, in 2025-12-04T12:15:05.9118170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.9118316Z kernel.precompile( 2025-12-04T12:15:05.9118883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.9119003Z self._precompile_worker() 2025-12-04T12:15:05.9119597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.9119791Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.9120383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9120596Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9121077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9121325Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.9121784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9122118Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9122392Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9123040Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9123131Z ^ 2025-12-04T12:15:05.9123602Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9123607Z 2025-12-04T12:15:05.9124322Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9124363Z 2025-12-04T12:15:05.9124370Z 2025-12-04T12:15:05.9124600Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9125305Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T12:15:05.9125313Z 2025-12-04T12:15:05.9125586Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9125819Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9125926Z frames [('total', 1)] 2025-12-04T12:15:05.9126059Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9126528Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9126749Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9126869Z graph_break [] 
2025-12-04T12:15:05.9127089Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9127194Z frames [('total', 1)] 2025-12-04T12:15:05.9127324Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9127545Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9128017Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9128117Z graph_break [] 2025-12-04T12:15:05.9128331Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9128447Z frames [('total', 1)] 2025-12-04T12:15:05.9128563Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9128781Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9129259Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9129360Z graph_break [] 2025-12-04T12:15:05.9130058Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-97d6c66aee44b097.xml - 2025-12-04T12:15:05.9130236Z =========================== short test summary info ============================ 2025-12-04T12:15:05.9131083Z FAILED [0.4407s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9131745Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9131838Z ^ 2025-12-04T12:15:05.9132344Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9132350Z 2025-12-04T12:15:05.9133060Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9133068Z 2025-12-04T12:15:05.9133072Z 2025-12-04T12:15:05.9133291Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9134052Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T12:15:05.9134057Z 2025-12-04T12:15:05.9134330Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9134530Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.9134731Z ================== 1 failed, 34 deselected, 2 rerun in 4.33s =================== 2025-12-04T12:15:05.9134835Z Got exit code 1 2025-12-04T12:15:05.9134962Z Retrying single test... 
2025-12-04T12:15:05.9135433Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-232f2d4b09cdec77.xml 2025-12-04T12:15:05.9135660Z ============================= test session starts ============================== 2025-12-04T12:15:05.9136562Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.9136682Z cachedir: .pytest_cache 2025-12-04T12:15:05.9137224Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.9137350Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.9137460Z configfile: pytest.ini 2025-12-04T12:15:05.9138063Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.9138288Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.9139079Z stepcurrent: skipping 34 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T12:15:05.9145592Z Running 1 items in this shard 2025-12-04T12:15:05.9145599Z 2025-12-04T12:15:05.9146969Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.9148085Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9148526Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.9148994Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.9149570Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.9150112Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.9150669Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.9151250Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.9151807Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.9152363Z E1204 12:03:16.409000 
121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.9152831Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.9153273Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.9153911Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9154517Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9155129Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9155720Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9156295Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9156823Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9157331Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9157815Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9158294Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.9159113Z E1204 12:03:16.409000 121412 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9159647Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.9160254Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9160974Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.9161591Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.9161996Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.9162693Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.9163316Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.9164024Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.9164742Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.9165253Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.9165747Z 
E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.9166227Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.9166859Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.9167428Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.9167970Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.9168569Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9169103Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9169672Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9170161Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9170642Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9171331Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.9172152Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9172693Z E1204 
12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.9173193Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.9173657Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.9174169Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.9174625Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.9175133Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.9175674Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.9176265Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.9176877Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.9177482Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9178076Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.9178661Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.9179208Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * 
tmp23 2025-12-04T12:15:05.9179697Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.9180273Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.9180788Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.9181362Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.9181914Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.9182545Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.9183170Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.9183735Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.9184101Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.9186387Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): 
[['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.9186923Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.9187983Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9188614Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9189527Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9190248Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9191125Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9191910Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9192513Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.9193668Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9194041Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.9194977Z E1204 12:03:16.409000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9195110Z ('RERUN', {'yellow': True}) [3.4310s] [100%] 2025-12-04T12:15:05.9196444Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.9197549Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9198017Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.9198481Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.9198942Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.9199490Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.9200037Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.9200619Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.9201126Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.9201685Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.9202147Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.9202580Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.9203179Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9203776Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9204413Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9205008Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9205541Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9206069Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9206605Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 
2025-12-04T12:15:05.9207088Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9207571Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.9208384Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9208957Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.9209543Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9210263Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.9210919Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.9211326Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.9211989Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.9212605Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.9213275Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.9213987Z E1204 
12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.9214466Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.9214960Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.9215432Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.9216075Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.9216663Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.9217258Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.9217858Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9218389Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9218925Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9219418Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9219928Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9220407Z E1204 12:03:16.891000 121412 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.9221227Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9221801Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.9222296Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.9222757Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.9223270Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.9223726Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.9224270Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.9224807Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.9225305Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.9225836Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.9226428Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9227020Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 
= triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.9227608Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.9228115Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.9228579Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.9229154Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.9229626Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.9230200Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.9230784Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.9231408Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.9231987Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.9232548Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.9232911Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.9235211Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 
'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.9235781Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.9236835Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9237502Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9238406Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9239092Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9239969Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9240752Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9241365Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.9242465Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9242834Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.9243737Z E1204 12:03:16.891000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9243874Z ('RERUN', {'yellow': True}) [0.4436s] [100%] 2025-12-04T12:15:05.9245249Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.9246354Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9246790Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.9247250Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.9247741Z E1204 12:03:17.339000 121412 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.9248299Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.9248837Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.9249451Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.9249950Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.9250502Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.9250962Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.9251428Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.9252026Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9252630Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9253234Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9253824Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9254354Z E1204 12:03:17.339000 121412 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9254904Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9255393Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9255872Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9256428Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.9257240Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9257783Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.9258421Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9259142Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.9259764Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.9260165Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.9260866Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 
2025-12-04T12:15:05.9261491Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.9262181Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.9262918Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.9263396Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.9263886Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.9264363Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.9265046Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.9265573Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.9266125Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.9266721Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9267256Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9267797Z E1204 12:03:17.339000 121412 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9268294Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9268776Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9269259Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.9270077Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9270623Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.9271349Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.9271939Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.9272445Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.9272910Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.9273423Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.9273961Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.9274533Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 
2025-12-04T12:15:05.9275060Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.9275660Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9276316Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.9276905Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.9277416Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.9277884Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.9278468Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.9278994Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.9279575Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.9280123Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.9280748Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.9281325Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 
2025-12-04T12:15:05.9281891Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.9282252Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.9284511Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.9285042Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.9286138Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9286768Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9287673Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9288388Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9289288Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9290057Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9290699Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.9291807Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9292174Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.9293128Z E1204 12:03:17.339000 121412 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9293239Z FAILED [0.4454s] [100%] 2025-12-04T12:15:05.9293246Z 2025-12-04T12:15:05.9293395Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.9293806Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.9293934Z Traceback (most recent call last): 2025-12-04T12:15:05.9294369Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.9294607Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.9295097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.9295363Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.9295876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.9296085Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.9296666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.9296816Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.9297368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.9297692Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.9298229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.9298382Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.9298995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.9299134Z return self._compile_to_module() 2025-12-04T12:15:05.9299621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.9299790Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.9300321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.9300453Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.9300964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.9301233Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.9301824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.9301966Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.9302467Z File "/tmp/tmpdomg4n05/iz/cizmrzz6mryyxk6k74zpidzy2u2zzkaqvmcsyw3wh5zw5relgn42.py", line 65, in 2025-12-04T12:15:05.9302974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.9303088Z kernel.precompile( 2025-12-04T12:15:05.9303638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.9303770Z self._precompile_worker() 2025-12-04T12:15:05.9304365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.9304545Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.9305187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9305387Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9305852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9306102Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9306547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9306895Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9307127Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9307788Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9307883Z ^ 2025-12-04T12:15:05.9308344Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9308351Z 2025-12-04T12:15:05.9309073Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9309082Z 2025-12-04T12:15:05.9309086Z 2025-12-04T12:15:05.9309305Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9310020Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T12:15:05.9310026Z 2025-12-04T12:15:05.9310299Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9310525Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9310647Z frames [('total', 1)] 2025-12-04T12:15:05.9310796Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9311280Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9311504Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9311606Z graph_break [] 2025-12-04T12:15:05.9312014Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.9312137Z Traceback (most recent call last): 2025-12-04T12:15:05.9312560Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.9312806Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.9313326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.9313592Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.9314107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.9314300Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.9314878Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.9315025Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.9315573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.9315893Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.9316412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.9316603Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.9317088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.9317211Z return self._compile_to_module() 2025-12-04T12:15:05.9317710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.9317878Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.9318406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.9318533Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.9319026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.9319271Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.9319853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.9319997Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.9320470Z File "/tmp/tmphc7i6l_a/xm/cxmjr4g4b2zqtk32544qtcdij4vwxgnynrm5c4tfhepuwribwsbf.py", line 65, in 2025-12-04T12:15:05.9320935Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.9321061Z kernel.precompile( 2025-12-04T12:15:05.9321613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.9321730Z self._precompile_worker() 2025-12-04T12:15:05.9322337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.9322519Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.9323159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9323357Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9323806Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9324064Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9324507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9324878Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9325106Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9325783Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9325890Z ^ 2025-12-04T12:15:05.9326351Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9326356Z 2025-12-04T12:15:05.9327077Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9327117Z 2025-12-04T12:15:05.9327122Z 2025-12-04T12:15:05.9327343Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9328047Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T12:15:05.9328066Z 2025-12-04T12:15:05.9328336Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9328563Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9328716Z frames [('total', 1)] 2025-12-04T12:15:05.9328835Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9329310Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9329552Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9329655Z graph_break [] 2025-12-04T12:15:05.9329875Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9329993Z frames [('total', 1)] 2025-12-04T12:15:05.9330109Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9330342Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9330804Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9330907Z graph_break [] 2025-12-04T12:15:05.9331067Z =================================== FAILURES =================================== 2025-12-04T12:15:05.9331468Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.9331592Z Traceback (most recent call last): 2025-12-04T12:15:05.9332030Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.9332265Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.9332768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.9333018Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.9333530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.9333736Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.9334244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.9334436Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.9334971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.9335294Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.9335824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.9335973Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.9336545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.9336688Z return self._compile_to_module() 2025-12-04T12:15:05.9337214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.9337400Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.9337918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.9338051Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.9338618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.9338851Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.9339455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.9339584Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.9340082Z File "/tmp/tmpwb9d0956/sk/cskhnk4q7cufhboa4xv5riz7djeb56xwe3v5cf4b32sw7f6tkcwy.py", line 65, in 2025-12-04T12:15:05.9340562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.9340707Z kernel.precompile( 2025-12-04T12:15:05.9341268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.9341403Z self._precompile_worker() 2025-12-04T12:15:05.9342004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.9342200Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.9342793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9342995Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9343467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9343712Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.9344172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9344509Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9344740Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9345407Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9345500Z ^ 2025-12-04T12:15:05.9345957Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9345977Z 2025-12-04T12:15:05.9346694Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9346702Z 2025-12-04T12:15:05.9346706Z 2025-12-04T12:15:05.9346957Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9347678Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T12:15:05.9347686Z 2025-12-04T12:15:05.9347951Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9348189Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9348299Z frames [('total', 1)] 2025-12-04T12:15:05.9348416Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9348898Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9349152Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9349273Z graph_break [] 
2025-12-04T12:15:05.9349494Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9349603Z frames [('total', 1)] 2025-12-04T12:15:05.9349737Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9349959Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9350448Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9350562Z graph_break [] 2025-12-04T12:15:05.9350777Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9350879Z frames [('total', 1)] 2025-12-04T12:15:05.9351012Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9351231Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9351704Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9351832Z graph_break [] 2025-12-04T12:15:05.9352486Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-232f2d4b09cdec77.xml - 2025-12-04T12:15:05.9352675Z =========================== short test summary info ============================ 2025-12-04T12:15:05.9353522Z FAILED [0.4454s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9354828Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9354919Z ^ 2025-12-04T12:15:05.9355379Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9355388Z 2025-12-04T12:15:05.9356110Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9356120Z 2025-12-04T12:15:05.9356125Z 2025-12-04T12:15:05.9356342Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9357055Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T12:15:05.9357063Z 2025-12-04T12:15:05.9357334Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9357516Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.9357735Z ================== 1 failed, 187 deselected, 2 rerun in 4.36s ================== 2025-12-04T12:15:05.9357837Z Got exit code 1 2025-12-04T12:15:05.9357958Z Retrying single test... 
2025-12-04T12:15:05.9358435Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6add3d31a0a55a66.xml 2025-12-04T12:15:05.9358651Z ============================= test session starts ============================== 2025-12-04T12:15:05.9359020Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.9359133Z cachedir: .pytest_cache 2025-12-04T12:15:05.9359658Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.9359798Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.9359908Z configfile: pytest.ini 2025-12-04T12:15:05.9360514Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.9360738Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.9361563Z stepcurrent: skipping 34 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T12:15:05.9361702Z Running 1 items in this shard 2025-12-04T12:15:05.9361708Z 2025-12-04T12:15:05.9363040Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.9364171Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9364611Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.9365071Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.9365566Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.9366103Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.9366654Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.9367309Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.9367884Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.9368444Z E1204 12:03:36.276000 
121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.9368896Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.9369344Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.9369944Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9370539Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9371333Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9371933Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9372565Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9373097Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9373605Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9374086Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9374566Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.9375441Z E1204 12:03:36.276000 121609 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9375975Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.9376648Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9377423Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.9378038Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.9378445Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.9379097Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.9379786Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.9380462Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.9381178Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.9381658Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.9382152Z 
E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.9382631Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.9383263Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.9383802Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.9384348Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.9384942Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9385473Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9386042Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9386547Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9387027Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9387502Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.9388319Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9388917Z E1204 
12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.9389419Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.9389881Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.9390432Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.9390891Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.9391400Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.9391944Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.9392479Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.9393013Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.9393608Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9394200Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.9394789Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.9395290Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * 
tmp23 2025-12-04T12:15:05.9396090Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.9396668Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.9397141Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.9397719Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.9398257Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.9398897Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.9399525Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.9400090Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.9400454Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.9402751Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): 
[['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.9403288Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.9404377Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9405007Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9405901Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9406623Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9407504Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9408288Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9408898Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.9409997Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9410371Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.9411282Z E1204 12:03:36.276000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9411421Z ('RERUN', {'yellow': True}) [3.4387s] [100%] 2025-12-04T12:15:05.9412755Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.9413900Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9414344Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.9414811Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.9415269Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.9415805Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.9416470Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.9417060Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.9417573Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.9418126Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.9418620Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.9419052Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.9419651Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9420249Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9420896Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9421487Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9422021Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9422553Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9423053Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 
2025-12-04T12:15:05.9423534Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9424015Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.9424831Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9425356Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.9425960Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9426675Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.9427343Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.9427746Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.9428411Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:05.9429030Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.9429704Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.9430884Z E1204 
12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.9431382Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.9431871Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.9432384Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.9433021Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.9433567Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.9434117Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.9434746Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9435277Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9435821Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9436313Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9436793Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9437279Z E1204 12:03:36.757000 121609 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.9438103Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9438652Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.9439152Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.9439617Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.9440137Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.9440599Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.9441175Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.9441715Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.9442215Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:05.9442753Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.9443346Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9443975Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 
= triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.9444573Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.9445072Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.9445586Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.9446168Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.9446641Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.9447225Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.9447815Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.9448437Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.9449014Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:05.9449575Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.9449936Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.9452202Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 
'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.9452745Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.9453803Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9454437Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9455379Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9456061Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9457015Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9457875Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, 
options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9458487Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.9459590Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9459989Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.9460893Z E1204 12:03:36.757000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9461032Z ('RERUN', {'yellow': True}) [0.4432s] [100%] 2025-12-04T12:15:05.9462366Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:05.9463493Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9463930Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 10 2025-12-04T12:15:05.9464396Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:05.9464858Z E1204 12:03:37.204000 121609 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.9465409Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.9465955Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.9466539Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.9467056Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:05.9467608Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.9468069Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.9468500Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:05.9469135Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9469733Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9470343Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:05.9471143Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9471761Z E1204 12:03:37.204000 121609 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9472293Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9472799Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9473327Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9473808Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.9474619Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9475164Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.9475812Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9476530Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:05.9477147Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:05.9477547Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:05.9478214Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask & xmask, tmp3_mean_next, tmp3_mean) 
2025-12-04T12:15:05.9478834Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask & xmask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:05.9479505Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask & xmask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:05.9480228Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:05.9480706Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:05.9481197Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:05.9481671Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:05.9482371Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.9482904Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:05.9483451Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:05.9484043Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9484570Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9485144Z E1204 12:03:37.204000 121609 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9485637Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9486119Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9486599Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:05.9487451Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask & xmask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9487994Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:05.9488893Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:05.9489366Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:05.9489951Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:05.9490414Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:05.9490925Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:05.9491464Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:05.9491956Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 
2025-12-04T12:15:05.9492494Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:05.9493092Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9493837Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:05.9494430Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask & xmask, tmp21, _tmp20) 2025-12-04T12:15:05.9494941Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:05.9495409Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:05.9495990Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:05.9496590Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:05.9497169Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:05.9497725Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:05.9498351Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask & xmask) 2025-12-04T12:15:05.9498959Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 
2025-12-04T12:15:05.9499520Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, xmask) 2025-12-04T12:15:05.9499885Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.9502154Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.9502725Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.9503906Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9504542Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9505443Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9506128Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9507016Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9507806Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9508418Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.9509522Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9509891Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.9510844Z E1204 12:03:37.204000 121609 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9510954Z FAILED [0.4448s] [100%] 2025-12-04T12:15:05.9510963Z 2025-12-04T12:15:05.9511112Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.9511522Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.9511648Z Traceback (most recent call last): 2025-12-04T12:15:05.9512085Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.9512321Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.9512844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.9513109Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.9513624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.9513818Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.9514373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.9514521Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.9515066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.9515388Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.9515908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.9516069Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.9516586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.9516723Z return self._compile_to_module() 2025-12-04T12:15:05.9517206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.9517376Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.9517909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.9518041Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.9518549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.9518784Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.9519370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.9519514Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.9520026Z File "/tmp/tmpchk618lu/rx/crxneosyy2h3poby4nucsuxwszzvopnfwjh2zfomce626zhrqqn5.py", line 65, in 2025-12-04T12:15:05.9520493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.9520622Z kernel.precompile( 2025-12-04T12:15:05.9521174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.9521307Z self._precompile_worker() 2025-12-04T12:15:05.9521905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.9522085Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.9522699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9522937Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9523405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9523657Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9524099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9524445Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9524670Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9525354Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9525462Z ^ 2025-12-04T12:15:05.9525918Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9525924Z 2025-12-04T12:15:05.9526650Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9526688Z 2025-12-04T12:15:05.9526693Z 2025-12-04T12:15:05.9526910Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9527623Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T12:15:05.9527629Z 2025-12-04T12:15:05.9527899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9528127Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9528281Z frames [('total', 1)] 2025-12-04T12:15:05.9528403Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9528872Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9529120Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9529225Z graph_break [] 2025-12-04T12:15:05.9529634Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.9529760Z Traceback (most recent call last): 2025-12-04T12:15:05.9530185Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.9530435Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.9530925Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.9531189Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.9531707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.9531905Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.9532434Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.9532582Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.9533119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.9533455Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.9533976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.9534140Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.9534657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.9534784Z return self._compile_to_module() 2025-12-04T12:15:05.9535283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.9535450Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.9535981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.9536115Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.9536678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.9536930Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.9537553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.9537687Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.9538205Z File "/tmp/tmps62fmfbq/7a/c7aoc4bxcv6wpgvpfv3xndonc7ddlvev4izlx62jhz7ign57wyva.py", line 65, in 2025-12-04T12:15:05.9538723Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.9538853Z kernel.precompile( 2025-12-04T12:15:05.9539406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.9539526Z self._precompile_worker() 2025-12-04T12:15:05.9540138Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.9540322Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.9540933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9541167Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9541620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9541884Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9542328Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9542663Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9542909Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9543562Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9543672Z ^ 2025-12-04T12:15:05.9544134Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9544140Z 2025-12-04T12:15:05.9544852Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9544872Z 2025-12-04T12:15:05.9544877Z 2025-12-04T12:15:05.9545094Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9545792Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T12:15:05.9545798Z 2025-12-04T12:15:05.9546080Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9546305Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9546425Z frames [('total', 1)] 2025-12-04T12:15:05.9546548Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9547045Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9547279Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9547380Z graph_break [] 2025-12-04T12:15:05.9547598Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9547715Z frames [('total', 1)] 2025-12-04T12:15:05.9547832Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9548049Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9548517Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9548617Z graph_break [] 2025-12-04T12:15:05.9548806Z =================================== FAILURES =================================== 2025-12-04T12:15:05.9549204Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda _ 2025-12-04T12:15:05.9549333Z Traceback (most recent call last): 2025-12-04T12:15:05.9549771Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.9550035Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.9550537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.9550786Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.9551300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.9551507Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.9552020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.9552202Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.9552749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.9553071Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.9553608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.9553759Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.9554240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.9554375Z return self._compile_to_module() 2025-12-04T12:15:05.9554861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.9555038Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.9555563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.9555692Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.9556203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.9556436Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.9557018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.9557159Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.9557657Z File "/tmp/tmptfemlvnp/px/cpxba3dcg7g2emdld3folai36dg5l5232bau7arm43ijoiqyzepi.py", line 65, in 2025-12-04T12:15:05.9558132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.9558246Z kernel.precompile( 2025-12-04T12:15:05.9558845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.9558980Z self._precompile_worker() 2025-12-04T12:15:05.9559579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.9559772Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.9560368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9560565Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9561029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9561309Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.9561756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9562110Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9562341Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9563036Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9563128Z ^ 2025-12-04T12:15:05.9563587Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9563592Z 2025-12-04T12:15:05.9564317Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9564322Z 2025-12-04T12:15:05.9564357Z 2025-12-04T12:15:05.9564574Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9565298Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T12:15:05.9565306Z 2025-12-04T12:15:05.9565575Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9565813Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9565918Z frames [('total', 1)] 2025-12-04T12:15:05.9566035Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9566514Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9566735Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9566837Z graph_break [] 
2025-12-04T12:15:05.9567068Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9567174Z frames [('total', 1)] 2025-12-04T12:15:05.9567307Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9567524Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9567980Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9568096Z graph_break [] 2025-12-04T12:15:05.9568311Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9568413Z frames [('total', 1)] 2025-12-04T12:15:05.9568541Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9568760Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9569232Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:05.9569330Z graph_break [] 2025-12-04T12:15:05.9570020Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6add3d31a0a55a66.xml - 2025-12-04T12:15:05.9570210Z =========================== short test summary info ============================ 2025-12-04T12:15:05.9571250Z FAILED [0.4448s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9571922Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9572013Z ^ 2025-12-04T12:15:05.9572469Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9572474Z 2025-12-04T12:15:05.9573283Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9573292Z 2025-12-04T12:15:05.9573296Z 2025-12-04T12:15:05.9573517Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9574232Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T12:15:05.9574278Z 2025-12-04T12:15:05.9574549Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9574732Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.9574955Z ================== 1 failed, 187 deselected, 2 rerun in 4.37s ================== 2025-12-04T12:15:05.9575057Z Got exit code 1 2025-12-04T12:15:05.9575704Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda 2025-12-04T12:15:05.9576112Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:05.9576697Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fa52f41f0c0be4e5.xml 2025-12-04T12:15:05.9576885Z ============================= test session starts ============================== 2025-12-04T12:15:05.9577240Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.9577352Z cachedir: .pytest_cache 2025-12-04T12:15:05.9577888Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.9578013Z rootdir: /var/lib/jenkins/workspace 
2025-12-04T12:15:05.9578139Z configfile: pytest.ini 2025-12-04T12:15:05.9578733Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.9578961Z collecting ... collected 188 items / 35 deselected / 153 selected 2025-12-04T12:15:05.9579129Z stepcurrent: skipping 35 already run items. 2025-12-04T12:15:05.9579245Z Running 153 items in this shard 2025-12-04T12:15:05.9579250Z 2025-12-04T12:15:05.9580711Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.9581962Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9582410Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.9582910Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.9583372Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.9583922Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.9584462Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.9585054Z E1204 
12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.9585684Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.9586246Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.9586712Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.9587375Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.9587913Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.9588460Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.9589040Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9589616Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9590144Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9590652Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9591130Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9591593Z E1204 12:03:56.167000 121806 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.9592108Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.9592872Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9593577Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.9594262Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.9594800Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.9595291Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.9595743Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.9596286Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.9596741Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.9597240Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.9597771Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.9598260Z E1204 12:03:56.167000 121806 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.9598825Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.9599423Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9600015Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.9600605Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.9601106Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.9601584Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.9602169Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.9602676Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.9603253Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.9603808Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.9604515Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.9605094Z E1204 12:03:56.167000 121806 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.9605627Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.9606342Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.9606720Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.9609378Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.9609933Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.9610984Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9611628Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9612555Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9613240Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9614135Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9614936Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9615560Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.9616879Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9617301Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.9618201Z E1204 12:03:56.167000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9618352Z ('RERUN', {'yellow': True}) [3.5258s] [ 0%] 2025-12-04T12:15:05.9619787Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.9621030Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9621482Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.9621938Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.9622417Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.9622952Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.9623494Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.9624128Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.9624715Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.9625289Z E1204 12:03:56.710000 121806 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.9625760Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.9626444Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.9626975Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.9627526Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.9628126Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9628690Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9629234Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9629723Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9630206Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9630724Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.9631223Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.9631998Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9632687Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.9633384Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.9633909Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.9634395Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.9634863Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.9635351Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.9635817Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.9636299Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.9636833Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.9637382Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.9637902Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.9638514Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 
tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9639094Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.9639690Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.9640205Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.9640678Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.9641266Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.9641757Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.9642331Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.9642884Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.9643591Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.9644218Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.9644733Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.9645454Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.9645815Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.9648439Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.9648986Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.9650042Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9650698Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9651591Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9652280Z E1204 12:03:56.710000 121806 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9653164Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9653973Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9654585Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.9655845Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9656244Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.9657234Z E1204 12:03:56.710000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9657385Z ('RERUN', {'yellow': True}) [0.5049s] [ 0%] 2025-12-04T12:15:05.9658851Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.9660108Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9660542Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.9661010Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.9661474Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.9662006Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.9662562Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.9663145Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.9663741Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.9664297Z E1204 12:03:57.220000 121806 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.9664748Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.9665428Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.9665956Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.9666514Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.9667090Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9667663Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9668194Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9668685Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9669212Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9669677Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.9670188Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.9671128Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9671895Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.9672605Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.9673131Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.9673635Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.9674085Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.9674594Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.9675050Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.9675536Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.9676088Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.9676578Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.9677113Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.9677709Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 
tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9678845Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.9679430Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.9679934Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.9680418Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.9680998Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.9681506Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.9682101Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.9682645Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.9683408Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.9683986Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.9684504Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.9685228Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.9685649Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.9688258Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.9688799Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.9689854Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9690561Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9691523Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9692205Z E1204 12:03:57.220000 121806 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9693140Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9693915Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9694528Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.9695828Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9696196Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.9697163Z E1204 12:03:57.220000 121806 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9697306Z FAILED [0.5101s] [ 0%] 2025-12-04T12:15:05.9697313Z 2025-12-04T12:15:05.9697478Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.9697868Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T12:15:05.9697993Z Traceback (most recent call last): 2025-12-04T12:15:05.9698435Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.9698676Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.9699176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.9699465Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.9699981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.9700193Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.9700705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.9700853Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.9701405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.9701729Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.9702265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.9702421Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.9702903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.9703046Z return self._compile_to_module() 2025-12-04T12:15:05.9703531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.9703709Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.9704225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.9704357Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.9704895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.9705132Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.9705767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.9705911Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.9706422Z File "/tmp/tmpgv72wtly/d2/cd2bhfqkfzk3ytih4l5jpwrrqngdoz4q257rrrhwldmopxdpfcqa.py", line 137, in 2025-12-04T12:15:05.9706897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.9707012Z kernel.precompile( 2025-12-04T12:15:05.9707562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.9707694Z self._precompile_worker() 2025-12-04T12:15:05.9708326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.9708526Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.9709124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9709326Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9709919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9710169Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9710613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9710966Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9711194Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9712026Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9712157Z ^ 2025-12-04T12:15:05.9712617Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9712641Z 2025-12-04T12:15:05.9713357Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9713363Z 2025-12-04T12:15:05.9713368Z 2025-12-04T12:15:05.9713588Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9714303Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T12:15:05.9714312Z 2025-12-04T12:15:05.9714584Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9714828Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9714936Z frames [('total', 1)] 2025-12-04T12:15:05.9715059Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9715541Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.9715769Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9715871Z graph_break [] 2025-12-04T12:15:05.9716278Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T12:15:05.9716406Z Traceback (most recent call last): 2025-12-04T12:15:05.9716843Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.9717083Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.9717573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.9717881Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.9718683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.9718899Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.9719414Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.9719566Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.9720113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.9720485Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.9721008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.9721179Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.9721661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.9721834Z return self._compile_to_module() 2025-12-04T12:15:05.9722321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.9722488Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.9723026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.9723159Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.9723680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.9723919Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.9724547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.9724691Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.9725206Z File "/tmp/tmpz6kanr3z/2k/c2kscxxt7ys57ljxoamro5pgno6dvi66nhweineppnb6f3qpuar7.py", line 137, in 2025-12-04T12:15:05.9725669Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.9725797Z kernel.precompile( 2025-12-04T12:15:05.9726354Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.9726488Z self._precompile_worker() 2025-12-04T12:15:05.9727085Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.9727265Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.9727880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9728078Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9728542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9728788Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9729233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9729576Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9729807Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9730653Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9730758Z ^ 2025-12-04T12:15:05.9731220Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9731228Z 2025-12-04T12:15:05.9731950Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9731958Z 2025-12-04T12:15:05.9731962Z 2025-12-04T12:15:05.9732180Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9732888Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T12:15:05.9732928Z 2025-12-04T12:15:05.9733197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9733426Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9733543Z frames [('total', 1)] 2025-12-04T12:15:05.9733661Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9734137Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.9734409Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9734510Z graph_break [] 2025-12-04T12:15:05.9734743Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9734848Z frames [('total', 1)] 2025-12-04T12:15:05.9734966Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9735200Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9735662Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.9735795Z graph_break [] 2025-12-04T12:15:05.9735963Z =================================== FAILURES =================================== 2025-12-04T12:15:05.9736431Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T12:15:05.9736577Z Traceback (most recent call last): 2025-12-04T12:15:05.9737008Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.9737241Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.9737749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.9737997Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.9738526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.9738719Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.9739235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.9739398Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.9739935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.9740258Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.9740793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.9740941Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.9741436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.9741562Z return self._compile_to_module() 2025-12-04T12:15:05.9742098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.9742278Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.9742796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.9742943Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.9743444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.9743680Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.9744279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.9744440Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.9744943Z File "/tmp/tmpfahfx886/7c/c7cfetq46btaezjz4qzq4ubbk7h2uh4qdnt5yqj5q6hm3ku4wm37.py", line 137, in 2025-12-04T12:15:05.9745426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.9745542Z kernel.precompile( 2025-12-04T12:15:05.9746109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.9746268Z self._precompile_worker() 2025-12-04T12:15:05.9746865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.9747059Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.9747656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9747869Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9748321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9748603Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.9749063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9749400Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9749627Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9750453Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9750543Z ^ 2025-12-04T12:15:05.9751017Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9751023Z 2025-12-04T12:15:05.9751737Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9751743Z 2025-12-04T12:15:05.9751747Z 2025-12-04T12:15:05.9751978Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9752680Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T12:15:05.9752686Z 2025-12-04T12:15:05.9752955Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9753188Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9753293Z frames [('total', 1)] 2025-12-04T12:15:05.9753422Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9753889Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.9754113Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:15:05.9754266Z graph_break [] 2025-12-04T12:15:05.9754485Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9754590Z frames [('total', 1)] 2025-12-04T12:15:05.9754725Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9754944Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9755415Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.9755516Z graph_break [] 2025-12-04T12:15:05.9755733Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9755849Z frames [('total', 1)] 2025-12-04T12:15:05.9755965Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9756219Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9756694Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.9756793Z graph_break [] 2025-12-04T12:15:05.9757463Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fa52f41f0c0be4e5.xml - 2025-12-04T12:15:05.9757671Z =========================== short test summary info ============================ 2025-12-04T12:15:05.9758507Z FAILED [0.5101s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9759329Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9759419Z ^ 2025-12-04T12:15:05.9759889Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9759928Z 2025-12-04T12:15:05.9760642Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9760650Z 2025-12-04T12:15:05.9760655Z 2025-12-04T12:15:05.9760874Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9761581Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T12:15:05.9761588Z 2025-12-04T12:15:05.9761859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9762056Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.9762259Z ================== 1 failed, 35 deselected, 2 rerun in 4.58s =================== 2025-12-04T12:15:05.9762364Z Got exit code 1 2025-12-04T12:15:05.9762487Z Retrying single test... 
2025-12-04T12:15:05.9762959Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-38b24c1b21208356.xml 2025-12-04T12:15:05.9763138Z ============================= test session starts ============================== 2025-12-04T12:15:05.9763492Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.9763604Z cachedir: .pytest_cache 2025-12-04T12:15:05.9764139Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.9764268Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.9764379Z configfile: pytest.ini 2025-12-04T12:15:05.9764984Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.9765211Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.9766040Z stepcurrent: skipping 35 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T12:15:05.9766162Z Running 1 items in this shard 2025-12-04T12:15:05.9766167Z 2025-12-04T12:15:05.9767602Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.9768892Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9769331Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.9769797Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.9770291Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.9770838Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.9771564Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.9772150Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.9772844Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = 
tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.9773399Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.9773864Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.9774501Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.9775038Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.9775588Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.9776176Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9776791Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9777326Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9777830Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9778312Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9778783Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.9779302Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.9780118Z E1204 
12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9780833Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.9781520Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.9782113Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.9782619Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.9783081Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.9783589Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.9784088Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.9784571Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.9785123Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.9785614Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.9786184Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.9786776Z E1204 
12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9787377Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.9787943Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.9788445Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.9788931Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.9789512Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.9789987Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.9790569Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.9791111Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.9791831Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.9792414Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.9792983Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.9793696Z 
E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.9794065Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.9796729Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.9797313Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.9798355Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9798987Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9799899Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:05.9800612Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9801512Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9802287Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9802907Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.9804159Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9804544Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.9805439Z E1204 12:04:15.973000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9805576Z ('RERUN', {'yellow': True}) [3.5480s] [100%] 2025-12-04T12:15:05.9807017Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.9808295Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9808748Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.9809198Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.9809670Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.9810239Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.9810787Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.9811389Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.9812009Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.9812576Z E1204 12:04:16.536000 122025 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.9813026Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.9813665Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.9814237Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.9814785Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.9815378Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9815907Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9816882Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9817392Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9817876Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9818354Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.9818858Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.9819634Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9820330Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.9821067Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.9821778Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.9822274Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.9822740Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.9823236Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.9823732Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.9824233Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.9824772Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.9825274Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.9825835Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.9826430Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 
tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9827032Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.9827594Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.9828145Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.9828609Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.9829198Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.9829654Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.9830229Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.9830782Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.9831487Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.9832077Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.9832595Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.9833301Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.9833680Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.9836344Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.9836897Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.9837971Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9838619Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9839544Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9840240Z E1204 12:04:16.536000 122025 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9841121Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9841937Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9842546Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.9843791Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9844172Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.9845063Z E1204 12:04:16.536000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9845217Z ('RERUN', {'yellow': True}) [0.5240s] [100%] 2025-12-04T12:15:05.9846641Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.9847897Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9848328Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.9848824Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.9849289Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.9849826Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.9850382Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.9850963Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.9851585Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.9852138Z E1204 12:04:17.049000 122025 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.9852593Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.9853274Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.9853803Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.9854360Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.9854945Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9855475Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9856048Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9856621Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9857115Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9857580Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.9858077Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.9858857Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9859550Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.9860253Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.9860779Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.9861285Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.9861740Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.9862276Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.9862745Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.9863226Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.9863773Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.9864262Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.9864815Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.9865421Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 
tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9866002Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.9866609Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.9867108Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.9867574Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.9868165Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.9868623Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.9869246Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.9869787Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.9870512Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.9871319Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.9871840Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.9872568Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.9872928Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.9875556Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.9876164Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.9877216Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9877845Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9878799Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9879480Z E1204 12:04:17.049000 122025 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9880377Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9881199Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9881808Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.9883074Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9883512Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.9884417Z E1204 12:04:17.049000 122025 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9884524Z FAILED [0.5093s] [100%] 2025-12-04T12:15:05.9884530Z 2025-12-04T12:15:05.9884676Z ==================================== RERUNS ==================================== 2025-12-04T12:15:05.9885087Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T12:15:05.9885215Z Traceback (most recent call last): 2025-12-04T12:15:05.9885655Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.9885891Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.9886382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.9886644Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.9887162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.9887371Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.9887882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.9888030Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.9888578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.9888899Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.9889468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.9889620Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.9890101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:05.9890236Z return self._compile_to_module() 2025-12-04T12:15:05.9890720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.9890885Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.9891435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.9891596Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.9892105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.9892345Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.9892931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.9893112Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.9893624Z File "/tmp/tmpuio9ao7y/3k/c3kj2kiyfpwgkww7duyhamkmysena3itstpvte6z3exyizeie6zu.py", line 137, in 2025-12-04T12:15:05.9894101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.9894218Z kernel.precompile( 2025-12-04T12:15:05.9894781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.9894916Z self._precompile_worker() 2025-12-04T12:15:05.9895518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.9895731Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.9896400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9896603Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9897074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9897320Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9897766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9898121Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9898352Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9899179Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9899275Z ^ 2025-12-04T12:15:05.9899731Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9899737Z 2025-12-04T12:15:05.9900466Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9900473Z 2025-12-04T12:15:05.9900477Z 2025-12-04T12:15:05.9900696Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9901421Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T12:15:05.9901429Z 2025-12-04T12:15:05.9901738Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9901974Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9902103Z frames [('total', 1)] 2025-12-04T12:15:05.9902224Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9902706Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.9902930Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9903034Z graph_break [] 2025-12-04T12:15:05.9903437Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T12:15:05.9903567Z Traceback (most recent call last): 2025-12-04T12:15:05.9904022Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.9904275Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.9904764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.9905033Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.9905579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.9905771Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.9906299Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.9906451Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.9907001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.9907322Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.9907880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.9908047Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.9908535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.9908656Z return self._compile_to_module() 2025-12-04T12:15:05.9909157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.9909321Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.9909845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.9909977Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.9910472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.9910724Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.9911311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.9911449Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.9911950Z File "/tmp/tmp8rtpeauw/cc/ccctdkq4glp3ayx2tg773vcu44cl24rp45fxhxbasyfsf3qg5w7r.py", line 137, in 2025-12-04T12:15:05.9912412Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.9912537Z kernel.precompile( 2025-12-04T12:15:05.9913096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.9913212Z self._precompile_worker() 2025-12-04T12:15:05.9913857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.9914040Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.9914643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9914845Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9915295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9915552Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9915998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9916456Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9916685Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9917500Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9917636Z ^ 2025-12-04T12:15:05.9918093Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9918100Z 2025-12-04T12:15:05.9918825Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9918830Z 2025-12-04T12:15:05.9918835Z 2025-12-04T12:15:05.9919054Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9919758Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T12:15:05.9919808Z 2025-12-04T12:15:05.9920083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9920306Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9920426Z frames [('total', 1)] 2025-12-04T12:15:05.9920545Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9921008Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.9921241Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9921343Z graph_break [] 2025-12-04T12:15:05.9921576Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9921681Z frames [('total', 1)] 2025-12-04T12:15:05.9921799Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9922032Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9922498Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.9922596Z graph_break [] 2025-12-04T12:15:05.9922756Z =================================== FAILURES =================================== 2025-12-04T12:15:05.9923147Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T12:15:05.9923272Z Traceback (most recent call last): 2025-12-04T12:15:05.9923712Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:05.9923947Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:05.9924448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:05.9924701Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:05.9925217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:05.9925457Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:05.9925967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:05.9926131Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:05.9926663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:05.9926983Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:05.9927516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:05.9927696Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:05.9928190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:05.9928318Z return self._compile_to_module() 2025-12-04T12:15:05.9928800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:05.9929028Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:05.9929543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:05.9929672Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:05.9930179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:05.9930407Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:05.9931007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:05.9931163Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:05.9931676Z File "/tmp/tmp70m2r3ad/ey/ceycqjnmpvwnxuzpllna34bpdo66bnzsjlxrkdyo2klbckgshgp4.py", line 137, in 2025-12-04T12:15:05.9932150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:05.9932267Z kernel.precompile( 2025-12-04T12:15:05.9932834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:05.9932953Z self._precompile_worker() 2025-12-04T12:15:05.9933547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:05.9933741Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:05.9934338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9934540Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9935004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:05.9935247Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:05.9935704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9936039Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9936264Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9937165Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9937258Z ^ 2025-12-04T12:15:05.9937733Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9937786Z 2025-12-04T12:15:05.9938503Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9938512Z 2025-12-04T12:15:05.9938516Z 2025-12-04T12:15:05.9938733Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9939447Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T12:15:05.9939453Z 2025-12-04T12:15:05.9939726Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9939994Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9940102Z frames [('total', 1)] 2025-12-04T12:15:05.9940219Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9940698Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.9940923Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:15:05.9941067Z graph_break [] 2025-12-04T12:15:05.9941285Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9941389Z frames [('total', 1)] 2025-12-04T12:15:05.9941521Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9941742Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9942203Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.9942316Z graph_break [] 2025-12-04T12:15:05.9942534Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:05.9942651Z frames [('total', 1)] 2025-12-04T12:15:05.9942803Z stats [('calls_captured', 10)] 2025-12-04T12:15:05.9943024Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:05.9943492Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:05.9943597Z graph_break [] 2025-12-04T12:15:05.9944247Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-38b24c1b21208356.xml - 2025-12-04T12:15:05.9944436Z =========================== short test summary info ============================ 2025-12-04T12:15:05.9945264Z FAILED [0.5093s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:05.9946084Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9946177Z ^ 2025-12-04T12:15:05.9946635Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9946641Z 2025-12-04T12:15:05.9947362Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:05.9947368Z 2025-12-04T12:15:05.9947372Z 2025-12-04T12:15:05.9947593Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:05.9948303Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T12:15:05.9948309Z 2025-12-04T12:15:05.9948580Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:05.9948776Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:05.9949019Z ================== 1 failed, 187 deselected, 2 rerun in 4.62s ================== 2025-12-04T12:15:05.9949122Z Got exit code 1 2025-12-04T12:15:05.9949245Z Retrying single test... 
2025-12-04T12:15:05.9949714Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1ae24833396f782.xml 2025-12-04T12:15:05.9949883Z ============================= test session starts ============================== 2025-12-04T12:15:05.9950253Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:05.9950364Z cachedir: .pytest_cache 2025-12-04T12:15:05.9950883Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:05.9955697Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:05.9955911Z configfile: pytest.ini 2025-12-04T12:15:05.9956538Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:05.9956775Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:05.9957568Z stepcurrent: skipping 35 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T12:15:05.9957743Z Running 1 items in this shard 2025-12-04T12:15:05.9957750Z 2025-12-04T12:15:05.9959193Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:05.9960466Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9960944Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:05.9961409Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:05.9961871Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:05.9962407Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:05.9962960Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:05.9963548Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:05.9964149Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = 
tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:05.9964704Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:05.9965157Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:05.9965806Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:05.9966335Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:05.9966931Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:05.9967518Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:05.9968049Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:05.9968589Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:05.9969079Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:05.9969600Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:05.9970067Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:05.9970589Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:05.9971564Z E1204 
12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:05.9972338Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.9973043Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:05.9973575Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:05.9974129Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:05.9974581Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:05.9975079Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:05.9975545Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:05.9976027Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:05.9976646Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:05.9977142Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:05.9977666Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:05.9978274Z E1204 
12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:05.9978855Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:05.9979436Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:05.9979935Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:05.9980408Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:05.9981031Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:05.9981488Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:05.9982076Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:05.9982611Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:05.9983394Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:05.9983976Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:05.9984487Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:05.9985235Z 
E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:05.9985597Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:05.9988220Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:05.9988793Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:05.9989850Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:05.9990482Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:05.9991387Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:05.9992068Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:05.9992957Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:05.9993726Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:05.9994368Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:05.9995641Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:05.9996009Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:05.9997454Z E1204 12:04:35.569000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:05.9997594Z ('RERUN', {'yellow': True}) [3.5338s] [100%] 2025-12-04T12:15:05.9999074Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:06.0000323Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0000801Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:06.0001251Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:06.0001712Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:06.0002301Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:06.0002842Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.0003438Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:06.0004022Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:06.0004575Z E1204 12:04:36.139000 122244 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:06.0005038Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:06.0005673Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:06.0006209Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:06.0006761Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:06.0007341Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0007884Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0008412Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0008946Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0009562Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0010034Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:06.0010549Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:06.0011318Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0012071Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:06.0012767Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:06.0013344Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.0013832Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:06.0014285Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:06.0014797Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:06.0015250Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:06.0015787Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:06.0016384Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:06.0016879Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:06.0017415Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:06.0018013Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 
tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0018611Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:06.0019174Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:06.0019670Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:06.0020159Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:06.0020736Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:06.0021208Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:06.0021784Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:06.0022386Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:06.0023094Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:06.0023673Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:06.0024199Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:06.0024943Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:06.0025330Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.0027969Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.0028551Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.0029625Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0030263Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0031154Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0031833Z E1204 12:04:36.139000 122244 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0032730Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0033504Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0034123Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0035364Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0035745Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.0036715Z E1204 12:04:36.139000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0037132Z ('RERUN', {'yellow': True}) [0.5303s] [100%] 2025-12-04T12:15:06.0038578Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1 2025-12-04T12:15:06.0039886Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0040339Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:06.0040790Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 5120 2025-12-04T12:15:06.0041300Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:06.0041838Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:06.0042378Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.0042974Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:06.0043555Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:06.0044158Z E1204 12:04:36.656000 122244 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:06.0044612Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:06.0045255Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:06.0045782Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tl.load(in_ptr3 + (0)) 2025-12-04T12:15:06.0046329Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = tl.broadcast_to(tmp15, [1, 1]) 2025-12-04T12:15:06.0046925Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0047455Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0047994Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0048482Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0048961Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0049440Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_2 = r0_index 2025-12-04T12:15:06.0049942Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index // 512 2025-12-04T12:15:06.0050762Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (r0_2), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0051461Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.load(in_ptr1 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:06.0052161Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tl.load(in_ptr2 + (r0_1), r0_mask, eviction_policy='evict_last', other=0.0) 2025-12-04T12:15:06.0052686Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.0053214Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp1 - tmp2 2025-12-04T12:15:06.0053692Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp5 = 512.0 2025-12-04T12:15:06.0054187Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp6 = (tmp4 / tmp5) 2025-12-04T12:15:06.0054686Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = 1e-05 2025-12-04T12:15:06.0055169Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6 + tmp7 2025-12-04T12:15:06.0055698Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = libdevice.rsqrt(tmp8) 2025-12-04T12:15:06.0056202Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp3 * tmp9 2025-12-04T12:15:06.0056800Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tl_math.abs(tmp10) 2025-12-04T12:15:06.0057464Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 
tl.broadcast_to(tmp11, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0058046Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = triton_helpers.maximum(_tmp13, tmp12) 2025-12-04T12:15:06.0058607Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp13 = tl.where(r0_mask, tmp14, _tmp13) 2025-12-04T12:15:06.0059118Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp10 * tmp16 2025-12-04T12:15:06.0059586Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = -448.0 2025-12-04T12:15:06.0060172Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = triton_helpers.maximum(tmp17, tmp18) 2025-12-04T12:15:06.0060632Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = 448.0 2025-12-04T12:15:06.0061206Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.minimum(tmp19, tmp20) 2025-12-04T12:15:06.0061761Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tmp21.to(tl.float8e4nv) 2025-12-04T12:15:06.0062469Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (tl.broadcast_to(r0_2, [XBLOCK, R0_BLOCK])), tmp22, r0_mask) 2025-12-04T12:15:06.0063056Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = triton_helpers.max2(_tmp13, 1)[:, None] 2025-12-04T12:15:06.0063571Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tmp13.to(tl.float32) 2025-12-04T12:15:06.0064344Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tl.store(out_ptr2 + (tl.full([1, 1], 0, tl.int32).broadcast_to(XBLOCK, 1)), tmp23, None) 2025-12-04T12:15:06.0064715Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.0067392Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'out_ptr1': '*fp8e4nv', 'out_ptr2': '*fp16', 'xnumel': 'constexpr', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1, 'R0_BLOCK': 2048}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]], (7,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.0067936Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.0069023Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0069654Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0070547Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0071497Z E1204 12:04:36.656000 122244 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0072387Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0073176Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0073792Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0075054Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0075426Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.0076327Z E1204 12:04:36.656000 122244 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0076449Z FAILED [0.5134s] [100%] 2025-12-04T12:15:06.0076456Z 2025-12-04T12:15:06.0076604Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.0077014Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T12:15:06.0077140Z Traceback (most recent call last): 2025-12-04T12:15:06.0077667Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:06.0077917Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:06.0078411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0078679Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0079192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0079388Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0079916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0080111Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0080647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0080986Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0081511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0081722Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0082202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:06.0082327Z return self._compile_to_module() 2025-12-04T12:15:06.0082825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0082993Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0083527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0083721Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0084223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0084472Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0085060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0085191Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0085707Z File "/tmp/tmp00sk6nt5/c3/cc3ka5ut3tzoxycnduhrx2jw53imj4r72i5wqfxcnm66srw64d2i.py", line 137, in 2025-12-04T12:15:06.0086169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.0086297Z kernel.precompile( 2025-12-04T12:15:06.0086857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.0086978Z self._precompile_worker() 2025-12-04T12:15:06.0087594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0087777Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0088388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0088589Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0089042Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0089303Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0089748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0090095Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0090353Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0091160Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0091265Z ^ 2025-12-04T12:15:06.0091724Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0091730Z 2025-12-04T12:15:06.0092453Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0092459Z 2025-12-04T12:15:06.0092464Z 2025-12-04T12:15:06.0092716Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0093422Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T12:15:06.0093428Z 2025-12-04T12:15:06.0093707Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0093967Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0094087Z frames [('total', 1)] 2025-12-04T12:15:06.0094203Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0094667Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:06.0094900Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0095002Z graph_break [] 2025-12-04T12:15:06.0095401Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T12:15:06.0095540Z Traceback (most recent call last): 2025-12-04T12:15:06.0096002Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:06.0096248Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:06.0096798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0097049Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0097578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0097774Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0098296Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0098449Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0098983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0099320Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0099845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0099998Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0100491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.0100612Z return self._compile_to_module() 2025-12-04T12:15:06.0101107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0101268Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0101787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0101929Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0102469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0102714Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0103306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0103433Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0103955Z File "/tmp/tmpyn5psnqo/3g/c3ga3bog5mrqjsrpzn6bpss6y4kpevtk35kjgnyxu2lktonmrbpp.py", line 137, in 2025-12-04T12:15:06.0104415Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.0104529Z kernel.precompile( 2025-12-04T12:15:06.0105129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.0105251Z self._precompile_worker() 2025-12-04T12:15:06.0105858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0106071Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0106668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0106881Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0107336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0107595Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0108043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0108410Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0108656Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0109462Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0109567Z ^ 2025-12-04T12:15:06.0110027Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0110032Z 2025-12-04T12:15:06.0110751Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0110757Z 2025-12-04T12:15:06.0110764Z 2025-12-04T12:15:06.0110993Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0111692Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T12:15:06.0111698Z 2025-12-04T12:15:06.0111979Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0112205Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0112310Z frames [('total', 1)] 2025-12-04T12:15:06.0112441Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0112912Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:06.0113146Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0113248Z graph_break [] 2025-12-04T12:15:06.0113467Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0113582Z frames [('total', 1)] 2025-12-04T12:15:06.0113705Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0113958Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0114434Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:06.0114535Z graph_break [] 2025-12-04T12:15:06.0114687Z =================================== FAILURES =================================== 2025-12-04T12:15:06.0115095Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda _ 2025-12-04T12:15:06.0115218Z Traceback (most recent call last): 2025-12-04T12:15:06.0115653Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:06.0115885Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:06.0116412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0116674Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0117188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0117397Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0117934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0118083Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0118630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0118951Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0119473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0119663Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0120143Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.0120274Z return self._compile_to_module() 2025-12-04T12:15:06.0120763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0120928Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0121455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0121584Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0122090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0122325Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0122917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0123056Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0123566Z File "/tmp/tmp4lr11uw0/xg/cxgekqvj2prddxnzlrmwnkmuggs6vyoojttp6nduyhega5v2gj3z.py", line 137, in 2025-12-04T12:15:06.0124028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.0124152Z kernel.precompile( 2025-12-04T12:15:06.0124707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.0124839Z self._precompile_worker() 2025-12-04T12:15:06.0125433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0125617Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0126335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0126536Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0127000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0127246Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.0127688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0128035Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0128260Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0129126Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0129222Z ^ 2025-12-04T12:15:06.0129682Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0129688Z 2025-12-04T12:15:06.0130457Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0130463Z 2025-12-04T12:15:06.0130468Z 2025-12-04T12:15:06.0130685Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0131397Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T12:15:06.0131404Z 2025-12-04T12:15:06.0131676Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0131899Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0132048Z frames [('total', 1)] 2025-12-04T12:15:06.0132169Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0132644Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:06.0132870Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:15:06.0132972Z graph_break [] 2025-12-04T12:15:06.0133204Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0133309Z frames [('total', 1)] 2025-12-04T12:15:06.0133426Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0134074Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0134540Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:06.0134654Z graph_break [] 2025-12-04T12:15:06.0134871Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0134981Z frames [('total', 1)] 2025-12-04T12:15:06.0135110Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0135329Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0135794Z inductor [('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:15:06.0135906Z graph_break [] 2025-12-04T12:15:06.0136641Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1ae24833396f782.xml - 2025-12-04T12:15:06.0136835Z =========================== short test summary info ============================ 2025-12-04T12:15:06.0137689Z FAILED [0.5134s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0138725Z def triton_red_fused__to_copy_abs_amax_clamp_copy__fill_mul_native_layer_norm_select_view_1(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr1, out_ptr2, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0138836Z ^ 2025-12-04T12:15:06.0139297Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0139306Z 2025-12-04T12:15:06.0140031Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0140037Z 2025-12-04T12:15:06.0140041Z 2025-12-04T12:15:06.0140260Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0140993Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T12:15:06.0141014Z 2025-12-04T12:15:06.0141289Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0141477Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.0141700Z ================== 1 failed, 187 deselected, 2 rerun in 4.62s ================== 2025-12-04T12:15:06.0141836Z Got exit code 1 2025-12-04T12:15:06.0142455Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda 2025-12-04T12:15:06.0142881Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:06.0143354Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-80996ba6b8c32f81.xml 2025-12-04T12:15:06.0143537Z ============================= test session starts ============================== 2025-12-04T12:15:06.0143894Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.0144041Z cachedir: .pytest_cache 2025-12-04T12:15:06.0144578Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.0144708Z rootdir: /var/lib/jenkins/workspace 
2025-12-04T12:15:06.0144826Z configfile: pytest.ini 2025-12-04T12:15:06.0145431Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.0145662Z collecting ... collected 188 items / 36 deselected / 152 selected 2025-12-04T12:15:06.0145823Z stepcurrent: skipping 36 already run items. 2025-12-04T12:15:06.0145942Z Running 152 items in this shard 2025-12-04T12:15:06.0145947Z 2025-12-04T12:15:06.0147302Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:06.0148414Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0148868Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:06.0149336Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:06.0149799Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:06.0150351Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:06.0150927Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.0151513Z E1204 12:04:55.395000 122463 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:06.0152109Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:06.0152663Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:06.0153124Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:06.0153584Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.0154189Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0154792Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0155470Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0156065Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0156599Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0157143Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0157667Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0158146Z E1204 12:04:55.395000 122463 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0158631Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0159420Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0159957Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.0160544Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0161268Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:06.0161886Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:06.0162288Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:06.0162918Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:06.0163506Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:06.0164198Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:06.0164906Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:06.0165388Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:06.0165878Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:06.0166347Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:06.0167028Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:06.0167558Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:06.0168107Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:06.0168727Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0169256Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0169793Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0170284Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0170763Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0171451Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 
2025-12-04T12:15:06.0172242Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0172786Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:06.0173285Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:06.0173766Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:06.0174273Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:06.0174739Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:06.0175250Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:06.0175796Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:06.0176368Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:06.0176892Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:06.0177486Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0178179Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:06.0178742Z E1204 12:04:55.395000 
122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:06.0179252Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:06.0179719Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:06.0180360Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:06.0180835Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:06.0181417Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:06.0181969Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:06.0182612Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:06.0183185Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:06.0183746Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:06.0184109Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.0186508Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 
'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.0187049Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.0188110Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0188743Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0189648Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0190328Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0191226Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0192044Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0192659Z E1204 
12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0193768Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0194135Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.0195070Z E1204 12:04:55.395000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0195210Z ('RERUN', {'yellow': True}) [3.4498s] [ 0%] 2025-12-04T12:15:06.0196565Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:06.0197686Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0198133Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:06.0198595Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:06.0199092Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:06.0199637Z E1204 
12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:06.0200183Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.0200765Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:06.0201358Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:06.0201910Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:06.0202373Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:06.0202801Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.0203413Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0203998Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0204602Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0205197Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0205761Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 
2025-12-04T12:15:06.0206305Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0206795Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0207270Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0207747Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0208560Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0209102Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.0209688Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0210434Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:06.0211050Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:06.0211450Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:06.0212080Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:06.0212702Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:06.0213359Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:06.0214061Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:06.0214541Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:06.0215028Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:06.0215506Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:06.0216147Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:06.0216735Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:06.0217284Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:06.0217878Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0218413Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0218999Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0219490Z E1204 12:04:55.877000 122463 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0219987Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0220452Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0221238Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0221814Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:06.0222316Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:06.0222790Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:06.0223324Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:06.0223786Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:06.0224298Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:06.0224838Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:06.0225347Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:06.0225903Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 
2025-12-04T12:15:06.0226504Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0227096Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:06.0227656Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:06.0228166Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:06.0228630Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:06.0229210Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:06.0229684Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:06.0230260Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:06.0230811Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:06.0231409Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:06.0232049Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:06.0232601Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 
2025-12-04T12:15:06.0232963Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.0235356Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.0235896Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.0236984Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0237610Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0238519Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0239229Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0240119Z E1204 12:04:55.877000 122463 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0240896Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0241501Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0242614Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0242983Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.0243884Z E1204 12:04:55.877000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0244023Z ('RERUN', {'yellow': True}) [0.4440s] [ 0%] 2025-12-04T12:15:06.0245377Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:06.0246497Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0246965Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:06.0247421Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:06.0247888Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:06.0248437Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:06.0249012Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.0249611Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:06.0250199Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:06.0250784Z E1204 12:04:56.325000 122463 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:06.0251249Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:06.0251686Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.0252300Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0252888Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0253530Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0254126Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0254658Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0255199Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0255690Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0256169Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0256756Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0257542Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0258089Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.0258678Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0259414Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:06.0260064Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:06.0260470Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:06.0261103Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:06.0261693Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:06.0262388Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:06.0263095Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:06.0263582Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:06.0264110Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp7 = tmp5[:, None] 2025-12-04T12:15:06.0264581Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:06.0265232Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:06.0265761Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:06.0266325Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:06.0266937Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0267468Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0268015Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0268508Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0269006Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0269470Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0270267Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0270813Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = 
tmp9.to(tl.float32) 2025-12-04T12:15:06.0271903Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:06.0272416Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:06.0272921Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:06.0273389Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:06.0274046Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:06.0274586Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:06.0275097Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:06.0275621Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:06.0276216Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0276863Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:06.0277433Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:06.0277940Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:06.0278453Z E1204 12:04:56.325000 122463 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:06.0279044Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:06.0279500Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:06.0280077Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:06.0280680Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:06.0281274Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:06.0281866Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:06.0282417Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:06.0282777Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.0285138Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 
16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.0285678Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.0286738Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0287401Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0288310Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0288993Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0289885Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0290684Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0291306Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0292399Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0292796Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.0293698Z E1204 12:04:56.325000 122463 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0293806Z FAILED [0.4455s] [ 0%] 2025-12-04T12:15:06.0293814Z 2025-12-04T12:15:06.0293971Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.0294410Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.0294536Z Traceback (most recent call last): 2025-12-04T12:15:06.0294973Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:06.0295214Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:06.0295717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0295970Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0296556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0296771Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0297285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0297439Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0297990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in 
fx_codegen_and_compile 2025-12-04T12:15:06.0298316Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0298849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0299001Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0299481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.0299623Z return self._compile_to_module() 2025-12-04T12:15:06.0300110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0300293Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0300852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0300983Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0301499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0301730Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0302313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0302453Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0303019Z File "/tmp/tmp68dn1xf6/xe/cxeeedhdiosib6g6mpxnfz46ldep3ho7an5w23pkg733m7i4v5qx.py", line 65, in 2025-12-04T12:15:06.0303497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.0303615Z kernel.precompile( 2025-12-04T12:15:06.0304166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 
2025-12-04T12:15:06.0304332Z self._precompile_worker() 2025-12-04T12:15:06.0304924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0305115Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0305706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0305904Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0306369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0306648Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0307105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0307442Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0307669Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0308332Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0308424Z ^ 2025-12-04T12:15:06.0308882Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0308888Z 2025-12-04T12:15:06.0309616Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0309625Z 2025-12-04T12:15:06.0309629Z 2025-12-04T12:15:06.0309847Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0310573Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T12:15:06.0310582Z 2025-12-04T12:15:06.0310850Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0311089Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0311194Z frames [('total', 1)] 2025-12-04T12:15:06.0311315Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0311794Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0312019Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0312121Z graph_break [] 2025-12-04T12:15:06.0312593Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.0312718Z Traceback (most recent call last): 2025-12-04T12:15:06.0313155Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:06.0313392Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:06.0313883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0314142Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0314655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0314893Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0315404Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0315556Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0316102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0316456Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0316976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0317136Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0317616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.0317750Z return self._compile_to_module() 2025-12-04T12:15:06.0318240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0318450Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0318984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0319115Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0319623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0319857Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0320446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0320588Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0321085Z File "/tmp/tmpjl8roieu/wp/cwpr3t4i73ta32nonbipacqxobdkww3ahzw52d56nxg27hzemnre.py", line 65, in 2025-12-04T12:15:06.0321548Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.0321679Z kernel.precompile( 2025-12-04T12:15:06.0322236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.0322369Z self._precompile_worker() 2025-12-04T12:15:06.0322967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0323147Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0323753Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0323953Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0324418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0324662Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0325152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0325501Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0325731Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0326377Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0326481Z ^ 2025-12-04T12:15:06.0326939Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0326944Z 2025-12-04T12:15:06.0327699Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0327708Z 2025-12-04T12:15:06.0327713Z 2025-12-04T12:15:06.0327935Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0328659Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T12:15:06.0328697Z 2025-12-04T12:15:06.0328966Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0329191Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0329311Z frames [('total', 1)] 2025-12-04T12:15:06.0329430Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0329892Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0330148Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0330250Z graph_break [] 2025-12-04T12:15:06.0330515Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0330624Z frames [('total', 1)] 2025-12-04T12:15:06.0330743Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0330976Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0331439Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0331540Z graph_break [] 2025-12-04T12:15:06.0331704Z =================================== FAILURES =================================== 2025-12-04T12:15:06.0332111Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.0332251Z Traceback (most recent call last): 2025-12-04T12:15:06.0332685Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:06.0332919Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:06.0333430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0333682Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0334215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0334410Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0334922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0335084Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0335619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0335945Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0336817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0336979Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0337487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.0337616Z return self._compile_to_module() 2025-12-04T12:15:06.0338103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0338287Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0338808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0338954Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0339484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0339727Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0340330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0340491Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0340993Z File "/tmp/tmp4wo547af/3w/c3wzmawjcbjsrcfyxmclgbfwvt3io5i75yxrhyqhu7sqxxsqbdd3.py", line 65, in 2025-12-04T12:15:06.0341476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.0341588Z kernel.precompile( 2025-12-04T12:15:06.0342154Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.0342276Z self._precompile_worker() 2025-12-04T12:15:06.0342874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0343107Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0343704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0343919Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0344369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0344615Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.0345072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0345406Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0345641Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0346307Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0346397Z ^ 2025-12-04T12:15:06.0346869Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0346878Z 2025-12-04T12:15:06.0347592Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0347599Z 2025-12-04T12:15:06.0347603Z 2025-12-04T12:15:06.0347831Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0348545Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T12:15:06.0348551Z 2025-12-04T12:15:06.0348822Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0349110Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0349217Z frames [('total', 1)] 2025-12-04T12:15:06.0349344Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0349816Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0350039Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0350153Z graph_break [] 
2025-12-04T12:15:06.0350371Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0350473Z frames [('total', 1)] 2025-12-04T12:15:06.0350601Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0350820Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0351322Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0351439Z graph_break [] 2025-12-04T12:15:06.0351660Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0351776Z frames [('total', 1)] 2025-12-04T12:15:06.0351891Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0352144Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0352619Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0352721Z graph_break [] 2025-12-04T12:15:06.0353374Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-80996ba6b8c32f81.xml - 2025-12-04T12:15:06.0353564Z =========================== short test summary info ============================ 2025-12-04T12:15:06.0354428Z FAILED [0.4455s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0355143Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0355239Z ^ 2025-12-04T12:15:06.0355700Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0355718Z 2025-12-04T12:15:06.0356423Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0356429Z 2025-12-04T12:15:06.0356433Z 2025-12-04T12:15:06.0356650Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0357375Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T12:15:06.0357383Z 2025-12-04T12:15:06.0357655Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0357851Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.0358054Z ================== 1 failed, 36 deselected, 2 rerun in 4.38s =================== 2025-12-04T12:15:06.0358157Z Got exit code 1 2025-12-04T12:15:06.0358280Z Retrying single test... 
2025-12-04T12:15:06.0358750Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8b26ba548538abde.xml 2025-12-04T12:15:06.0358915Z ============================= test session starts ============================== 2025-12-04T12:15:06.0359280Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.0359395Z cachedir: .pytest_cache 2025-12-04T12:15:06.0359928Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.0360104Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.0360219Z configfile: pytest.ini 2025-12-04T12:15:06.0360827Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.0361054Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.0361861Z stepcurrent: skipping 36 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T12:15:06.0361981Z Running 1 items in this shard 2025-12-04T12:15:06.0361986Z 2025-12-04T12:15:06.0363381Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:06.0364497Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0364982Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:06.0365445Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:06.0365910Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:06.0366460Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:06.0367049Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.0367634Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:06.0368233Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T12:15:06.0368793Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:06.0369254Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:06.0369686Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.0370289Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0370893Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0371740Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0372340Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0372870Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0373407Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0374026Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0374512Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0374996Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0375778Z E1204 12:05:14.988000 122660 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0376380Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.0377042Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0377772Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:06.0378489Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:06.0378967Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:06.0379602Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:06.0380196Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:06.0380845Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:06.0381624Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:06.0382109Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:06.0382602Z E1204 12:05:14.988000 122660 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:06.0383074Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:06.0383728Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:06.0384259Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:06.0384807Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:06.0385401Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0385932Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0386471Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0386960Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0387441Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0387962Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0388753Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0389296Z E1204 12:05:14.988000 122660 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:06.0389794Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:06.0390287Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:06.0390804Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:06.0391270Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:06.0391781Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:06.0392352Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:06.0392847Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:06.0393380Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:06.0393975Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0394602Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:06.0395161Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:06.0395672Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:06.0396137Z 
E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:06.0396718Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:06.0397189Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:06.0397770Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:06.0398324Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:06.0398920Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:06.0399500Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:06.0400069Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:06.0400430Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.0402850Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], 
(5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.0403393Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.0404491Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0405126Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0406081Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0406759Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0407644Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0408463Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0409070Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0410174Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0410541Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.0411454Z E1204 12:05:14.988000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0411597Z ('RERUN', {'yellow': True}) [3.4500s] [100%] 2025-12-04T12:15:06.0412950Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:06.0414056Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0414504Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:06.0414963Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:06.0415470Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:06.0416014Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:06.0416631Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.0417216Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:06.0417816Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:06.0418414Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:06.0418888Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:06.0419324Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.0419958Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0420554Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0421168Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0421764Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0422333Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0422875Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0423371Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] roffset = r0_offset 2025-12-04T12:15:06.0423852Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0424331Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0425117Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0425661Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.0426251Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0426974Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:06.0427596Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:06.0428005Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:06.0428675Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:06.0429270Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:06.0429934Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:06.0430644Z E1204 
12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:06.0431124Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:06.0431655Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:06.0432132Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:06.0432788Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:06.0433345Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:06.0433896Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:06.0434493Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0435028Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0435602Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0436091Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0436571Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0437050Z E1204 12:05:15.479000 122660 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0440671Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0441234Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:06.0441760Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:06.0442224Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:06.0442750Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:06.0443205Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:06.0443720Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:06.0444293Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:06.0444785Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:06.0445325Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:06.0445918Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0446502Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 
triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:06.0447077Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:06.0447636Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:06.0448117Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:06.0448696Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:06.0449184Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:06.0449772Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:06.0450312Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:06.0450926Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:06.0451502Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:06.0452102Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:06.0452465Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.0454889Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 
'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.0455441Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.0456629Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0457278Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0458179Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0458872Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0459757Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0460542Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, 
context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0461151Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0462326Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0462698Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.0463591Z E1204 12:05:15.479000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0463773Z ('RERUN', {'yellow': True}) [0.4527s] [100%] 2025-12-04T12:15:06.0465123Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:06.0466219Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0466697Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:06.0467162Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:06.0467622Z E1204 12:05:15.931000 122660 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:06.0468154Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:06.0468761Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.0469347Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:06.0469940Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:06.0470493Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:06.0471166Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:06.0471695Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.0472388Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0472996Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0473608Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0474193Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0474745Z E1204 12:05:15.931000 122660 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0475276Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0475949Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0476434Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0476904Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0477770Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0478366Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.0478968Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0479691Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:06.0480356Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:06.0480758Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:06.0481374Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:06.0481976Z 
E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:06.0482672Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:06.0483391Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:06.0483869Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:06.0484344Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:06.0484831Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:06.0485461Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:06.0486001Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:06.0486549Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:06.0487143Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0487674Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0488204Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < 
r0_numel 2025-12-04T12:15:06.0488711Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0489191Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0489708Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0490503Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0491031Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:06.0491572Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:06.0492033Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:06.0492547Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:06.0493004Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:06.0493530Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:06.0494079Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:06.0494573Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:06.0495109Z E1204 12:05:15.931000 122660 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:06.0495702Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0496408Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:06.0496974Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:06.0497471Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:06.0497951Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:06.0498527Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:06.0498996Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:06.0499574Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:06.0500115Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:06.0500722Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:06.0501302Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:06.0501860Z E1204 12:05:15.931000 122660 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:06.0502223Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.0504623Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.0505190Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.0506247Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0506873Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0507804Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0508503Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.0509389Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0510224Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0510840Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0511944Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0512314Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.0513211Z E1204 12:05:15.931000 122660 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0513333Z FAILED [0.4487s] [100%] 2025-12-04T12:15:06.0513342Z 2025-12-04T12:15:06.0513489Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.0513909Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.0514037Z Traceback (most recent call last): 2025-12-04T12:15:06.0514467Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:06.0514721Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:06.0515213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0515479Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0515997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0516227Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0516754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0516906Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0517437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0517802Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0518324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0518486Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0518967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:06.0519090Z return self._compile_to_module() 2025-12-04T12:15:06.0519592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0519793Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0520322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0520455Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0520955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0521200Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0521787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0521914Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0522466Z File "/tmp/tmprm7t157q/ry/cryfvhdvbdirhsvo7sox7acx2ecvuu3dc4h2kzejuuvfgwhl77kp.py", line 65, in 2025-12-04T12:15:06.0522931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.0523076Z kernel.precompile( 2025-12-04T12:15:06.0523632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.0523756Z self._precompile_worker() 2025-12-04T12:15:06.0524369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0524555Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0525167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0525368Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0525825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0526088Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0526536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0526887Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0527119Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0527777Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0527884Z ^ 2025-12-04T12:15:06.0528342Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0528348Z 2025-12-04T12:15:06.0529106Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0529129Z 2025-12-04T12:15:06.0529134Z 2025-12-04T12:15:06.0529353Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0530067Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T12:15:06.0530104Z 2025-12-04T12:15:06.0530395Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0530626Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0530749Z frames [('total', 1)] 2025-12-04T12:15:06.0530869Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0531339Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0531584Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0531688Z graph_break [] 2025-12-04T12:15:06.0532123Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.0532262Z Traceback (most recent call last): 2025-12-04T12:15:06.0532687Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:06.0532941Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:06.0533433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0533680Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0534209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0534441Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0534968Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0535117Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0535650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0535994Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0536593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0536744Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0537244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.0537369Z return self._compile_to_module() 2025-12-04T12:15:06.0537875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0538042Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0538560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0538707Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0539206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0539453Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0540040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0540167Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0540682Z File "/tmp/tmp4i4cd1io/cx/ccxp6zwyisw4fbmmmicno2erqws77zkvu5jbkgbicpk3vb7bdqp3.py", line 65, in 2025-12-04T12:15:06.0541189Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.0541305Z kernel.precompile( 2025-12-04T12:15:06.0541874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.0541993Z self._precompile_worker() 2025-12-04T12:15:06.0542638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0542821Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0543414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0543625Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0544080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0544343Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0544825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0545162Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0545404Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0546055Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0546149Z ^ 2025-12-04T12:15:06.0546623Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0546628Z 2025-12-04T12:15:06.0547383Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0547391Z 2025-12-04T12:15:06.0547396Z 2025-12-04T12:15:06.0547629Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0548345Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T12:15:06.0548353Z 2025-12-04T12:15:06.0548637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0548861Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0548968Z frames [('total', 1)] 2025-12-04T12:15:06.0549100Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0549569Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0549797Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0549914Z graph_break [] 2025-12-04T12:15:06.0550134Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0550253Z frames [('total', 1)] 2025-12-04T12:15:06.0550369Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0550587Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0551063Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0551163Z graph_break [] 2025-12-04T12:15:06.0551312Z =================================== FAILURES =================================== 2025-12-04T12:15:06.0551725Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.0551850Z Traceback (most recent call last): 2025-12-04T12:15:06.0552328Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:06.0552565Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:06.0553057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0553318Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0553829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0554067Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0554579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0554728Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0555274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0555600Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0556176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0556338Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0556819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.0556957Z return self._compile_to_module() 2025-12-04T12:15:06.0557444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0557610Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0558142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0558273Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0558825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0559064Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0559655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0559797Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0560295Z File "/tmp/tmpgn88xo6k/x7/cx7cgqmedpm36qrxnz56ka5amiatgv4cwg3j3zdaadc6olyqssht.py", line 65, in 2025-12-04T12:15:06.0560755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.0560879Z kernel.precompile( 2025-12-04T12:15:06.0561432Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.0561565Z self._precompile_worker() 2025-12-04T12:15:06.0562164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0562346Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0562957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0563159Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0563621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0563866Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.0564308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0564659Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0564917Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0565569Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0565675Z ^ 2025-12-04T12:15:06.0566134Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0566170Z 2025-12-04T12:15:06.0566894Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0566900Z 2025-12-04T12:15:06.0566906Z 2025-12-04T12:15:06.0567125Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0567851Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T12:15:06.0567857Z 2025-12-04T12:15:06.0568126Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0568380Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0568502Z frames [('total', 1)] 2025-12-04T12:15:06.0568620Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0569101Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0569324Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0569424Z graph_break [] 
2025-12-04T12:15:06.0569654Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0569758Z frames [('total', 1)] 2025-12-04T12:15:06.0569875Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0570143Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0570611Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0570711Z graph_break [] 2025-12-04T12:15:06.0571213Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0571325Z frames [('total', 1)] 2025-12-04T12:15:06.0571457Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0571680Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0572144Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0572256Z graph_break [] 2025-12-04T12:15:06.0572906Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8b26ba548538abde.xml - 2025-12-04T12:15:06.0573080Z =========================== short test summary info ============================ 2025-12-04T12:15:06.0573958Z FAILED [0.4487s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0574608Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0574717Z ^ 2025-12-04T12:15:06.0575177Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0575183Z 2025-12-04T12:15:06.0575902Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0575908Z 2025-12-04T12:15:06.0575913Z 2025-12-04T12:15:06.0576133Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0576997Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T12:15:06.0577020Z 2025-12-04T12:15:06.0577289Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0577470Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.0577733Z ================== 1 failed, 187 deselected, 2 rerun in 4.39s ================== 2025-12-04T12:15:06.0577838Z Got exit code 1 2025-12-04T12:15:06.0577952Z Retrying single test... 
2025-12-04T12:15:06.0578441Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d73817a3e5f02a06.xml 2025-12-04T12:15:06.0578607Z ============================= test session starts ============================== 2025-12-04T12:15:06.0578965Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.0579097Z cachedir: .pytest_cache 2025-12-04T12:15:06.0579622Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.0579805Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.0579917Z configfile: pytest.ini 2025-12-04T12:15:06.0580509Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.0580748Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.0581542Z stepcurrent: skipping 36 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T12:15:06.0581670Z Running 1 items in this shard 2025-12-04T12:15:06.0581675Z 2025-12-04T12:15:06.0583070Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:06.0584174Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0584624Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:06.0585077Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:06.0585549Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:06.0586087Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:06.0586641Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.0587229Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:06.0587813Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 
2025-12-04T12:15:06.0588381Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:06.0588827Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:06.0589305Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.0589904Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0590490Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0591136Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0591714Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0592255Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0592789Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0593277Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0593817Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0594288Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0595085Z E1204 12:05:34.461000 122857 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0595614Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.0596262Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0596987Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:06.0597594Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:06.0598015Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:06.0598633Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:06.0599239Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:06.0599889Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:06.0600612Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:06.0601096Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:06.0601572Z E1204 12:05:34.461000 122857 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:06.0602061Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:06.0602730Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:06.0603272Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:06.0603829Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:06.0604462Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0605013Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0605541Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0606054Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0606535Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0607034Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0607835Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0608366Z E1204 12:05:34.461000 122857 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:06.0608881Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:06.0609385Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:06.0609889Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:06.0610364Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:06.0610861Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:06.0611418Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:06.0611914Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:06.0612439Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:06.0613053Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0613629Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:06.0614210Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:06.0614710Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:06.0615190Z 
E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:06.0615767Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:06.0616259Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:06.0616938Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:06.0617478Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:06.0618125Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:06.0618697Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:06.0619244Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:06.0619620Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.0621991Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], 
(5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.0622574Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.0623622Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0624267Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0625156Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0625850Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0626736Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0627517Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0628129Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0629230Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def 
triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0629614Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.0630541Z E1204 12:05:34.461000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0630694Z ('RERUN', {'yellow': True}) [3.4181s] [100%] 2025-12-04T12:15:06.0632046Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:06.0633175Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0633622Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:06.0634076Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:06.0634587Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:06.0635121Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:06.0635678Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = 
tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.0636260Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:06.0636843Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:06.0637452Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:06.0637907Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:06.0638353Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.0638955Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0639539Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0640158Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0640742Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0641288Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0641811Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0642314Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] roffset = r0_offset 2025-12-04T12:15:06.0642794Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0643260Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0644086Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0644614Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.0645218Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0645967Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:06.0646571Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:06.0646996Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:06.0647613Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:06.0648247Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:06.0648899Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:06.0649616Z E1204 
12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:06.0650098Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:06.0650608Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:06.0651098Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:06.0651731Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:06.0652267Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:06.0652819Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:06.0653400Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0653947Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0654476Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0654981Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0655463Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0655929Z E1204 12:05:34.936000 122857 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0656794Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0657388Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:06.0657901Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:06.0658361Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:06.0658906Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:06.0659365Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:06.0659863Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:06.0660417Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:06.0660910Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:06.0661479Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:06.0662073Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0662653Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = 
triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:06.0663226Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:06.0663780Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:06.0664262Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:06.0664841Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:06.0665297Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:06.0665889Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:06.0666426Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:06.0667039Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:06.0667614Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:06.0668158Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:06.0668536Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.0670920Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 
'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.0671686Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.0672803Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0673445Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0674339Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0675038Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0675975Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0676761Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, 
context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0677368Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0678584Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0678970Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.0679857Z E1204 12:05:34.936000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0680005Z ('RERUN', {'yellow': True}) [0.4365s] [100%] 2025-12-04T12:15:06.0681540Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0 2025-12-04T12:15:06.0682643Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0683092Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 8192 2025-12-04T12:15:06.0683543Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_numel = 4096 2025-12-04T12:15:06.0684018Z E1204 12:05:35.382000 122857 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rnumel = r0_numel 2025-12-04T12:15:06.0684558Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] RBLOCK: tl.constexpr = R0_BLOCK 2025-12-04T12:15:06.0685117Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.0685757Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:, None] 2025-12-04T12:15:06.0686359Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:, None] 2025-12-04T12:15:06.0686912Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_base = tl.arange(0, R0_BLOCK)[None, :] 2025-12-04T12:15:06.0687391Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rbase = r0_base 2025-12-04T12:15:06.0687836Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.0688442Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0689045Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0689682Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.zeros([XBLOCK, R0_BLOCK], tl.float32) 2025-12-04T12:15:06.0690260Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0690810Z E1204 12:05:35.382000 122857 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0691336Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < r0_numel 2025-12-04T12:15:06.0691872Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0692358Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0692824Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0693619Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_last', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0694148Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.0694753Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tl.broadcast_to(tmp1, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0695476Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean_next, tmp3_m2_next, tmp3_weight_next = triton_helpers.welford_reduce( 2025-12-04T12:15:06.0696091Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2, tmp3_mean, tmp3_m2, tmp3_weight, roffset == 0 2025-12-04T12:15:06.0696560Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ) 2025-12-04T12:15:06.0697181Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_mean = tl.where(r0_mask, tmp3_mean_next, tmp3_mean) 2025-12-04T12:15:06.0697785Z 
E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_m2 = tl.where(r0_mask, tmp3_m2_next, tmp3_m2) 2025-12-04T12:15:06.0698433Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3_weight = tl.where(r0_mask, tmp3_weight_next, tmp3_weight) 2025-12-04T12:15:06.0699196Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4, tmp5, tmp6 = triton_helpers.welford(tmp3_mean, tmp3_m2, tmp3_weight, 1) 2025-12-04T12:15:06.0699684Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp4[:, None] 2025-12-04T12:15:06.0700157Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp7 = tmp5[:, None] 2025-12-04T12:15:06.0700675Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp8 = tmp6[:, None] 2025-12-04T12:15:06.0701327Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.full([XBLOCK, R0_BLOCK], float("-inf"), tl.float32) 2025-12-04T12:15:06.0701864Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp22 = tl.load(in_ptr1 + (0)) 2025-12-04T12:15:06.0702414Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp23 = tl.broadcast_to(tmp22, [1, 1]) 2025-12-04T12:15:06.0703032Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] for r0_offset in tl.range(0, r0_numel, R0_BLOCK): 2025-12-04T12:15:06.0703582Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_index = r0_offset + r0_base 2025-12-04T12:15:06.0704114Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_mask = r0_index < 
r0_numel 2025-12-04T12:15:06.0704617Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] roffset = r0_offset 2025-12-04T12:15:06.0705101Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] rindex = r0_index 2025-12-04T12:15:06.0705630Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] r0_1 = r0_index 2025-12-04T12:15:06.0706420Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp9 = tl.load(in_ptr0 + (r0_1 + 4096*x0), r0_mask, eviction_policy='evict_first', other=0.0).to(tl.float32) 2025-12-04T12:15:06.0706951Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp10 = tmp9.to(tl.float32) 2025-12-04T12:15:06.0707469Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp11 = tmp10 - tmp3 2025-12-04T12:15:06.0707933Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp12 = 4096.0 2025-12-04T12:15:06.0708449Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp13 = (tmp7 / tmp12) 2025-12-04T12:15:06.0708911Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp14 = 1e-05 2025-12-04T12:15:06.0709413Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp15 = tmp13 + tmp14 2025-12-04T12:15:06.0709968Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp16 = libdevice.rsqrt(tmp15) 2025-12-04T12:15:06.0710471Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp17 = tmp11 * tmp16 2025-12-04T12:15:06.0711008Z E1204 12:05:35.382000 122857 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp18 = tl_math.abs(tmp17) 2025-12-04T12:15:06.0711607Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp19 = tl.broadcast_to(tmp18, [XBLOCK, R0_BLOCK]) 2025-12-04T12:15:06.0712237Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp21 = triton_helpers.maximum(_tmp20, tmp19) 2025-12-04T12:15:06.0712816Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] _tmp20 = tl.where(r0_mask, tmp21, _tmp20) 2025-12-04T12:15:06.0713317Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp24 = tmp17 * tmp23 2025-12-04T12:15:06.0713832Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp25 = -448.0 2025-12-04T12:15:06.0714412Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp26 = triton_helpers.maximum(tmp24, tmp25) 2025-12-04T12:15:06.0714872Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp27 = 448.0 2025-12-04T12:15:06.0715470Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp28 = triton_helpers.minimum(tmp26, tmp27) 2025-12-04T12:15:06.0716012Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp29 = tmp28.to(tl.float8e4nv) 2025-12-04T12:15:06.0716660Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr3 + (r0_1 + 4096*x0), tmp29, r0_mask) 2025-12-04T12:15:06.0717239Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp20 = triton_helpers.max2(_tmp20, 1)[:, None] 2025-12-04T12:15:06.0717802Z E1204 12:05:35.382000 122857 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr2 + (x0), tmp20, None) 2025-12-04T12:15:06.0718167Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.0720539Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'in_ptr1': '*fp32', 'out_ptr2': '*fp32', 'out_ptr3': '*fp8e4nv', 'xnumel': 'i32', 'r0_numel': 'i32', 'XBLOCK': 'constexpr', 'R0_BLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1, 'R0_BLOCK': 4096}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]], (3,): [['tt.divisibility', 16]], (4,): [['tt.divisibility', 16]], (5,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 16, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.0721094Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.0722143Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0722782Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0723690Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0724388Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.0725275Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0726103Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0726717Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0727815Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0728226Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.0729164Z E1204 12:05:35.382000 122857 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0729329Z FAILED [0.4437s] [100%] 2025-12-04T12:15:06.0729340Z 2025-12-04T12:15:06.0729511Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.0729972Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.0730097Z Traceback (most recent call last): 2025-12-04T12:15:06.0730521Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:06.0730771Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:06.0731256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0731507Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0732032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0732258Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0732782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0732931Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0733468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0733806Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0734327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0734488Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0734970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in 
compile_to_module 2025-12-04T12:15:06.0735094Z return self._compile_to_module() 2025-12-04T12:15:06.0735599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0735767Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0736349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0736497Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0736996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0737244Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0737834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0737962Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0738482Z File "/tmp/tmpv46aehkk/wb/cwbdu5clk27grsjparnqcqxu533u4tall67yccoeu3w7liyrugrq.py", line 65, in 2025-12-04T12:15:06.0738996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.0739128Z kernel.precompile( 2025-12-04T12:15:06.0739688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.0739809Z self._precompile_worker() 2025-12-04T12:15:06.0740453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0740636Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0741248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0741449Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0741905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0742161Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0742637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0742973Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0743216Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0743870Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0743976Z ^ 2025-12-04T12:15:06.0744433Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0744439Z 2025-12-04T12:15:06.0745190Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0745212Z 2025-12-04T12:15:06.0745218Z 2025-12-04T12:15:06.0745438Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0746153Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T12:15:06.0746161Z 2025-12-04T12:15:06.0746444Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0746675Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0746796Z frames [('total', 1)] 2025-12-04T12:15:06.0746915Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0747382Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0747625Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0747727Z graph_break [] 2025-12-04T12:15:06.0748132Z _ TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.0748272Z Traceback (most recent call last): 2025-12-04T12:15:06.0748698Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:06.0748948Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:06.0749440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0749689Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0750215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0750411Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0750960Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0751124Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0751656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0751990Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0752542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0752690Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0753184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.0753308Z return self._compile_to_module() 2025-12-04T12:15:06.0753812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0753977Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0754545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0754689Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0755184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0755420Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0756023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0756149Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0756628Z File "/tmp/tmpjikr7_i1/z2/cz2ugkaiwh4il4c4mrl27mf257qzxt4nf5ka7czk3gk6jk5l5ypd.py", line 65, in 2025-12-04T12:15:06.0757124Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.0757240Z kernel.precompile( 2025-12-04T12:15:06.0757812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.0757931Z self._precompile_worker() 2025-12-04T12:15:06.0758544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0758725Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0759320Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0759533Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0759988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0760234Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0760692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0761027Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0761268Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0761920Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0762010Z ^ 2025-12-04T12:15:06.0762481Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0762488Z 2025-12-04T12:15:06.0763236Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0763242Z 2025-12-04T12:15:06.0763249Z 2025-12-04T12:15:06.0763480Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0764191Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T12:15:06.0764231Z 2025-12-04T12:15:06.0764515Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0764737Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0764843Z frames [('total', 1)] 2025-12-04T12:15:06.0764972Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0765439Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0765667Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0765779Z graph_break [] 2025-12-04T12:15:06.0765999Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0766149Z frames [('total', 1)] 2025-12-04T12:15:06.0766267Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0766487Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0766962Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0767065Z graph_break [] 2025-12-04T12:15:06.0767218Z =================================== FAILURES =================================== 2025-12-04T12:15:06.0767639Z _ 
TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.0767765Z Traceback (most recent call last): 2025-12-04T12:15:06.0768248Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 314, in test_layernorm_fp8_quant 2025-12-04T12:15:06.0768487Z y_compiled = compiled_ln_fp8_quant(x, scale, amax_buffer_compiled) 2025-12-04T12:15:06.0768978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0769243Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0769757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0769956Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0770480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0770631Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0771395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0771727Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0772247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0772415Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0772898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.0773039Z return self._compile_to_module() 2025-12-04T12:15:06.0778760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0778985Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0779539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0779675Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0780340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0780606Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0781196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0781345Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0781899Z File "/tmp/tmp8xe0k353/x4/cx4q76mhbm4gdp4jsi72uakzjarpab5hz6blpceocira7y3nyci7.py", line 65, in 2025-12-04T12:15:06.0782368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.0782501Z kernel.precompile( 2025-12-04T12:15:06.0783057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.0783178Z self._precompile_worker() 2025-12-04T12:15:06.0783796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0784031Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0784636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0784841Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0785295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0785560Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.0786003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0786352Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0786637Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0787295Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0787408Z ^ 2025-12-04T12:15:06.0787867Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0787876Z 2025-12-04T12:15:06.0788602Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0788608Z 2025-12-04T12:15:06.0788613Z 2025-12-04T12:15:06.0788832Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0789546Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T12:15:06.0789569Z 2025-12-04T12:15:06.0789837Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0790066Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0790187Z frames [('total', 1)] 2025-12-04T12:15:06.0790306Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0790773Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0791012Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0791115Z graph_break [] 
2025-12-04T12:15:06.0791341Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0791458Z frames [('total', 1)] 2025-12-04T12:15:06.0791576Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0791809Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0792309Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0792415Z graph_break [] 2025-12-04T12:15:06.0792649Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0792760Z frames [('total', 1)] 2025-12-04T12:15:06.0792874Z stats [('calls_captured', 10)] 2025-12-04T12:15:06.0793136Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.0793593Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0793708Z graph_break [] 2025-12-04T12:15:06.0794357Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d73817a3e5f02a06.xml - 2025-12-04T12:15:06.0794532Z =========================== short test summary info ============================ 2025-12-04T12:15:06.0795416Z FAILED [0.4437s] inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0796100Z def triton_red_fused__to_copy_abs_amax_clamp_mul_native_layer_norm_0(in_ptr0, in_ptr1, out_ptr2, out_ptr3, xnumel, r0_numel, XBLOCK : tl.constexpr, R0_BLOCK : tl.constexpr): 2025-12-04T12:15:06.0796204Z ^ 2025-12-04T12:15:06.0796668Z ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0796674Z 2025-12-04T12:15:06.0797380Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0797386Z 2025-12-04T12:15:06.0797405Z 2025-12-04T12:15:06.0797626Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0798374Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T12:15:06.0798382Z 2025-12-04T12:15:06.0798670Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0798856Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.0799058Z ================== 1 failed, 187 deselected, 2 rerun in 4.34s ================== 2025-12-04T12:15:06.0799175Z Got exit code 1 2025-12-04T12:15:06.0799808Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda 2025-12-04T12:15:06.0800230Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:06.0800697Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-90e37d7f0968dad1.xml 2025-12-04T12:15:06.0800865Z ============================= test session starts ============================== 2025-12-04T12:15:06.0801235Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.0801350Z cachedir: .pytest_cache 2025-12-04T12:15:06.0801882Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.0802013Z rootdir: 
/var/lib/jenkins/workspace 2025-12-04T12:15:06.0802123Z configfile: pytest.ini 2025-12-04T12:15:06.0802729Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.0802954Z collecting ... collected 188 items / 37 deselected / 151 selected 2025-12-04T12:15:06.0803099Z stepcurrent: skipping 37 already run items. 2025-12-04T12:15:06.0803229Z Running 151 items in this shard 2025-12-04T12:15:06.0803234Z 2025-12-04T12:15:06.0803836Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_cuda PASSED [3.5371s] [ 0%] 2025-12-04T12:15:06.0804418Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_cuda PASSED [0.6738s] [ 1%] 2025-12-04T12:15:06.0804985Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_cuda PASSED [0.8417s] [ 1%] 2025-12-04T12:15:06.0805601Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_cuda PASSED [0.5862s] [ 2%] 2025-12-04T12:15:06.0806185Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_cuda PASSED [0.9603s] [ 3%] 2025-12-04T12:15:06.0806731Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_cuda PASSED [0.7028s] [ 3%] 2025-12-04T12:15:06.0807297Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_cuda PASSED [0.7668s] [ 4%] 2025-12-04T12:15:06.0807900Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_cuda PASSED [1.0653s] [ 5%] 2025-12-04T12:15:06.0808458Z 
inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_cuda PASSED [0.7732s] [ 5%] 2025-12-04T12:15:06.0809041Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_cuda PASSED [1.1238s] [ 6%] 2025-12-04T12:15:06.0809554Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda PASSED [0.4479s] [ 7%] 2025-12-04T12:15:06.0810092Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.4350s] [ 7%] 2025-12-04T12:15:06.0810629Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_cuda PASSED [0.3957s] [ 8%] 2025-12-04T12:15:06.0811158Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.4104s] [ 9%] 2025-12-04T12:15:06.0811740Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda ('RERUN', {'yellow': True}) [1.3575s] [ 9%] 2025-12-04T12:15:06.0812318Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda ('RERUN', {'yellow': True}) [1.1440s] [ 9%] 2025-12-04T12:15:06.0812836Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda FAILED [1.0731s] [ 9%] 2025-12-04T12:15:06.0812842Z 2025-12-04T12:15:06.0812986Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.0813317Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.0813448Z Traceback (most recent call last): 2025-12-04T12:15:06.0813859Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.0814027Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.0814520Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0814770Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0815297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0815493Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0816016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0816164Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0816847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0817190Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0817710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0817871Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0818385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.0818508Z return self._compile_to_module() 2025-12-04T12:15:06.0819006Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0819168Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0819684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0819825Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0820354Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:06.0820600Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0821184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0821313Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0821827Z File "/tmp/tmpw4l6efgy/rm/crmgjskr4wu2ah2xky35exzl3u4jvb7w5dsvy63hh3bu3hqsieaq.py", line 84, in 2025-12-04T12:15:06.0822279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T12:15:06.0822408Z self._wait_futures(scope) 2025-12-04T12:15:06.0822941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T12:15:06.0823061Z kernel = result.result() 2025-12-04T12:15:06.0823519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T12:15:06.0823635Z return self.result_fn() 2025-12-04T12:15:06.0824113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T12:15:06.0824256Z raise e.with_name(kernel_name) from e 2025-12-04T12:15:06.0824636Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T12:15:06.0824643Z 2025-12-04T12:15:06.0824786Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.0824910Z Traceback (most recent call last): 2025-12-04T12:15:06.0825449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T12:15:06.0825565Z result = job() 2025-12-04T12:15:06.0826158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T12:15:06.0826314Z kernel.precompile(warm_cache_only=True) 
2025-12-04T12:15:06.0826872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T12:15:06.0826993Z self._precompile_worker() 2025-12-04T12:15:06.0827603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0827785Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0828375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0828583Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0829068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0829326Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0829773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0830107Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0830338Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0830649Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.0830736Z ^ 2025-12-04T12:15:06.0831207Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0831212Z 2025-12-04T12:15:06.0831217Z 2025-12-04T12:15:06.0831936Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0831942Z 2025-12-04T12:15:06.0831981Z 2025-12-04T12:15:06.0832212Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0832961Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.0832973Z 2025-12-04T12:15:06.0833319Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0833552Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0833658Z frames [('total', 1)] 2025-12-04T12:15:06.0833789Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.0834011Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.0834653Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T12:15:06.0834759Z graph_break [] 2025-12-04T12:15:06.0835078Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.0835222Z Traceback (most recent call last): 2025-12-04T12:15:06.0835629Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.0835778Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.0836286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0836537Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0837063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0837259Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0837774Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0837937Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0838470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0838806Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0839332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0839483Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0839973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.0840095Z return self._compile_to_module() 2025-12-04T12:15:06.0840582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0840795Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0841311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0841457Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0841954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0842221Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0842824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0842952Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0843457Z File "/tmp/tmpsws2xeka/3y/c3ykxdx5kb37kftbduqqa2c4ga6td42ejfr5fep3ellkp6vbyla4.py", line 84, in 2025-12-04T12:15:06.0844015Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T12:15:06.0844134Z self._wait_futures(scope) 2025-12-04T12:15:06.0844684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T12:15:06.0844800Z kernel = result.result() 2025-12-04T12:15:06.0845246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T12:15:06.0845375Z return self.result_fn() 2025-12-04T12:15:06.0845853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T12:15:06.0845996Z raise e.with_name(kernel_name) from e 2025-12-04T12:15:06.0846381Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T12:15:06.0846387Z 2025-12-04T12:15:06.0846518Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.0846689Z Traceback (most recent call last): 2025-12-04T12:15:06.0847231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T12:15:06.0847334Z result = job() 2025-12-04T12:15:06.0847937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T12:15:06.0848081Z kernel.precompile(warm_cache_only=True) 2025-12-04T12:15:06.0848651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T12:15:06.0848770Z self._precompile_worker() 2025-12-04T12:15:06.0849363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0849560Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0850162Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0850377Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0850835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0851085Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0851548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0851885Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0852070Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0852394Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.0852486Z ^ 2025-12-04T12:15:06.0853011Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0853017Z 2025-12-04T12:15:06.0853022Z 2025-12-04T12:15:06.0853737Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0853745Z 2025-12-04T12:15:06.0853750Z 2025-12-04T12:15:06.0853979Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0854642Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.0854648Z 2025-12-04T12:15:06.0854915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0855156Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0855262Z frames [('total', 1)] 2025-12-04T12:15:06.0855393Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.0855622Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.0856201Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T12:15:06.0856441Z graph_break [] 2025-12-04T12:15:06.0856668Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0856776Z frames [('total', 1)] 2025-12-04T12:15:06.0856909Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.0857131Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.0857725Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T12:15:06.0857828Z graph_break [] 2025-12-04T12:15:06.0857977Z =================================== FAILURES =================================== 2025-12-04T12:15:06.0858350Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.0858476Z Traceback (most recent call last): 2025-12-04T12:15:06.0858887Z File 
"/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.0859048Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.0859533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0859794Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0860306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0860498Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0861021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0861170Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0861709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0862040Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0862556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0862723Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0863203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.0863328Z return self._compile_to_module() 2025-12-04T12:15:06.0863826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0863991Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0864562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0864697Z mod = 
PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0865195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0865441Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0866057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0866182Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0866670Z File "/tmp/tmp9m_mh11q/i2/ci26xsjmz6ze2lfn72pqyotyrk5vbfpz32vup26viakrnept7s6z.py", line 84, in 2025-12-04T12:15:06.0867121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T12:15:06.0867252Z self._wait_futures(scope) 2025-12-04T12:15:06.0867752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T12:15:06.0867905Z kernel = result.result() 2025-12-04T12:15:06.0868358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T12:15:06.0868473Z return self.result_fn() 2025-12-04T12:15:06.0868965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T12:15:06.0869094Z raise e.with_name(kernel_name) from e 2025-12-04T12:15:06.0869476Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T12:15:06.0869482Z 2025-12-04T12:15:06.0869629Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.0869750Z Traceback (most recent call last): 2025-12-04T12:15:06.0870338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T12:15:06.0870453Z result = job() 2025-12-04T12:15:06.0871283Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T12:15:06.0871445Z kernel.precompile(warm_cache_only=True) 2025-12-04T12:15:06.0872002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T12:15:06.0872122Z self._precompile_worker() 2025-12-04T12:15:06.0872731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0872909Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0873515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0873719Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0874174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0874436Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0874879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0875218Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0875418Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0875730Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.0875838Z ^ 2025-12-04T12:15:06.0876296Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0876302Z 2025-12-04T12:15:06.0876307Z 2025-12-04T12:15:06.0877123Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0877145Z 2025-12-04T12:15:06.0877149Z 2025-12-04T12:15:06.0877370Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0878002Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.0878062Z 2025-12-04T12:15:06.0878349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0878574Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0878695Z frames [('total', 1)] 2025-12-04T12:15:06.0878815Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.0879040Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.0879639Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T12:15:06.0879870Z graph_break [] 2025-12-04T12:15:06.0880091Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0880210Z frames [('total', 1)] 2025-12-04T12:15:06.0880326Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.0880549Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.0881143Z inductor [('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T12:15:06.0881243Z graph_break [] 2025-12-04T12:15:06.0881473Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0881579Z frames [('total', 1)] 2025-12-04T12:15:06.0881696Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.0881975Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.0882554Z inductor 
[('pattern_matcher_nodes', 2), ('async_compile_cache_miss', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T12:15:06.0882655Z graph_break [] 2025-12-04T12:15:06.0883321Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-90e37d7f0968dad1.xml - 2025-12-04T12:15:06.0883497Z =========================== short test summary info ============================ 2025-12-04T12:15:06.0884445Z FAILED [1.0731s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T12:15:06.0884452Z 2025-12-04T12:15:06.0884582Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.0884705Z Traceback (most recent call last): 2025-12-04T12:15:06.0885273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T12:15:06.0885371Z result = job() 2025-12-04T12:15:06.0885975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T12:15:06.0886115Z kernel.precompile(warm_cache_only=True) 2025-12-04T12:15:06.0886666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T12:15:06.0886800Z self._precompile_worker() 2025-12-04T12:15:06.0887391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0887582Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0888175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0888408Z binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0888871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0889118Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0889560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0889938Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0890121Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0890442Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.0890531Z ^ 2025-12-04T12:15:06.0890986Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0890992Z 2025-12-04T12:15:06.0890996Z 2025-12-04T12:15:06.0891724Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0891759Z 2025-12-04T12:15:06.0891764Z 2025-12-04T12:15:06.0891981Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0892627Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.0892634Z 2025-12-04T12:15:06.0892902Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0893095Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.0893318Z ============ 1 failed, 14 passed, 37 deselected, 2 rerun in 16.37s ============= 2025-12-04T12:15:06.0893420Z Got exit code 1 2025-12-04T12:15:06.0893541Z Retrying single test... 
2025-12-04T12:15:06.0894059Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6ba281452d587f38.xml 2025-12-04T12:15:06.0894227Z ============================= test session starts ============================== 2025-12-04T12:15:06.0894593Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.0894705Z cachedir: .pytest_cache 2025-12-04T12:15:06.0895241Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.0895374Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.0895485Z configfile: pytest.ini 2025-12-04T12:15:06.0896090Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.0896387Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.0897098Z stepcurrent: skipping 51 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.0897232Z Running 1 items in this shard 2025-12-04T12:15:06.0897237Z 2025-12-04T12:15:06.0898372Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.0899141Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.0899688Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.0900314Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.0900821Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.0901261Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.0901881Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.0902429Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.0902891Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T12:15:06.0903459Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 
triton_helpers.maximum(tmp1, tmp2) 2025-12-04T12:15:06.0903906Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T12:15:06.0904517Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T12:15:06.0905030Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:06.0905572Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T12:15:06.0906122Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T12:15:06.0906483Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.0908204Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.0908748Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.0909804Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0910436Z E1204 12:06:24.568000 123839 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0911352Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0912037Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0912941Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0913735Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0914392Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0915150Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.0915524Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.0916467Z E1204 12:06:24.568000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0916604Z ('RERUN', {'yellow': True}) [3.9740s] [100%] 2025-12-04T12:15:06.0917748Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.0918503Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.0919087Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.0919673Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.0920176Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.0920628Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.0921260Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.0921789Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.0922245Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T12:15:06.0922816Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T12:15:06.0923278Z E1204 12:06:25.352000 123839 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T12:15:06.0923844Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T12:15:06.0924376Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:06.0924913Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T12:15:06.0925469Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T12:15:06.0925851Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.0927517Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.0928113Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.0929168Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0929818Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:06.0930747Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0931433Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0932338Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0933143Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0933772Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0934529Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.0934912Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.0935839Z E1204 12:06:25.352000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0935979Z ('RERUN', {'yellow': True}) [0.7456s] [100%] 2025-12-04T12:15:06.0937178Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.0937935Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.0938503Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.0939078Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.0939598Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.0940041Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.0940640Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.0941172Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.0941619Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T12:15:06.0942243Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T12:15:06.0942691Z E1204 12:06:26.093000 123839 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T12:15:06.0943260Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T12:15:06.0943786Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:06.0944353Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T12:15:06.0944919Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T12:15:06.0945285Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.0946956Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.0947547Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.0948591Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0949270Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:06.0950165Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0950862Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0951750Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0952539Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0953155Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.0953911Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.0954300Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.0955205Z E1204 12:06:26.093000 123839 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0955323Z FAILED [0.7389s] [100%] 2025-12-04T12:15:06.0955329Z 2025-12-04T12:15:06.0955475Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.0955841Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.0955970Z Traceback (most recent call last): 2025-12-04T12:15:06.0956380Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.0956545Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.0957034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0957317Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0957842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0958038Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0958557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0958706Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0959243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0959607Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0960126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0960293Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0960778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T12:15:06.0960901Z return self._compile_to_module() 2025-12-04T12:15:06.0961397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0961561Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0962109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0962259Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0962754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0963006Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0963593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0963720Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0964228Z File "/tmp/tmpkk2kc1li/x4/cx4yo2xtzpj37rqe6m7qsvlhsokh4mckkizs3c3i4wz7m4xrpilx.py", line 50, in 2025-12-04T12:15:06.0964694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.0964820Z kernel.precompile( 2025-12-04T12:15:06.0965379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.0965500Z self._precompile_worker() 2025-12-04T12:15:06.0966104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0966285Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0966883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0967099Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T12:15:06.0967551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0967809Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0968289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0968623Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0968865Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0969174Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.0969300Z ^ 2025-12-04T12:15:06.0969772Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0969777Z 2025-12-04T12:15:06.0970492Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0970498Z 2025-12-04T12:15:06.0970502Z 2025-12-04T12:15:06.0970732Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0971627Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.0971723Z 2025-12-04T12:15:06.0972013Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0972238Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0972345Z frames [('total', 1)] 2025-12-04T12:15:06.0972483Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.0972949Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0973187Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.0973288Z graph_break [] 2025-12-04T12:15:06.0973603Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.0973742Z Traceback (most recent call last): 2025-12-04T12:15:06.0974200Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.0974351Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.0974855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0975103Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0975627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0975822Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0976390Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0976555Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0977088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0977412Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0977949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0978097Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0978588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.0978714Z return self._compile_to_module() 2025-12-04T12:15:06.0979201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0979381Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0979897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0980042Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0980595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0980832Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.0981428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.0981596Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.0982092Z File "/tmp/tmp3v20ssgk/t4/ct4xw3u2u3v3vzzrrtwonytmh6n3mgbjmegdscqsl3oawb4ptn4q.py", line 50, in 2025-12-04T12:15:06.0982567Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.0982679Z kernel.precompile( 2025-12-04T12:15:06.0983245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.0983367Z self._precompile_worker() 2025-12-04T12:15:06.0983968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.0984201Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.0984795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.0985013Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.0985461Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.0985707Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.0986162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.0986532Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.0986766Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.0987094Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.0987189Z ^ 2025-12-04T12:15:06.0987661Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.0987669Z 2025-12-04T12:15:06.0988381Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.0988387Z 2025-12-04T12:15:06.0988392Z 2025-12-04T12:15:06.0988627Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.0989259Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.0989265Z 2025-12-04T12:15:06.0989537Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.0989773Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0989880Z frames [('total', 1)] 2025-12-04T12:15:06.0990000Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.0990478Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0990702Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.0990819Z graph_break [] 2025-12-04T12:15:06.0991038Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.0991142Z frames [('total', 1)] 2025-12-04T12:15:06.0991270Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.0991489Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.0991985Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.0992101Z graph_break [] 2025-12-04T12:15:06.0992251Z =================================== FAILURES =================================== 2025-12-04T12:15:06.0992580Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.0992706Z Traceback (most recent call last): 2025-12-04T12:15:06.0993189Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 
2025-12-04T12:15:06.0993353Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.0993845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.0994096Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.0994622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.0994823Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.0995349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.0995540Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.0996076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.0996417Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.0996940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.0997104Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.0997589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.0997744Z return self._compile_to_module() 2025-12-04T12:15:06.0998248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.0998420Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.0998938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.0999084Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.0999582Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.0999828Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1000415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1000543Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1001073Z File "/tmp/tmpm6mlzava/in/cinos7dxtkcjfqeuab6hdpxejcycroispq5vl6d5ovoehvkr2qwa.py", line 50, in 2025-12-04T12:15:06.1001539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1001672Z kernel.precompile( 2025-12-04T12:15:06.1002228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1002350Z self._precompile_worker() 2025-12-04T12:15:06.1002958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1003146Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1003741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1003958Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1004444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1004706Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1005149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1005486Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 
2025-12-04T12:15:06.1005771Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1006080Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1006184Z ^ 2025-12-04T12:15:06.1006642Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1006647Z 2025-12-04T12:15:06.1007364Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1007371Z 2025-12-04T12:15:06.1007404Z 2025-12-04T12:15:06.1007634Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1008259Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1008267Z 2025-12-04T12:15:06.1008549Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1008769Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1008877Z frames [('total', 1)] 2025-12-04T12:15:06.1009010Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1009474Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1009749Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1009853Z graph_break [] 2025-12-04T12:15:06.1010073Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1010199Z frames [('total', 1)] 2025-12-04T12:15:06.1010314Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1010532Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1011009Z inductor [('pattern_matcher_nodes', 2), 
('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1011112Z graph_break [] 2025-12-04T12:15:06.1011340Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1011446Z frames [('total', 1)] 2025-12-04T12:15:06.1011560Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1011789Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1012248Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1012349Z graph_break [] 2025-12-04T12:15:06.1013009Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6ba281452d587f38.xml - 2025-12-04T12:15:06.1013186Z =========================== short test summary info ============================ 2025-12-04T12:15:06.1013981Z FAILED [0.7389s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1014300Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1014390Z ^ 2025-12-04T12:15:06.1014858Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1014864Z 2025-12-04T12:15:06.1015610Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1015616Z 2025-12-04T12:15:06.1015621Z 2025-12-04T12:15:06.1015854Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1016560Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1016605Z 2025-12-04T12:15:06.1016876Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1017072Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.1017274Z ================== 1 failed, 187 deselected, 2 rerun in 5.50s ================== 2025-12-04T12:15:06.1017390Z Got exit code 1 2025-12-04T12:15:06.1017499Z Retrying single test... 2025-12-04T12:15:06.1017972Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85d1d6e9267cc116.xml 2025-12-04T12:15:06.1018156Z ============================= test session starts ============================== 2025-12-04T12:15:06.1018509Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.1018652Z cachedir: .pytest_cache 2025-12-04T12:15:06.1019184Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.1019312Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.1019436Z configfile: pytest.ini 2025-12-04T12:15:06.1020025Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.1020250Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.1020967Z stepcurrent: skipping 51 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1021117Z Running 1 items in this shard 2025-12-04T12:15:06.1021122Z 2025-12-04T12:15:06.1022275Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1023030Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1023584Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1024162Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1024668Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1025120Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1025725Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.1026252Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.1026705Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T12:15:06.1027276Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 
triton_helpers.maximum(tmp1, tmp2) 2025-12-04T12:15:06.1027731Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T12:15:06.1028338Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T12:15:06.1028873Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:06.1029402Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T12:15:06.1029984Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T12:15:06.1030364Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1032041Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1032631Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1033884Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1034532Z E1204 12:06:44.462000 124037 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1035472Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1036160Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1037058Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1037832Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1038453Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1039214Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1039602Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1040505Z E1204 12:06:44.462000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1040642Z ('RERUN', {'yellow': True}) [3.9679s] [100%] 2025-12-04T12:15:06.1041782Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1042579Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1043143Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1043710Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1044250Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1044690Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1045292Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.1045822Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.1046269Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T12:15:06.1046892Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T12:15:06.1047332Z E1204 12:06:45.250000 124037 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T12:15:06.1047900Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T12:15:06.1048422Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:06.1048985Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T12:15:06.1049546Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T12:15:06.1049914Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1051605Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1052145Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1053193Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1053837Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:06.1054728Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1055428Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1056459Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1057256Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1057867Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1058652Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1059037Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1059939Z E1204 12:06:45.250000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1060087Z ('RERUN', {'yellow': True}) [0.7474s] [100%] 2025-12-04T12:15:06.1061247Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1062013Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1062564Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1063163Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1063678Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1064117Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1064729Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.1065248Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.1065702Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T12:15:06.1066282Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T12:15:06.1066725Z E1204 12:06:45.996000 124037 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T12:15:06.1067311Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T12:15:06.1067823Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:06.1068370Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T12:15:06.1068920Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T12:15:06.1069288Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1071246Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1071799Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1072905Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1073538Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:06.1074453Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1075307Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1076204Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1076993Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1077691Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1078458Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1078838Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1079751Z E1204 12:06:45.996000 124037 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1079856Z FAILED [0.7434s] [100%] 2025-12-04T12:15:06.1079862Z 2025-12-04T12:15:06.1080008Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.1080339Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.1080467Z Traceback (most recent call last): 2025-12-04T12:15:06.1080893Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1081046Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1081536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1081799Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1082314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1082508Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1083033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1083178Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1083765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1084087Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1084613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1084790Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1085379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T12:15:06.1085519Z return self._compile_to_module() 2025-12-04T12:15:06.1086005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1086171Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1086704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1086842Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1087339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1087618Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1088207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1088354Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1088855Z File "/tmp/tmpp2r2fe96/yf/cyfrmu5oofhiqi7gdqlt24x4w5coydol7zmfuv5lbwr5rgz7fvmd.py", line 50, in 2025-12-04T12:15:06.1089319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1089448Z kernel.precompile( 2025-12-04T12:15:06.1090048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1090187Z self._precompile_worker() 2025-12-04T12:15:06.1090785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1090969Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1091582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1091785Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T12:15:06.1092242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1092507Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1092951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1093307Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1093536Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1093850Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1093958Z ^ 2025-12-04T12:15:06.1094423Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1094431Z 2025-12-04T12:15:06.1095163Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1095169Z 2025-12-04T12:15:06.1095174Z 2025-12-04T12:15:06.1095394Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1096026Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1096082Z 2025-12-04T12:15:06.1096422Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1096656Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1096778Z frames [('total', 1)] 2025-12-04T12:15:06.1096898Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1097365Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1097641Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1097744Z graph_break [] 2025-12-04T12:15:06.1098077Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.1098206Z Traceback (most recent call last): 2025-12-04T12:15:06.1098614Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1098783Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1099275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1099561Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1100091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1100287Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1100815Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1100960Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1101490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1101854Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1102380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1102543Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1103021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1103143Z return self._compile_to_module() 2025-12-04T12:15:06.1103647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1103812Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1104329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1104470Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1104969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1105212Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1105799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1105926Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1106414Z File "/tmp/tmpfes_8dbg/t4/ct4nuoctpw6xjefrqvcmsjk2byuv57oevub7n5jmumauwqhfd5oc.py", line 50, in 2025-12-04T12:15:06.1106880Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1107005Z kernel.precompile( 2025-12-04T12:15:06.1107559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1107676Z self._precompile_worker() 2025-12-04T12:15:06.1108326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1108508Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1109106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1109319Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1109804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1110069Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1110511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1110846Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1111090Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1111400Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1111524Z ^ 2025-12-04T12:15:06.1111993Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1111998Z 2025-12-04T12:15:06.1112707Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1112716Z 2025-12-04T12:15:06.1112721Z 2025-12-04T12:15:06.1112950Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1113582Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1113588Z 2025-12-04T12:15:06.1113869Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1114127Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1114235Z frames [('total', 1)] 2025-12-04T12:15:06.1114370Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1114834Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1115069Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1115172Z graph_break [] 2025-12-04T12:15:06.1115391Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1115506Z frames [('total', 1)] 2025-12-04T12:15:06.1115623Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1115842Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1116314Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1116414Z graph_break [] 2025-12-04T12:15:06.1116565Z =================================== FAILURES =================================== 2025-12-04T12:15:06.1116899Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.1117023Z Traceback (most recent call last): 2025-12-04T12:15:06.1117446Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 
2025-12-04T12:15:06.1117599Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1118091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1118354Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1118865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1119073Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1119616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1119766Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1120310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1120631Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1121178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1121342Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1121823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1121959Z return self._compile_to_module() 2025-12-04T12:15:06.1122445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1122608Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1123172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1123300Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1123806Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1124042Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1124625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1124765Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1125268Z File "/tmp/tmpdtqwhkrj/ae/caeu5yzsejv7nmnevpk5vutu2gbz4clgszbzwsqmlxvdwamdh2sw.py", line 50, in 2025-12-04T12:15:06.1125763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1125891Z kernel.precompile( 2025-12-04T12:15:06.1126448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1126577Z self._precompile_worker() 2025-12-04T12:15:06.1127171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1127351Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1127959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1128157Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1128620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1128866Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1129310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1129655Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 
2025-12-04T12:15:06.1129882Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1130193Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1130295Z ^ 2025-12-04T12:15:06.1130752Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1130757Z 2025-12-04T12:15:06.1131477Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1131483Z 2025-12-04T12:15:06.1131490Z 2025-12-04T12:15:06.1131756Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1132399Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1132405Z 2025-12-04T12:15:06.1132675Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1132930Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1133047Z frames [('total', 1)] 2025-12-04T12:15:06.1133164Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1133627Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1133862Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1133963Z graph_break [] 2025-12-04T12:15:06.1134198Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1134303Z frames [('total', 1)] 2025-12-04T12:15:06.1134417Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1134712Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1135172Z inductor [('pattern_matcher_nodes', 2), 
('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1135274Z graph_break [] 2025-12-04T12:15:06.1135505Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1135608Z frames [('total', 1)] 2025-12-04T12:15:06.1135738Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1135958Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1136498Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1136616Z graph_break [] 2025-12-04T12:15:06.1137308Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85d1d6e9267cc116.xml - 2025-12-04T12:15:06.1137489Z =========================== short test summary info ============================ 2025-12-04T12:15:06.1138280Z FAILED [0.7434s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1138595Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1138701Z ^ 2025-12-04T12:15:06.1139161Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1139167Z 2025-12-04T12:15:06.1139878Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1139896Z 2025-12-04T12:15:06.1139903Z 2025-12-04T12:15:06.1140125Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1140755Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1140761Z 2025-12-04T12:15:06.1141043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1141229Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.1141446Z ================== 1 failed, 187 deselected, 2 rerun in 5.50s ================== 2025-12-04T12:15:06.1141549Z Got exit code 1 2025-12-04T12:15:06.1142090Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1142513Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:06.1143029Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a610c26dd7fa0e9.xml 2025-12-04T12:15:06.1143197Z ============================= test session starts ============================== 2025-12-04T12:15:06.1143564Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.1143677Z cachedir: .pytest_cache 2025-12-04T12:15:06.1144208Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.1144376Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.1144487Z configfile: pytest.ini 2025-12-04T12:15:06.1145092Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.1145319Z 
collecting ... collected 188 items / 52 deselected / 136 selected 2025-12-04T12:15:06.1145464Z stepcurrent: skipping 52 already run items. 2025-12-04T12:15:06.1145600Z Running 136 items in this shard 2025-12-04T12:15:06.1145605Z 2025-12-04T12:15:06.1146773Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1147595Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1148157Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1148743Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1149290Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1149737Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1150360Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.1150880Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.1151346Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T12:15:06.1151917Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 
= triton_helpers.maximum(tmp1, tmp2) 2025-12-04T12:15:06.1152364Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T12:15:06.1152950Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T12:15:06.1153472Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:06.1154018Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T12:15:06.1154574Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T12:15:06.1154945Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1156675Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1157221Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1158322Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1158954Z E1204 12:07:04.047000 124235 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1159878Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1160606Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1161516Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1162295Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1162907Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1163722Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1164104Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1165017Z E1204 12:07:04.047000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1165155Z ('RERUN', {'yellow': True}) [3.8173s] [ 0%] 2025-12-04T12:15:06.1166318Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1167078Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1167627Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1168210Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1168715Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1169169Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1169774Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.1170328Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.1170801Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T12:15:06.1171584Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T12:15:06.1172107Z E1204 12:07:04.845000 124235 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T12:15:06.1172677Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T12:15:06.1173191Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:06.1173739Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T12:15:06.1174292Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T12:15:06.1174719Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1176453Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1177071Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1178114Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1178762Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:06.1179658Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1180343Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1181251Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1182022Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1182649Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1183400Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1183792Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1184733Z E1204 12:07:04.845000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1184868Z ('RERUN', {'yellow': True}) [0.7586s] [ 0%] 2025-12-04T12:15:06.1186031Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1186837Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1187400Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1187967Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1188484Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1188959Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1189560Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.1190091Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.1190539Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T12:15:06.1191120Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T12:15:06.1191598Z E1204 12:07:05.605000 124235 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T12:15:06.1192328Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T12:15:06.1192855Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:06.1193387Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T12:15:06.1193953Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T12:15:06.1194318Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1195995Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1196550Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1197594Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1198243Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:06.1199204Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1199903Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1200826Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1201612Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1202224Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1202979Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1203403Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1204303Z E1204 12:07:05.605000 124235 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1204421Z FAILED [0.7579s] [ 0%] 2025-12-04T12:15:06.1204427Z 2025-12-04T12:15:06.1204577Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.1204907Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.1205083Z Traceback (most recent call last): 2025-12-04T12:15:06.1205493Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1205686Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1206236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1206485Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1207022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1207216Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1207738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1207886Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1208428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1208760Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1209284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1209434Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1209932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T12:15:06.1210077Z return self._compile_to_module() 2025-12-04T12:15:06.1210571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1210736Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1211252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1211449Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1211945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1212192Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1212775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1212944Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1213463Z File "/tmp/tmpglafxchy/gt/cgtrdiumaxbfkm3z4iktfl6rcsgrvwz4zy6dlm6ighmkh6qwxqbj.py", line 50, in 2025-12-04T12:15:06.1213929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1214043Z kernel.precompile( 2025-12-04T12:15:06.1214619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1214738Z self._precompile_worker() 2025-12-04T12:15:06.1215345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1215561Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1216156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1216439Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T12:15:06.1216896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1217156Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1217599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1217976Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1218224Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1218538Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1218628Z ^ 2025-12-04T12:15:06.1219102Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1219111Z 2025-12-04T12:15:06.1219826Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1219833Z 2025-12-04T12:15:06.1219838Z 2025-12-04T12:15:06.1220067Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1220712Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1220720Z 2025-12-04T12:15:06.1221002Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1221231Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1221336Z frames [('total', 1)] 2025-12-04T12:15:06.1221469Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1221935Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1222160Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1222273Z graph_break [] 2025-12-04T12:15:06.1222603Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.1222741Z Traceback (most recent call last): 2025-12-04T12:15:06.1223145Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1223330Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1223832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1224083Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1224595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1224832Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1225339Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1225499Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1226033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1226355Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1226891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1227075Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1227570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1227700Z return self._compile_to_module() 2025-12-04T12:15:06.1228182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1228361Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1228875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1229006Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1229549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1229784Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1230383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1230512Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1231011Z File "/tmp/tmpftjp552m/ib/cibvr32quo63oqbwd35beydkb2mcnitnhf3qgzuspp5oa6piso5t.py", line 50, in 2025-12-04T12:15:06.1231490Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1231603Z kernel.precompile( 2025-12-04T12:15:06.1232174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1232292Z self._precompile_worker() 2025-12-04T12:15:06.1232895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1233087Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1233685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1233882Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1234348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1234594Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1235050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1235386Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1235615Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1235991Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1236085Z ^ 2025-12-04T12:15:06.1236552Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1236558Z 2025-12-04T12:15:06.1237268Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1237307Z 2025-12-04T12:15:06.1237312Z 2025-12-04T12:15:06.1237548Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1238203Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1238209Z 2025-12-04T12:15:06.1238482Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1238726Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1238868Z frames [('total', 1)] 2025-12-04T12:15:06.1238985Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1239466Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1239690Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1239809Z graph_break [] 2025-12-04T12:15:06.1240034Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1240139Z frames [('total', 1)] 2025-12-04T12:15:06.1240272Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1240493Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1240958Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1241108Z graph_break [] 2025-12-04T12:15:06.1241266Z =================================== FAILURES =================================== 2025-12-04T12:15:06.1241613Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.1241741Z Traceback (most recent call last): 2025-12-04T12:15:06.1242151Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 
2025-12-04T12:15:06.1242322Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1242812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1243063Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1243590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1243787Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1244313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1244465Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1245001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1245337Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1245862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1246027Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1246509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1246634Z return self._compile_to_module() 2025-12-04T12:15:06.1247171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1247341Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1247863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1248011Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1248509Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1248789Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1249381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1249509Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1250033Z File "/tmp/tmpxr130svn/kh/ckhezzz5lqtipruw74qnrsgwepab2nbounkw3shmzhpipxbsqm3y.py", line 50, in 2025-12-04T12:15:06.1250503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1250658Z kernel.precompile( 2025-12-04T12:15:06.1251208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1251326Z self._precompile_worker() 2025-12-04T12:15:06.1251936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1252118Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1252714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1252924Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1253406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1253670Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1254117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1254450Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 
2025-12-04T12:15:06.1254687Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1254998Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1255099Z ^ 2025-12-04T12:15:06.1255557Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1255562Z 2025-12-04T12:15:06.1256269Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1256335Z 2025-12-04T12:15:06.1256348Z 2025-12-04T12:15:06.1256583Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1257226Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1257233Z 2025-12-04T12:15:06.1257514Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1257742Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1257848Z frames [('total', 1)] 2025-12-04T12:15:06.1257980Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1258444Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1258679Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1258781Z graph_break [] 2025-12-04T12:15:06.1259049Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1259168Z frames [('total', 1)] 2025-12-04T12:15:06.1259290Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1259509Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1259982Z inductor [('pattern_matcher_nodes', 2), 
('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1260117Z graph_break [] 2025-12-04T12:15:06.1260333Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1260452Z frames [('total', 1)] 2025-12-04T12:15:06.1260569Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1260801Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1261257Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1261358Z graph_break [] 2025-12-04T12:15:06.1262033Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a610c26dd7fa0e9.xml - 2025-12-04T12:15:06.1262250Z =========================== short test summary info ============================ 2025-12-04T12:15:06.1263055Z FAILED [0.7579s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1263369Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1263460Z ^ 2025-12-04T12:15:06.1263930Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1263935Z 2025-12-04T12:15:06.1264645Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1264683Z 2025-12-04T12:15:06.1264691Z 2025-12-04T12:15:06.1264921Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1265567Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1265573Z 2025-12-04T12:15:06.1265841Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1266046Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.1266251Z ================== 1 failed, 52 deselected, 2 rerun in 5.38s =================== 2025-12-04T12:15:06.1266369Z Got exit code 1 2025-12-04T12:15:06.1266481Z Retrying single test... 2025-12-04T12:15:06.1266953Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-269f6089cafc9f3b.xml 2025-12-04T12:15:06.1267133Z ============================= test session starts ============================== 2025-12-04T12:15:06.1267488Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.1267602Z cachedir: .pytest_cache 2025-12-04T12:15:06.1268136Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.1268264Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.1268390Z configfile: pytest.ini 2025-12-04T12:15:06.1268976Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.1269200Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.1269934Z stepcurrent: skipping 52 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1270051Z Running 1 items in this shard 2025-12-04T12:15:06.1270058Z 2025-12-04T12:15:06.1271526Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1272289Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1273495Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1274083Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1274588Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1275042Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1275701Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.1276216Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.1276681Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T12:15:06.1277250Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 
triton_helpers.maximum(tmp1, tmp2) 2025-12-04T12:15:06.1277706Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T12:15:06.1278393Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T12:15:06.1278928Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:06.1279456Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T12:15:06.1280007Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T12:15:06.1280386Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1282075Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1282630Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1283679Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1284326Z E1204 12:07:24.101000 124433 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1285348Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1286032Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1286939Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1287745Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1288371Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1289132Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1289547Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1290441Z E1204 12:07:24.101000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1290577Z ('RERUN', {'yellow': True}) [3.8400s] [100%] 2025-12-04T12:15:06.1291735Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1292516Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1293077Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1293646Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1294158Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1294596Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1295193Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.1295723Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.1296170Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T12:15:06.1296823Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T12:15:06.1297266Z E1204 12:07:24.900000 124433 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T12:15:06.1297839Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T12:15:06.1298367Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:06.1298894Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T12:15:06.1299495Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T12:15:06.1299867Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1301532Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1302117Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1303169Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1303847Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:06.1304743Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1305447Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1306361Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1307150Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1307765Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1308527Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1308914Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1309812Z E1204 12:07:24.900000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1309963Z ('RERUN', {'yellow': True}) [0.7598s] [100%] 2025-12-04T12:15:06.1311114Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1311877Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1312428Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1312993Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1313542Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1313982Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1314595Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.1315139Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.1315585Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T12:15:06.1316166Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T12:15:06.1316612Z E1204 12:07:25.659000 124433 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T12:15:06.1317193Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T12:15:06.1317738Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:06.1318266Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T12:15:06.1318831Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T12:15:06.1319199Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1320914Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1321459Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1322513Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1323143Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:06.1324058Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1324744Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1325639Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1326439Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1327047Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1327847Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1328223Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1329134Z E1204 12:07:25.659000 124433 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1329276Z FAILED [0.7564s] [100%] 2025-12-04T12:15:06.1329283Z 2025-12-04T12:15:06.1329428Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.1329774Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.1329901Z Traceback (most recent call last): 2025-12-04T12:15:06.1330315Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1330479Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1331020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1331286Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1331799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1331995Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1332521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1332672Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1333222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1333579Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1334105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1334267Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1334751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T12:15:06.1334893Z return self._compile_to_module() 2025-12-04T12:15:06.1335380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1335548Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1336082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1336216Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1336787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1337037Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1337624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1337767Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1338264Z File "/tmp/tmpb725rz13/si/csiwejudcmit4ta34fze6lhmqx34pvu2t2ebnx7kjnxkwk6ejs4h.py", line 50, in 2025-12-04T12:15:06.1338727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1338855Z kernel.precompile( 2025-12-04T12:15:06.1339412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1339546Z self._precompile_worker() 2025-12-04T12:15:06.1340187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1340372Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1340978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1341210Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T12:15:06.1341665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1341927Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1342372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1342724Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1342957Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1343299Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1343406Z ^ 2025-12-04T12:15:06.1343863Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1343868Z 2025-12-04T12:15:06.1344595Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1344601Z 2025-12-04T12:15:06.1344606Z 2025-12-04T12:15:06.1344824Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1345466Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1345487Z 2025-12-04T12:15:06.1345792Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1346018Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1346140Z frames [('total', 1)] 2025-12-04T12:15:06.1346257Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1346722Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1346962Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1347068Z graph_break [] 2025-12-04T12:15:06.1347397Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.1347533Z Traceback (most recent call last): 2025-12-04T12:15:06.1347940Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1348101Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1348594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1348845Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1349375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1349569Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1350096Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1350244Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1350778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1351111Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1351667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1351822Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1352319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1352441Z return self._compile_to_module() 2025-12-04T12:15:06.1352933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1353128Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1353641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1353785Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1354283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1354530Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1355114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1355275Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1355790Z File "/tmp/tmpahkop17p/mn/cmn6d6w7uw7bkz7axfb2xh6lp7nwiymocv7nqm6qjcqriqzuldhl.py", line 50, in 2025-12-04T12:15:06.1356258Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1356370Z kernel.precompile( 2025-12-04T12:15:06.1356934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1357052Z self._precompile_worker() 2025-12-04T12:15:06.1357689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1357874Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1358473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1358685Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1359137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1359398Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1359841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1360177Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1360417Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1360732Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1360822Z ^ 2025-12-04T12:15:06.1361291Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1361299Z 2025-12-04T12:15:06.1362009Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1362018Z 2025-12-04T12:15:06.1362023Z 2025-12-04T12:15:06.1362253Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1362893Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1362899Z 2025-12-04T12:15:06.1363181Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1363403Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1363540Z frames [('total', 1)] 2025-12-04T12:15:06.1363672Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1364137Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1364358Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1364471Z graph_break [] 2025-12-04T12:15:06.1364720Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1364836Z frames [('total', 1)] 2025-12-04T12:15:06.1364954Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1365171Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1365642Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1365745Z graph_break [] 2025-12-04T12:15:06.1365898Z =================================== FAILURES =================================== 2025-12-04T12:15:06.1366239Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.1366393Z Traceback (most recent call last): 2025-12-04T12:15:06.1366812Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 
2025-12-04T12:15:06.1366960Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1367449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1367710Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1368219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1368413Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1368968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1369115Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1369660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1369980Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1370500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1370661Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1371328Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1371466Z return self._compile_to_module() 2025-12-04T12:15:06.1371955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1372127Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1372659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1372791Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1373289Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1373540Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1374122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1374263Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1374770Z File "/tmp/tmpc1xvcb6p/mg/cmgvvw2kxh7ublbprutm5gnt3ve5cmrleevfnk4i44xwqujttayy.py", line 50, in 2025-12-04T12:15:06.1375305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1375431Z kernel.precompile( 2025-12-04T12:15:06.1375985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1376120Z self._precompile_worker() 2025-12-04T12:15:06.1376778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1377027Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1377632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1377831Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1378291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1378543Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1378985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1379376Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 
2025-12-04T12:15:06.1379603Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1379919Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1380026Z ^ 2025-12-04T12:15:06.1380481Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1380487Z 2025-12-04T12:15:06.1381211Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1381217Z 2025-12-04T12:15:06.1381222Z 2025-12-04T12:15:06.1381485Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1382126Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1382149Z 2025-12-04T12:15:06.1382420Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1382639Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1382760Z frames [('total', 1)] 2025-12-04T12:15:06.1382877Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1383347Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1383581Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1383683Z graph_break [] 2025-12-04T12:15:06.1383916Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1384026Z frames [('total', 1)] 2025-12-04T12:15:06.1384142Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1384374Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1384833Z inductor [('pattern_matcher_nodes', 2), 
('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1384932Z graph_break [] 2025-12-04T12:15:06.1385166Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1385270Z frames [('total', 1)] 2025-12-04T12:15:06.1385385Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1385618Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1386074Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1386186Z graph_break [] 2025-12-04T12:15:06.1386873Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-269f6089cafc9f3b.xml - 2025-12-04T12:15:06.1387049Z =========================== short test summary info ============================ 2025-12-04T12:15:06.1387875Z FAILED [0.7564s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1388186Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1388322Z ^ 2025-12-04T12:15:06.1388780Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1388785Z 2025-12-04T12:15:06.1389492Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1389498Z 2025-12-04T12:15:06.1389517Z 2025-12-04T12:15:06.1389742Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1390382Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1390418Z 2025-12-04T12:15:06.1390703Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1390886Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.1391091Z ================== 1 failed, 187 deselected, 2 rerun in 5.40s ================== 2025-12-04T12:15:06.1391208Z Got exit code 1 2025-12-04T12:15:06.1391319Z Retrying single test... 2025-12-04T12:15:06.1391805Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f11fe18ee197cc1f.xml 2025-12-04T12:15:06.1391972Z ============================= test session starts ============================== 2025-12-04T12:15:06.1392361Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.1392489Z cachedir: .pytest_cache 2025-12-04T12:15:06.1393014Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.1393142Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.1393267Z configfile: pytest.ini 2025-12-04T12:15:06.1393889Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.1394205Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.1394978Z stepcurrent: skipping 52 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1395099Z Running 1 items in this shard 2025-12-04T12:15:06.1395105Z 2025-12-04T12:15:06.1396284Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1397049Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1397618Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1398193Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1398711Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1399200Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1399801Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.1400336Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.1400820Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T12:15:06.1401405Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 
triton_helpers.maximum(tmp1, tmp2) 2025-12-04T12:15:06.1401849Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T12:15:06.1402422Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T12:15:06.1402952Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:06.1403515Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T12:15:06.1404077Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T12:15:06.1404449Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1406188Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1406734Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1407778Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1408426Z E1204 12:07:43.903000 124631 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1409323Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1410023Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1410906Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1411694Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1412305Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1413061Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1413480Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1414382Z E1204 12:07:43.903000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1414572Z ('RERUN', {'yellow': True}) [3.8573s] [100%] 2025-12-04T12:15:06.1415721Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1416556Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1417110Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1417717Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1418231Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1418671Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1419282Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.1419797Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.1420287Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T12:15:06.1420869Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T12:15:06.1421317Z E1204 12:07:44.715000 124631 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T12:15:06.1421901Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T12:15:06.1422414Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:06.1422946Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T12:15:06.1423517Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T12:15:06.1423884Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1425570Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1426114Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1427236Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1427871Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:06.1428779Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1429495Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1430380Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1431171Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1431824Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1432591Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1432965Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1433872Z E1204 12:07:44.715000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1434046Z ('RERUN', {'yellow': True}) [0.7700s] [100%] 2025-12-04T12:15:06.1435212Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1436107Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1436662Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1437240Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1437741Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1438195Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1438798Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.1439314Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = tmp0.to(tl.float32) 2025-12-04T12:15:06.1439779Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = -448.0 2025-12-04T12:15:06.1440346Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = triton_helpers.maximum(tmp1, tmp2) 2025-12-04T12:15:06.1440799Z E1204 12:07:45.481000 124631 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = 448.0 2025-12-04T12:15:06.1441434Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = triton_helpers.minimum(tmp3, tmp4) 2025-12-04T12:15:06.1441955Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp6 = tmp5.to(tl.float32) 2025-12-04T12:15:06.1442500Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp7 = tmp6.to(tl.float8e4nv) 2025-12-04T12:15:06.1443087Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp7, xmask) 2025-12-04T12:15:06.1443464Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1445143Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1445732Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1446785Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1447417Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 
2025-12-04T12:15:06.1448377Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1449066Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1449975Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1450747Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1451368Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1452129Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1452504Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1453411Z E1204 12:07:45.481000 124631 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1453517Z FAILED [0.7644s] [100%] 2025-12-04T12:15:06.1453524Z 2025-12-04T12:15:06.1453682Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.1454010Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.1454135Z Traceback (most recent call last): 2025-12-04T12:15:06.1454556Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1454767Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1455271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1455523Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1456038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1456346Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1456875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1457024Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1457573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1457897Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1458430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1458626Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1459109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T12:15:06.1459252Z return self._compile_to_module() 2025-12-04T12:15:06.1459737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1459917Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1460431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1460562Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1461104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1461339Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1461940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1462068Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1462570Z File "/tmp/tmp0p6vosii/ow/cowyk5fa37ssybq6rme34kpjpmxcejuynwrnyi3tx2boipbg6oge.py", line 50, in 2025-12-04T12:15:06.1463046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1463159Z kernel.precompile( 2025-12-04T12:15:06.1463711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1463841Z self._precompile_worker() 2025-12-04T12:15:06.1464441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1464635Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1465231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1465429Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T12:15:06.1465892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1466136Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1466592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1466928Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1467157Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1467516Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1467612Z ^ 2025-12-04T12:15:06.1468072Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1468078Z 2025-12-04T12:15:06.1468800Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1468841Z 2025-12-04T12:15:06.1468846Z 2025-12-04T12:15:06.1469065Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1469719Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1469726Z 2025-12-04T12:15:06.1470001Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1470240Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1470384Z frames [('total', 1)] 2025-12-04T12:15:06.1470503Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1471166Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1471392Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1471496Z graph_break [] 2025-12-04T12:15:06.1471842Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.1471968Z Traceback (most recent call last): 2025-12-04T12:15:06.1472388Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1472538Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1473136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1473402Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1473919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1474114Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1479466Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1479648Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1480215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1480538Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1481069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1481239Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1481722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1481856Z return self._compile_to_module() 2025-12-04T12:15:06.1482355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1482523Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1483055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1483186Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1483682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1483933Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1484654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1484803Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1485308Z File "/tmp/tmpss8tq8h7/me/cmetmdfmdndlnpbb2a36fujjsxkodhwtv2hita3jt4dzvoaayfeu.py", line 50, in 2025-12-04T12:15:06.1485772Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1485949Z kernel.precompile( 2025-12-04T12:15:06.1486506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1486626Z self._precompile_worker() 2025-12-04T12:15:06.1487239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1487424Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1488040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1488288Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1488739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1489007Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1489449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1489797Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1490028Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1490337Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1490578Z ^ 2025-12-04T12:15:06.1491042Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1491051Z 2025-12-04T12:15:06.1491763Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1491783Z 2025-12-04T12:15:06.1491790Z 2025-12-04T12:15:06.1492007Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1492651Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1492658Z 2025-12-04T12:15:06.1492942Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1493166Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1493293Z frames [('total', 1)] 2025-12-04T12:15:06.1493417Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1493885Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1494128Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1494233Z graph_break [] 2025-12-04T12:15:06.1494456Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1494580Z frames [('total', 1)] 2025-12-04T12:15:06.1494693Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1494913Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1495382Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1495481Z graph_break [] 2025-12-04T12:15:06.1495639Z =================================== FAILURES =================================== 2025-12-04T12:15:06.1496005Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.1496134Z Traceback (most recent call last): 2025-12-04T12:15:06.1496677Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 
2025-12-04T12:15:06.1496827Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1497333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1497621Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1498131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1498341Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1498848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1499003Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1499547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1499902Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1500440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1500593Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1501074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1501207Z return self._compile_to_module() 2025-12-04T12:15:06.1501690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1501864Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1502415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1502551Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1503062Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1503296Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1503882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1504022Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1504526Z File "/tmp/tmpfuj1hu4w/qv/cqvj6nhtmpiaob3lx7ddo35lewveggcp7cpbeulpxkm5kn4exrbs.py", line 50, in <module> 2025-12-04T12:15:06.1505000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1505116Z kernel.precompile( 2025-12-04T12:15:06.1505670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1505799Z self._precompile_worker() 2025-12-04T12:15:06.1506393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1506583Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1507173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1507371Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1507835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1508081Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1508565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1508916Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 
2025-12-04T12:15:06.1509142Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1509465Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1509587Z ^ 2025-12-04T12:15:06.1510047Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1510053Z 2025-12-04T12:15:06.1510780Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1510786Z 2025-12-04T12:15:06.1510791Z 2025-12-04T12:15:06.1511010Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1511668Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1511706Z 2025-12-04T12:15:06.1511977Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1512213Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1512322Z frames [('total', 1)] 2025-12-04T12:15:06.1512439Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1512913Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1513136Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1513238Z graph_break [] 2025-12-04T12:15:06.1513471Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1513575Z frames [('total', 1)] 2025-12-04T12:15:06.1513726Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1513956Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1514417Z inductor [('pattern_matcher_nodes', 2), 
('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1514533Z graph_break [] 2025-12-04T12:15:06.1514750Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1514856Z frames [('total', 1)] 2025-12-04T12:15:06.1514984Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1515204Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1515661Z inductor [('pattern_matcher_nodes', 2), ('pattern_matcher_count', 1), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1515771Z graph_break [] 2025-12-04T12:15:06.1516426Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f11fe18ee197cc1f.xml - 2025-12-04T12:15:06.1516614Z =========================== short test summary info ============================ 2025-12-04T12:15:06.1517411Z FAILED [0.7644s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1517720Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1517826Z ^ 2025-12-04T12:15:06.1518284Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1518290Z 2025-12-04T12:15:06.1519008Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1519014Z 2025-12-04T12:15:06.1519018Z 2025-12-04T12:15:06.1519236Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1519912Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1519933Z 2025-12-04T12:15:06.1520200Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1520379Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.1520629Z ================== 1 failed, 187 deselected, 2 rerun in 5.43s ================== 2025-12-04T12:15:06.1520729Z Got exit code 1 2025-12-04T12:15:06.1521284Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1521707Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:06.1522179Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0b8acd36d7258295.xml 2025-12-04T12:15:06.1522363Z ============================= test session starts ============================== 2025-12-04T12:15:06.1522713Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.1522884Z cachedir: .pytest_cache 2025-12-04T12:15:06.1523415Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.1523543Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.1523653Z configfile: pytest.ini 2025-12-04T12:15:06.1524256Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 
2025-12-04T12:15:06.1524482Z collecting ... collected 188 items / 53 deselected / 135 selected 2025-12-04T12:15:06.1524639Z stepcurrent: skipping 53 already run items. 2025-12-04T12:15:06.1524756Z Running 135 items in this shard 2025-12-04T12:15:06.1524762Z 2025-12-04T12:15:06.1525305Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_cuda PASSED [4.1125s] [ 0%] 2025-12-04T12:15:06.1525835Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.8458s] [ 1%] 2025-12-04T12:15:06.1526966Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1527738Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1528283Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1528869Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1529369Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1529810Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1530370Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.1530819Z E1204 12:08:05.425000 124829 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.1531400Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.1531885Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T12:15:06.1532452Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.1533000Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.1533548Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.1533961Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1535640Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1536232Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1537363Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1538000Z E1204 
12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1538926Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1539808Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1540753Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1541532Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1542152Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1542911Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1543283Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1544197Z E1204 12:08:05.425000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1544334Z ('RERUN', {'yellow': True}) [0.5372s] [ 2%] 2025-12-04T12:15:06.1545573Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1546326Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1546931Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1547502Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1548001Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1548485Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1549029Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.1549493Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.1550070Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.1550546Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T12:15:06.1551127Z E1204 12:08:06.155000 124829 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.1551654Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.1552214Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.1552582Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1554300Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1554843Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1555898Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1556538Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1557441Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1558147Z E1204 12:08:06.155000 124829 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1559043Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1559836Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1560486Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1561265Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1561641Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1562543Z E1204 12:08:06.155000 124829 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1562722Z ('RERUN', {'yellow': True}) [0.7003s] [ 2%] 2025-12-04T12:15:06.1563233Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda FAILED [0.9874s] [ 2%] 2025-12-04T12:15:06.1563239Z 2025-12-04T12:15:06.1563399Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.1563721Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.1563848Z Traceback (most recent call last): 2025-12-04T12:15:06.1564302Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1564455Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1564948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1565222Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1565739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1565950Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1566466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1566650Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1567201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1567526Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1568065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1568214Z compiled_module = 
graph.compile_to_module() 2025-12-04T12:15:06.1568697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1568835Z return self._compile_to_module() 2025-12-04T12:15:06.1569319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1569489Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1570022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1570157Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1570669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1570901Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1571686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1571831Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1572334Z File "/tmp/tmpuzy8y7ce/b6/cb64tokfvetyct5qwmegpfoqpi7vhmuw3rfzaf7ogpqiwu6ucwh7.py", line 48, in <module> 2025-12-04T12:15:06.1572807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1572923Z kernel.precompile( 2025-12-04T12:15:06.1573553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1573690Z self._precompile_worker() 2025-12-04T12:15:06.1574288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1574532Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1575142Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1575340Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1575803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1576047Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1576561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1576972Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1577197Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1577519Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1577611Z ^ 2025-12-04T12:15:06.1578070Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1578076Z 2025-12-04T12:15:06.1578803Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1578810Z 2025-12-04T12:15:06.1578815Z 2025-12-04T12:15:06.1579031Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1579726Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1579736Z 2025-12-04T12:15:06.1580003Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1580229Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1580349Z frames [('total', 1)] 2025-12-04T12:15:06.1580466Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1580702Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1580940Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1581040Z graph_break [] 2025-12-04T12:15:06.1581371Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.1581494Z Traceback (most recent call last): 2025-12-04T12:15:06.1581906Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1582075Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1582572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1582835Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1583348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1583543Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1584069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:06.1584216Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1584752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1585120Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1585644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1585806Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1586286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1586444Z return self._compile_to_module() 2025-12-04T12:15:06.1586940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1587104Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1587634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1587764Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1588264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1588541Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1589128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1589258Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1589777Z File "/tmp/tmp8oergiu0/yr/cyrbzp3fehkavilzeoot5t43mmzhncemqmlaqmax2sbvm4b5ddeb.py", line 48, in <module> 2025-12-04T12:15:06.1590240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:06.1590365Z kernel.precompile( 2025-12-04T12:15:06.1590918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1591067Z self._precompile_worker() 2025-12-04T12:15:06.1591680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1591864Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1592473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1592674Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1593128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1593388Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1593831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1594168Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1594410Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1594723Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1594826Z ^ 2025-12-04T12:15:06.1595283Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1595288Z 2025-12-04T12:15:06.1596000Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1596018Z 2025-12-04T12:15:06.1596023Z 2025-12-04T12:15:06.1596239Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1596872Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1596878Z 2025-12-04T12:15:06.1597197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1597422Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1597548Z frames [('total', 1)] 2025-12-04T12:15:06.1597668Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1597893Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1598141Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1598271Z graph_break [] 2025-12-04T12:15:06.1598493Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1598610Z frames [('total', 1)] 2025-12-04T12:15:06.1598727Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1598945Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1599192Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1599292Z graph_break [] 2025-12-04T12:15:06.1599454Z =================================== FAILURES =================================== 2025-12-04T12:15:06.1599768Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.1599925Z Traceback (most recent call last): 2025-12-04T12:15:06.1600350Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1600500Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1600992Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1601255Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1601765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1601972Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1602512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1602663Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1603216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1603536Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1604068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1604215Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1604694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1604829Z return self._compile_to_module() 2025-12-04T12:15:06.1605317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1605484Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1606015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1606144Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1606651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:06.1606884Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1607470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1607609Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1608082Z File "/tmp/tmp_obofr6o/o7/co7thpiwcdsglmsh4mbxk6fqgabth7jbut74r3wqihix6g3mj5xi.py", line 80, in 2025-12-04T12:15:06.1608586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 649, in wait 2025-12-04T12:15:06.1608702Z self._wait_futures(scope) 2025-12-04T12:15:06.1609199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 669, in _wait_futures 2025-12-04T12:15:06.1609327Z kernel = result.result() 2025-12-04T12:15:06.1609768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 4416, in result 2025-12-04T12:15:06.1609915Z return self.result_fn() 2025-12-04T12:15:06.1610410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 438, in get_result 2025-12-04T12:15:06.1610542Z raise e.with_name(kernel_name) from e 2025-12-04T12:15:06.1610939Z torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T12:15:06.1610945Z 2025-12-04T12:15:06.1611081Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1611207Z Traceback (most recent call last): 2025-12-04T12:15:06.1611759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T12:15:06.1611893Z result = job() 2025-12-04T12:15:06.1612487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T12:15:06.1612644Z kernel.precompile(warm_cache_only=True) 
2025-12-04T12:15:06.1613201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T12:15:06.1613331Z self._precompile_worker() 2025-12-04T12:15:06.1613928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1614109Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1614755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1614954Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1615420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1615666Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1616113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1616540Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1616724Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1617074Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1617178Z ^ 2025-12-04T12:15:06.1617641Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1617647Z 2025-12-04T12:15:06.1617652Z 2025-12-04T12:15:06.1618380Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1618386Z 2025-12-04T12:15:06.1618390Z 2025-12-04T12:15:06.1618610Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1619255Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1619261Z 2025-12-04T12:15:06.1619532Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1619757Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1619891Z frames [('total', 1)] 2025-12-04T12:15:06.1620008Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1620293Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1620545Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1620651Z graph_break [] 2025-12-04T12:15:06.1620891Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1620999Z frames [('total', 1)] 2025-12-04T12:15:06.1621116Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1621396Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1621634Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1621736Z graph_break [] 2025-12-04T12:15:06.1621970Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1622076Z frames [('total', 1)] 2025-12-04T12:15:06.1622206Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1622427Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1622792Z inductor [('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1)] 2025-12-04T12:15:06.1622939Z graph_break [] 2025-12-04T12:15:06.1623594Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0b8acd36d7258295.xml - 2025-12-04T12:15:06.1623771Z =========================== short test summary info ============================ 2025-12-04T12:15:06.1624734Z FAILED [0.9874s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess: 2025-12-04T12:15:06.1624741Z 2025-12-04T12:15:06.1624876Z Name=triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1625021Z Traceback (most recent call last): 2025-12-04T12:15:06.1625572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 457, in do_job 2025-12-04T12:15:06.1625714Z result = job() 2025-12-04T12:15:06.1626331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 68, in _worker_compile_triton 2025-12-04T12:15:06.1626479Z kernel.precompile(warm_cache_only=True) 2025-12-04T12:15:06.1627052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 432, in precompile 2025-12-04T12:15:06.1627174Z self._precompile_worker() 2025-12-04T12:15:06.1627770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1627966Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1628561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1628765Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1629236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1629483Z module = src.make_ir(target, options, 
codegen_fns, module_map, context) 2025-12-04T12:15:06.1629944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1630285Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1630473Z triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1630801Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1630894Z ^ 2025-12-04T12:15:06.1631367Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1631373Z 2025-12-04T12:15:06.1631377Z 2025-12-04T12:15:06.1632138Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1632145Z 2025-12-04T12:15:06.1632152Z 2025-12-04T12:15:06.1632378Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1633025Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1633067Z 2025-12-04T12:15:06.1633339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1633536Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.1633752Z ============= 1 failed, 2 passed, 53 deselected, 2 rerun in 7.23s ============== 2025-12-04T12:15:06.1633853Z Got exit code 1 2025-12-04T12:15:06.1633973Z Retrying single test... 
2025-12-04T12:15:06.1634453Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-babe12520ea62fea.xml 2025-12-04T12:15:06.1634634Z ============================= test session starts ============================== 2025-12-04T12:15:06.1635026Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.1635136Z cachedir: .pytest_cache 2025-12-04T12:15:06.1635674Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.1635802Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.1635911Z configfile: pytest.ini 2025-12-04T12:15:06.1636520Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.1636743Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.1637496Z stepcurrent: skipping 55 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1637617Z Running 1 items in this shard 2025-12-04T12:15:06.1637622Z 2025-12-04T12:15:06.1638753Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1639527Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1640078Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1640655Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1641162Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1641616Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1642163Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.1642621Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.1643208Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.1643652Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 
2025-12-04T12:15:06.1644274Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.1644810Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.1645368Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.1645747Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1647458Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1648016Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1649104Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1649748Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1650640Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:06.1651371Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1652271Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1653044Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1653668Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1654422Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1654810Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1655705Z E1204 12:08:24.345000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1655854Z ('RERUN', {'yellow': True}) [3.9235s] [100%] 2025-12-04T12:15:06.1657099Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1657854Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1658420Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1659040Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1659564Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1660004Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1660585Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.1661047Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.1661617Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.1662074Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T12:15:06.1662647Z E1204 12:08:25.077000 125085 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.1663210Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.1663773Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.1664143Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1665874Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1666412Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1667481Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1668113Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1669024Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1669706Z E1204 12:08:25.077000 125085 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1670589Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1671597Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1672209Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1673087Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1673462Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1674378Z E1204 12:08:25.077000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1674562Z ('RERUN', {'yellow': True}) [0.6941s] [100%] 2025-12-04T12:15:06.1675693Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1676463Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1677011Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1677644Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1678141Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1678582Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1679149Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.1679594Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.1680228Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.1680675Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T12:15:06.1681242Z E1204 12:08:25.763000 125085 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.1681791Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.1682339Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.1682721Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1684382Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1684936Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1685978Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1686626Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1687553Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1688241Z E1204 12:08:25.763000 125085 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1689172Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1689946Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1690571Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1691329Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1691828Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1692723Z E1204 12:08:25.763000 125085 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1692827Z FAILED [0.6842s] [100%] 2025-12-04T12:15:06.1692833Z 2025-12-04T12:15:06.1692997Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.1693315Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.1693511Z Traceback (most recent call last): 2025-12-04T12:15:06.1693926Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1694078Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1694582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1694833Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1695348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1695557Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1696067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1696230Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1696841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1697163Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1697700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1697849Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1698346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T12:15:06.1698469Z return self._compile_to_module() 2025-12-04T12:15:06.1698953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1699133Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1699652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1699831Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1700341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1700577Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1701176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1701335Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1701827Z File "/tmp/tmp355j3ia6/6q/c6q5lrws2s2lfh7tb3zi5gdm5ownwn5kgk366qipxg6w45ivw4bz.py", line 48, in 2025-12-04T12:15:06.1702303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1702415Z kernel.precompile( 2025-12-04T12:15:06.1702986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1703105Z self._precompile_worker() 2025-12-04T12:15:06.1703726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1703917Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1704508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1704711Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T12:15:06.1705176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1705423Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1705878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1706245Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1706476Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1706805Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1706895Z ^ 2025-12-04T12:15:06.1707364Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1707373Z 2025-12-04T12:15:06.1708083Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1708089Z 2025-12-04T12:15:06.1708094Z 2025-12-04T12:15:06.1708314Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1708964Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1708970Z 2025-12-04T12:15:06.1709240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1709485Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1709593Z frames [('total', 1)] 2025-12-04T12:15:06.1709712Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1709970Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1710192Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1710309Z graph_break [] 2025-12-04T12:15:06.1710624Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.1710752Z Traceback (most recent call last): 2025-12-04T12:15:06.1711171Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1711323Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1711848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1712118Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1712634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1712870Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1713379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:06.1713528Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1714077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1714400Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1714941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1715146Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1715791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1715964Z return self._compile_to_module() 2025-12-04T12:15:06.1716456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1716626Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1717160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1717294Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1717854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1718092Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1718677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1718821Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1719320Z File "/tmp/tmpdgvtg6yp/ex/cex2xpd4z4rp3zd4aewrxwc5fgvspiaio2kha6m4gsaedqqxoju7.py", line 48, in 2025-12-04T12:15:06.1719801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:06.1719917Z kernel.precompile( 2025-12-04T12:15:06.1720478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1720612Z self._precompile_worker() 2025-12-04T12:15:06.1721214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1721395Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1722010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1722211Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1722680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1722926Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1723370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1723723Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1723951Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1724307Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1724402Z ^ 2025-12-04T12:15:06.1724861Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1724867Z 2025-12-04T12:15:06.1725587Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1725631Z 2025-12-04T12:15:06.1725636Z 2025-12-04T12:15:06.1725854Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1726497Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1726503Z 2025-12-04T12:15:06.1726771Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1727000Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1727118Z frames [('total', 1)] 2025-12-04T12:15:06.1727270Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1727508Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1727741Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1727841Z graph_break [] 2025-12-04T12:15:06.1728076Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1728180Z frames [('total', 1)] 2025-12-04T12:15:06.1728294Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1728523Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1728761Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1728860Z graph_break [] 2025-12-04T12:15:06.1729021Z =================================== FAILURES =================================== 2025-12-04T12:15:06.1729398Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.1729536Z Traceback (most recent call last): 2025-12-04T12:15:06.1729946Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1730095Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1730601Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1730853Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1731370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1731577Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1732082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1732246Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1732779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1733101Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1733634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1733787Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1734281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1734406Z return self._compile_to_module() 2025-12-04T12:15:06.1734892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1735070Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1735649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1735782Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1736371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:06.1736609Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1737256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1737387Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1737892Z File "/tmp/tmpb9r7p03z/id/cidalxryarvebqahbawkdsf3vyl2c3mm6ckmtgw6gi7o7txg6djg.py", line 48, in 2025-12-04T12:15:06.1738372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1738489Z kernel.precompile( 2025-12-04T12:15:06.1739061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1739220Z self._precompile_worker() 2025-12-04T12:15:06.1739822Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1740020Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1740624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1740824Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1741291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1741536Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1742039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1742376Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1742605Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1742929Z def 
triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1743022Z ^ 2025-12-04T12:15:06.1743492Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1743497Z 2025-12-04T12:15:06.1744207Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1744214Z 2025-12-04T12:15:06.1744218Z 2025-12-04T12:15:06.1744433Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1745080Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1745088Z 2025-12-04T12:15:06.1745361Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1745594Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1745701Z frames [('total', 1)] 2025-12-04T12:15:06.1745818Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1746072Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1746294Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1746393Z graph_break [] 2025-12-04T12:15:06.1746627Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1746730Z frames [('total', 1)] 2025-12-04T12:15:06.1746857Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1747121Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1747358Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1747474Z graph_break [] 2025-12-04T12:15:06.1747690Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T12:15:06.1747793Z frames [('total', 1)] 2025-12-04T12:15:06.1747922Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1748174Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1748419Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1748519Z graph_break [] 2025-12-04T12:15:06.1749171Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-babe12520ea62fea.xml - 2025-12-04T12:15:06.1749359Z =========================== short test summary info ============================ 2025-12-04T12:15:06.1750146Z FAILED [0.6842s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1750494Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1750599Z ^ 2025-12-04T12:15:06.1751055Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1751063Z 2025-12-04T12:15:06.1751782Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1751788Z 2025-12-04T12:15:06.1751794Z 2025-12-04T12:15:06.1752011Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1752655Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1752660Z 2025-12-04T12:15:06.1752962Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1753144Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:06.1753362Z ================== 1 failed, 187 deselected, 2 rerun in 5.35s ================== 2025-12-04T12:15:06.1753463Z Got exit code 1 2025-12-04T12:15:06.1753573Z Retrying single test... 2025-12-04T12:15:06.1754062Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08a6bb29b776e6ca.xml 2025-12-04T12:15:06.1754231Z ============================= test session starts ============================== 2025-12-04T12:15:06.1754595Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.1754707Z cachedir: .pytest_cache 2025-12-04T12:15:06.1755226Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.1755371Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.1755481Z configfile: pytest.ini 2025-12-04T12:15:06.1756074Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.1756310Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.1757158Z stepcurrent: skipping 55 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1757299Z Running 1 items in this shard 2025-12-04T12:15:06.1757304Z 2025-12-04T12:15:06.1758441Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1759282Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1759838Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1760408Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1760977Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1761418Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1761981Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.1762434Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.1763038Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.1763497Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 
2025-12-04T12:15:06.1764067Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.1764607Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.1765159Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.1765562Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1767246Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1767790Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1768866Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1769500Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1770417Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:06.1771338Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1772245Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1773107Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1773721Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1774497Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1774921Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1775838Z E1204 12:08:44.323000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1775977Z ('RERUN', {'yellow': True}) [3.9491s] [100%] 2025-12-04T12:15:06.1777199Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1778016Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1778567Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1779158Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1779672Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1780175Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1780725Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.1781176Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.1781763Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.1782211Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T12:15:06.1782799Z E1204 12:08:45.064000 125283 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.1783329Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.1783900Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.1784272Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1785947Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1786506Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1787600Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1788260Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1789160Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1789901Z E1204 12:08:45.064000 125283 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1790791Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1791581Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1792232Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1792981Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1793372Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1794309Z E1204 12:08:45.064000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1794463Z ('RERUN', {'yellow': True}) [0.7026s] [100%] 2025-12-04T12:15:06.1795590Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1796354Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1796929Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1797496Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1798016Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1798458Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1799025Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.1799478Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.1800047Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.1800508Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T12:15:06.1801081Z E1204 12:08:45.757000 125283 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.1801667Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.1802224Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.1802587Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1804319Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 256}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1804867Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1805951Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1806590Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1807496Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1808216Z E1204 12:08:45.757000 125283 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1809113Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1809887Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1810494Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1811259Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1811628Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1812536Z E1204 12:08:45.757000 125283 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1812643Z FAILED [0.6908s] [100%] 2025-12-04T12:15:06.1812649Z 2025-12-04T12:15:06.1812794Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.1813125Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.1813251Z Traceback (most recent call last): 2025-12-04T12:15:06.1813673Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1813822Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1814312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1814583Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1815150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1815360Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1815873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1816073Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1816687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1817012Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1817532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1817698Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1818184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T12:15:06.1818359Z return self._compile_to_module() 2025-12-04T12:15:06.1818844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1819009Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1819546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1819678Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1820188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1820420Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1821041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1821183Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1821693Z File "/tmp/tmpiqzjxtdx/qw/cqwl6dvzf6hiuzyhu4ecf3hfxlro2wctkaasadwz2vxywthnrjvn.py", line 48, in 2025-12-04T12:15:06.1822156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1822284Z kernel.precompile( 2025-12-04T12:15:06.1822838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1822970Z self._precompile_worker() 2025-12-04T12:15:06.1823561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1823742Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1824352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1824550Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T12:15:06.1825014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1825261Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1825708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1826056Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1826285Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1826594Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1826697Z ^ 2025-12-04T12:15:06.1827194Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1827201Z 2025-12-04T12:15:06.1827934Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1827942Z 2025-12-04T12:15:06.1827947Z 2025-12-04T12:15:06.1828165Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1828860Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1828865Z 2025-12-04T12:15:06.1829137Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1829361Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1829480Z frames [('total', 1)] 2025-12-04T12:15:06.1829598Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1829841Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1830075Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1830207Z graph_break [] 2025-12-04T12:15:06.1830537Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.1830661Z Traceback (most recent call last): 2025-12-04T12:15:06.1831071Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1831234Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1831721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1831970Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1832495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1832731Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1833255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:06.1833404Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1833937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1834274Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1834795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1834956Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1835433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1835556Z return self._compile_to_module() 2025-12-04T12:15:06.1836058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1836222Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1836735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1836878Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1837378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1837623Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1838212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1838338Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1838858Z File "/tmp/tmp68hil5v3/x6/cx6n6eh7bdyvggwbtpgtufw6pak3eeepotow2x7id2elvlgd24by.py", line 48, in 2025-12-04T12:15:06.1839361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:06.1839489Z kernel.precompile( 2025-12-04T12:15:06.1840042Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1840160Z self._precompile_worker() 2025-12-04T12:15:06.1840801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1840978Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1841573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1841783Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1842236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1842495Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1842975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1843309Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1843550Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1843859Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1843963Z ^ 2025-12-04T12:15:06.1844420Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1844426Z 2025-12-04T12:15:06.1845168Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1845176Z 2025-12-04T12:15:06.1845180Z 2025-12-04T12:15:06.1845413Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1846044Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1846050Z 2025-12-04T12:15:06.1846333Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1846559Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1846668Z frames [('total', 1)] 2025-12-04T12:15:06.1846803Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1847042Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1847278Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1847387Z graph_break [] 2025-12-04T12:15:06.1847613Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1847731Z frames [('total', 1)] 2025-12-04T12:15:06.1847850Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1848068Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1848321Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1848423Z graph_break [] 2025-12-04T12:15:06.1848573Z =================================== FAILURES =================================== 2025-12-04T12:15:06.1848903Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda _ 2025-12-04T12:15:06.1849026Z Traceback (most recent call last): 2025-12-04T12:15:06.1849444Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1849596Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1850090Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1850386Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1850904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1851097Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1851623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1851806Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1852353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1852674Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1853195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1853360Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1853840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1854017Z return self._compile_to_module() 2025-12-04T12:15:06.1854502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1854669Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1855202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1855333Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1855829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:06.1856074Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1856800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1856953Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1857465Z File "/tmp/tmpoj5idkhc/cp/ccph6a4leq7aciraybmukyegynqx6xxwbjinolcqutbbvelo3x5e.py", line 48, in 2025-12-04T12:15:06.1857930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1858064Z kernel.precompile( 2025-12-04T12:15:06.1858621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1858754Z self._precompile_worker() 2025-12-04T12:15:06.1859538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1859725Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1860337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1860539Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1861007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1861258Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1861707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1862058Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1862286Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1862596Z def 
triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1862706Z ^ 2025-12-04T12:15:06.1863215Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1863225Z 2025-12-04T12:15:06.1863952Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1863958Z 2025-12-04T12:15:06.1863992Z 2025-12-04T12:15:06.1864213Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1864843Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1864863Z 2025-12-04T12:15:06.1865136Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1865360Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1865482Z frames [('total', 1)] 2025-12-04T12:15:06.1865605Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1865843Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1866126Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1866228Z graph_break [] 2025-12-04T12:15:06.1866464Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1866569Z frames [('total', 1)] 2025-12-04T12:15:06.1866691Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1866922Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1867157Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1867258Z graph_break [] 2025-12-04T12:15:06.1867483Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T12:15:06.1867587Z frames [('total', 1)] 2025-12-04T12:15:06.1867704Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1867969Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1868202Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1868318Z graph_break [] 2025-12-04T12:15:06.1868964Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08a6bb29b776e6ca.xml - 2025-12-04T12:15:06.1869137Z =========================== short test summary info ============================ 2025-12-04T12:15:06.1869927Z FAILED [0.6908s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1870238Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1870328Z ^ 2025-12-04T12:15:06.1870800Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1870806Z 2025-12-04T12:15:06.1871748Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1871758Z 2025-12-04T12:15:06.1871763Z 2025-12-04T12:15:06.1871997Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1872623Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1872631Z 2025-12-04T12:15:06.1872916Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1873098Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:06.1873303Z ================== 1 failed, 187 deselected, 2 rerun in 5.39s ================== 2025-12-04T12:15:06.1873420Z Got exit code 1 2025-12-04T12:15:06.1874070Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda 2025-12-04T12:15:06.1874497Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:06.1874972Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ef8db3fa00c6c1d7.xml 2025-12-04T12:15:06.1875139Z ============================= test session starts ============================== 2025-12-04T12:15:06.1875555Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.1875669Z cachedir: .pytest_cache 2025-12-04T12:15:06.1876187Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.1876326Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.1876435Z configfile: pytest.ini 2025-12-04T12:15:06.1877040Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.1877271Z collecting ... collected 188 items / 56 deselected / 132 selected 2025-12-04T12:15:06.1877462Z stepcurrent: skipping 56 already run items. 
2025-12-04T12:15:06.1877594Z Running 132 items in this shard 2025-12-04T12:15:06.1877599Z 2025-12-04T12:15:06.1878759Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1879533Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1880086Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1880708Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1881229Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1881668Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1882229Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.1882681Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.1883250Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.1883707Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T12:15:06.1884271Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 
tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.1884818Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.1885367Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.1885742Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1887455Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1887998Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1889058Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1889804Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1890717Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1891405Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = 
src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1892335Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1893110Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1893736Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1894553Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1894929Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1895854Z E1204 12:09:04.070000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1895993Z ('RERUN', {'yellow': True}) [3.7823s] [ 0%] 2025-12-04T12:15:06.1897235Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1897997Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1898550Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1899133Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1899632Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1900087Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1900636Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.1901102Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.1901721Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.1902171Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T12:15:06.1902754Z E1204 12:09:04.816000 125481 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.1903317Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.1903877Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.1904241Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1905916Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1906519Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1907565Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1908242Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1909144Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1909844Z E1204 12:09:04.816000 125481 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1910727Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1911515Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1912128Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1912883Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1913272Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1914179Z E1204 12:09:04.816000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1914327Z ('RERUN', {'yellow': True}) [0.7077s] [ 0%] 2025-12-04T12:15:06.1915510Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.1916274Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1916825Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.1917425Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.1917937Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.1918372Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.1918936Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.1919383Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.1919979Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.1920429Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T12:15:06.1921001Z E1204 12:09:05.526000 125481 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.1921546Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.1922128Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.1922498Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.1924172Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.1924710Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.1925776Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1926403Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1927317Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1928004Z E1204 12:09:05.526000 125481 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1928895Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1929701Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1930315Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.1931079Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1931481Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.1932390Z E1204 12:09:05.526000 125481 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1932495Z FAILED [0.7078s] [ 0%] 2025-12-04T12:15:06.1932501Z 2025-12-04T12:15:06.1932653Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.1932995Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.1933150Z Traceback (most recent call last): 2025-12-04T12:15:06.1933568Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1933718Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1934216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1934484Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1935002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1935214Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1935762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1935913Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1936538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1936860Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1937386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1937550Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1938032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T12:15:06.1938170Z return self._compile_to_module() 2025-12-04T12:15:06.1938656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1938824Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1939356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1939491Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1939999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1940233Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1940820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1940962Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1941447Z File "/tmp/tmpx98mpn5_/lw/clw6xaxj6sudxztfcfu6io3ulrezf4tprx7jroekbdgwjmiiquzl.py", line 48, in 2025-12-04T12:15:06.1941955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1942082Z kernel.precompile( 2025-12-04T12:15:06.1942640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1942771Z self._precompile_worker() 2025-12-04T12:15:06.1943381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1943593Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1944200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1944402Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T12:15:06.1944867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1945120Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1945566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1945953Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1946186Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1946502Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1946609Z ^ 2025-12-04T12:15:06.1947068Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1947075Z 2025-12-04T12:15:06.1947808Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1947814Z 2025-12-04T12:15:06.1947849Z 2025-12-04T12:15:06.1948072Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1948732Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1948741Z 2025-12-04T12:15:06.1949011Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1949239Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1949363Z frames [('total', 1)] 2025-12-04T12:15:06.1949484Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1949723Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1949961Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1950063Z graph_break [] 2025-12-04T12:15:06.1950412Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.1950540Z Traceback (most recent call last): 2025-12-04T12:15:06.1950954Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1951123Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1951615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1951867Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1952402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1952596Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1953121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:06.1953272Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1953866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1954208Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1954739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1954903Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1955439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1955561Z return self._compile_to_module() 2025-12-04T12:15:06.1956063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1956229Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1956750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1956899Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1957394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.1957669Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1958255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1958386Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1958878Z File "/tmp/tmppoz_38bc/xh/cxhicdx3rgo2cr6w24iitqtoq4q7nzkqsghyojlki6goph7hz222.py", line 48, in 2025-12-04T12:15:06.1959341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:06.1959463Z kernel.precompile( 2025-12-04T12:15:06.1960048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1960169Z self._precompile_worker() 2025-12-04T12:15:06.1960777Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1960957Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1961550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1961769Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1962218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1962477Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1962921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1963258Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1963499Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1963810Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1963915Z ^ 2025-12-04T12:15:06.1964369Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1964377Z 2025-12-04T12:15:06.1965086Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1965092Z 2025-12-04T12:15:06.1965110Z 2025-12-04T12:15:06.1965327Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1965998Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1966005Z 2025-12-04T12:15:06.1966287Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1966514Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1966619Z frames [('total', 1)] 2025-12-04T12:15:06.1966752Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1967024Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1967260Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1967362Z graph_break [] 2025-12-04T12:15:06.1967582Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1967702Z frames [('total', 1)] 2025-12-04T12:15:06.1967817Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1968035Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1968286Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1968385Z graph_break [] 2025-12-04T12:15:06.1968532Z =================================== FAILURES =================================== 2025-12-04T12:15:06.1968931Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.1969056Z Traceback (most recent call last): 2025-12-04T12:15:06.1969471Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.1969623Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.1970110Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.1970372Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.1970884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.1971377Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.1971894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.1972046Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.1972595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.1972919Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.1973442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.1973609Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.1974092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.1974230Z return self._compile_to_module() 2025-12-04T12:15:06.1974721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.1974890Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.1975422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.1975552Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.1976062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:06.1976366Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.1976955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.1977103Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.1977654Z File "/tmp/tmp561o8arb/4l/c4lqqty7unedmq22u77uoqgnvprm5hukzraavwdhhir5bm6njnux.py", line 48, in 2025-12-04T12:15:06.1978123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.1978254Z kernel.precompile( 2025-12-04T12:15:06.1978811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.1978992Z self._precompile_worker() 2025-12-04T12:15:06.1979592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.1979772Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.1980382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.1980579Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.1981050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.1981294Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.1981780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.1982128Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.1982359Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1982666Z def 
triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1982766Z ^ 2025-12-04T12:15:06.1983225Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1983232Z 2025-12-04T12:15:06.1983986Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1983993Z 2025-12-04T12:15:06.1983997Z 2025-12-04T12:15:06.1984217Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1984866Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1984872Z 2025-12-04T12:15:06.1985144Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1985365Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1985483Z frames [('total', 1)] 2025-12-04T12:15:06.1985601Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1985836Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1986068Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1986170Z graph_break [] 2025-12-04T12:15:06.1986406Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.1986510Z frames [('total', 1)] 2025-12-04T12:15:06.1986628Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1986862Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1987096Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1987196Z graph_break [] 2025-12-04T12:15:06.1987429Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T12:15:06.1987532Z frames [('total', 1)] 2025-12-04T12:15:06.1987645Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.1987872Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.1988102Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.1988214Z graph_break [] 2025-12-04T12:15:06.1988902Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ef8db3fa00c6c1d7.xml - 2025-12-04T12:15:06.1989079Z =========================== short test summary info ============================ 2025-12-04T12:15:06.1989885Z FAILED [0.7078s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.1990195Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.1990332Z ^ 2025-12-04T12:15:06.1990788Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.1990795Z 2025-12-04T12:15:06.1991503Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.1991508Z 2025-12-04T12:15:06.1991513Z 2025-12-04T12:15:06.1991747Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.1992386Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1992423Z 2025-12-04T12:15:06.1992703Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.1992885Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:06.1993092Z ================== 1 failed, 56 deselected, 2 rerun in 5.24s =================== 2025-12-04T12:15:06.1993208Z Got exit code 1 2025-12-04T12:15:06.1993317Z Retrying single test... 2025-12-04T12:15:06.1993809Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a3ac84fc91fa02b.xml 2025-12-04T12:15:06.1993974Z ============================= test session starts ============================== 2025-12-04T12:15:06.1994358Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.1994487Z cachedir: .pytest_cache 2025-12-04T12:15:06.1995010Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.1995139Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.1995263Z configfile: pytest.ini 2025-12-04T12:15:06.1995857Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.1996096Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.1996810Z stepcurrent: skipping 56 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.1996927Z Running 1 items in this shard 2025-12-04T12:15:06.1996932Z 2025-12-04T12:15:06.1998105Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.2000201Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2001645Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2002924Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2004143Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.2005291Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.2006434Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.2007629Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.2008800Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.2009990Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 
2025-12-04T12:15:06.2011138Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.2012371Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.2013597Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.2014694Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.2016966Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.2019314Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.2021093Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2022901Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2024581Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:06.2026301Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2028022Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2029832Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2031350Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.2032870Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2034144Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.2035598Z E1204 12:09:24.259000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.2036760Z ('RERUN', {'yellow': True}) [3.7988s] [100%] 2025-12-04T12:15:06.2038178Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.2040253Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2041859Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2043155Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2044371Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.2045517Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.2046657Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.2047810Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.2048974Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.2050154Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T12:15:06.2051352Z E1204 12:09:25.013000 125679 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.2052609Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.2053834Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.2054912Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.2057186Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.2059550Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.2061280Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2063088Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2064746Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2066521Z E1204 12:09:25.013000 125679 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2068232Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2070048Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2071842Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.2073355Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2074640Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.2076070Z E1204 12:09:25.013000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.2077341Z ('RERUN', {'yellow': True}) [0.7157s] [100%] 2025-12-04T12:15:06.2078753Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.2080799Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2082311Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2083577Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2084789Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.2085861Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.2087001Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.2088146Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.2089310Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.2090471Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T12:15:06.2091638Z E1204 12:09:25.729000 125679 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.2092898Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.2094122Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.2095177Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.2097497Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.2099855Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.2101586Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2103460Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2105136Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2106837Z E1204 12:09:25.729000 125679 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2108629Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2110529Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2112054Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.2113625Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2114879Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.2116305Z E1204 12:09:25.729000 125679 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.2117451Z FAILED [0.7141s] [100%] 2025-12-04T12:15:06.2117637Z 2025-12-04T12:15:06.2117802Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.2118418Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.2119026Z Traceback (most recent call last): 2025-12-04T12:15:06.2119683Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.2120392Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.2121162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.2122053Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.2122967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.2123816Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.2124666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.2125483Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.2126307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.2127298Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.2128331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.2129160Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.2129934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T12:15:06.2130676Z return self._compile_to_module() 2025-12-04T12:15:06.2131453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.2132258Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.2133070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.2133866Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.2134636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.2135512Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.2136567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.2137422Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.2138170Z File "/tmp/tmpsb_t3t81/pn/cpnj7uzwucqhgl2z5fsfdj7ymx5po6ue5rfhnfah4hosyiya6erj.py", line 48, in 2025-12-04T12:15:06.2139264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.2139973Z kernel.precompile( 2025-12-04T12:15:06.2140725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.2141553Z self._precompile_worker() 2025-12-04T12:15:06.2142400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.2143322Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.2144236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2145185Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T12:15:06.2145972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2146828Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2147667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2148599Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2149292Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.2149970Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2150522Z ^ 2025-12-04T12:15:06.2151097Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.2151705Z 2025-12-04T12:15:06.2152418Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2153281Z 2025-12-04T12:15:06.2153286Z 2025-12-04T12:15:06.2153508Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2154506Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.2155279Z 2025-12-04T12:15:06.2155566Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2156233Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2156713Z frames [('total', 1)] 2025-12-04T12:15:06.2157020Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.2157463Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2158068Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.2158565Z graph_break [] 2025-12-04T12:15:06.2159057Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.2159645Z Traceback (most recent call last): 2025-12-04T12:15:06.2160291Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.2160984Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.2161750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.2162634Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.2163540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.2164423Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.2165253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:06.2166056Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.2166876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.2167874Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.2168847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.2169695Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.2170467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.2171403Z return self._compile_to_module() 2025-12-04T12:15:06.2172139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.2172935Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.2173758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.2174535Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.2175291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.2176164Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.2177203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.2178052Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.2178816Z File "/tmp/tmpbf617sg9/7t/c7tfdw6utbmvjfoxsl22bakcthcus765bhk4sxl3zr4dnquos5mo.py", line 48, in 2025-12-04T12:15:06.2179932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:06.2180644Z kernel.precompile( 2025-12-04T12:15:06.2181398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.2182222Z self._precompile_worker() 2025-12-04T12:15:06.2183044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.2183948Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.2184943Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2185893Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2186689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2193400Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2194410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2195349Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2196050Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.2196731Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2197282Z ^ 2025-12-04T12:15:06.2197868Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.2198480Z 2025-12-04T12:15:06.2199262Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2200121Z 2025-12-04T12:15:06.2200126Z 2025-12-04T12:15:06.2200352Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2201363Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.2202135Z 2025-12-04T12:15:06.2202418Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2203044Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2203523Z frames [('total', 1)] 2025-12-04T12:15:06.2203825Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.2204326Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2204928Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.2205399Z graph_break [] 2025-12-04T12:15:06.2205776Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2206232Z frames [('total', 1)] 2025-12-04T12:15:06.2206531Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.2206971Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.2207555Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2208036Z graph_break [] 2025-12-04T12:15:06.2208340Z =================================== FAILURES =================================== 2025-12-04T12:15:06.2208952Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.2209544Z Traceback (most recent call last): 2025-12-04T12:15:06.2210191Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.2210886Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.2211649Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.2212532Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.2213436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.2214391Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.2215278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.2216075Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.2216970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.2218092Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.2219179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.2219998Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.2220762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.2221541Z return self._compile_to_module() 2025-12-04T12:15:06.2222268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.2223059Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.2223858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.2224651Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.2225403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:06.2226329Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.2227274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.2228134Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.2228904Z File "/tmp/tmpdlz471w8/ks/cksf4jmqddtjemybjlcyhewuloalp4s4lf5nqdasdtmjasynwwve.py", line 48, in 2025-12-04T12:15:06.2230026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.2230732Z kernel.precompile( 2025-12-04T12:15:06.2231475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.2232360Z self._precompile_worker() 2025-12-04T12:15:06.2233169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.2234094Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.2235012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2235957Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2236738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2237573Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2238402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2239325Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2240017Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.2240698Z def 
triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2241244Z ^ 2025-12-04T12:15:06.2241815Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.2242422Z 2025-12-04T12:15:06.2243133Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2243986Z 2025-12-04T12:15:06.2243991Z 2025-12-04T12:15:06.2244211Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2245206Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.2245986Z 2025-12-04T12:15:06.2246318Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2246948Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2247415Z frames [('total', 1)] 2025-12-04T12:15:06.2247712Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.2248155Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2248783Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.2249243Z graph_break [] 2025-12-04T12:15:06.2249623Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2250078Z frames [('total', 1)] 2025-12-04T12:15:06.2250380Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.2250817Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.2251397Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2251877Z graph_break [] 2025-12-04T12:15:06.2252254Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T12:15:06.2252749Z frames [('total', 1)] 2025-12-04T12:15:06.2253044Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.2253482Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.2254072Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2254528Z graph_break [] 2025-12-04T12:15:06.2255336Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a3ac84fc91fa02b.xml - 2025-12-04T12:15:06.2256390Z =========================== short test summary info ============================ 2025-12-04T12:15:06.2257504Z FAILED [0.7141s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.2258770Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2259314Z ^ 2025-12-04T12:15:06.2259898Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.2260495Z 2025-12-04T12:15:06.2261205Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2262064Z 2025-12-04T12:15:06.2262069Z 2025-12-04T12:15:06.2262290Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2263284Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.2264053Z 2025-12-04T12:15:06.2264332Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2264923Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:06.2265445Z ================== 1 failed, 187 deselected, 2 rerun in 5.27s ================== 2025-12-04T12:15:06.2265897Z Got exit code 1 2025-12-04T12:15:06.2266169Z Retrying single test... 2025-12-04T12:15:06.2266823Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e162f70cb76e49ff.xml 2025-12-04T12:15:06.2267606Z ============================= test session starts ============================== 2025-12-04T12:15:06.2268278Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.2268872Z cachedir: .pytest_cache 2025-12-04T12:15:06.2269606Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.2270384Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.2270739Z configfile: pytest.ini 2025-12-04T12:15:06.2271753Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.2272713Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.2273783Z stepcurrent: skipping 56 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.2274751Z Running 1 items in this shard 2025-12-04T12:15:06.2275009Z 2025-12-04T12:15:06.2276166Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.2278232Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2279675Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2280975Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2282186Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.2283255Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.2284394Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.2285530Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.2286738Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.2287889Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 
2025-12-04T12:15:06.2289046Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.2290282Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.2291498Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.2292550Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.2294733Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.2297139Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.2298890Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2300748Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2302507Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:06.2304215Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2305959Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2307757Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2309278Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.2310776Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2312061Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.2313482Z E1204 12:09:44.203000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.2314658Z ('RERUN', {'yellow': True}) [3.7868s] [100%] 2025-12-04T12:15:06.2316072Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.2318131Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2319586Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2320851Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2322066Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.2323132Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.2324260Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.2325399Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.2326561Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.2327704Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T12:15:06.2328841Z E1204 12:09:44.949000 125877 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.2330081Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.2331308Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.2332414Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.2334586Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.2337041Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.2338789Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2340604Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2342314Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2344030Z E1204 12:09:44.949000 125877 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2345718Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2347555Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2349076Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.2350592Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2351866Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.2353274Z E1204 12:09:44.949000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.2354440Z ('RERUN', {'yellow': True}) [0.7081s] [100%] 2025-12-04T12:15:06.2355859Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Triton compilation failed: triton_poi_fused__to_copy_clamp_0 2025-12-04T12:15:06.2357910Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2359354Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2360625Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2361838Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] xmask = xindex < xnumel 2025-12-04T12:15:06.2362961Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] x0 = xindex 2025-12-04T12:15:06.2364079Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.2365221Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp1 = -448.0 2025-12-04T12:15:06.2366385Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp2 = triton_helpers.maximum(tmp0, tmp1) 2025-12-04T12:15:06.2367587Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp3 = 448.0 2025-12-04T12:15:06.2368741Z E1204 12:09:45.662000 125877 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp4 = triton_helpers.minimum(tmp2, tmp3) 2025-12-04T12:15:06.2369979Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tmp5 = tmp4.to(tl.float8e4nv) 2025-12-04T12:15:06.2371400Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] tl.store(out_ptr0 + (x0), tmp5, xmask) 2025-12-04T12:15:06.2372537Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] 2025-12-04T12:15:06.2374722Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.2377161Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] Traceback (most recent call last): 2025-12-04T12:15:06.2378944Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2380775Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2382458Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2384193Z E1204 12:09:45.662000 125877 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2385929Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2387731Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2389268Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.2390786Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2392062Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ^ 2025-12-04T12:15:06.2393547Z E1204 12:09:45.662000 125877 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0_1] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.2394684Z FAILED [0.7100s] [100%] 2025-12-04T12:15:06.2394887Z 2025-12-04T12:15:06.2395039Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.2395679Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.2396273Z Traceback (most recent call last): 2025-12-04T12:15:06.2396993Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.2397703Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.2398636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.2399533Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.2400459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.2401327Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.2402234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.2403036Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.2403863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.2404873Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.2405864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.2406696Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.2407474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 
2025-12-04T12:15:06.2408263Z return self._compile_to_module() 2025-12-04T12:15:06.2408987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.2409794Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.2410616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.2411413Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.2412166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.2413046Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.2414013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.2414864Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.2415639Z File "/tmp/tmp8e1yhjx4/no/cnopm3fdvd4yz7kcpibin3dcbfdh7fat63u5mupswtlcu553sgha.py", line 48, in 2025-12-04T12:15:06.2416844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.2417580Z kernel.precompile( 2025-12-04T12:15:06.2418316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.2419146Z self._precompile_worker() 2025-12-04T12:15:06.2419966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.2420883Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.2421787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2422734Z binary = triton.compile(*compile_args, 
**compile_kwargs) 2025-12-04T12:15:06.2423583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2424421Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2425257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2426184Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2426939Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.2427604Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2428158Z ^ 2025-12-04T12:15:06.2428752Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.2429340Z 2025-12-04T12:15:06.2430075Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2430957Z 2025-12-04T12:15:06.2430963Z 2025-12-04T12:15:06.2431185Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2432190Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.2432982Z 2025-12-04T12:15:06.2433254Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2433893Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2434356Z frames [('total', 1)] 2025-12-04T12:15:06.2434663Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.2435126Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2435711Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.2436220Z graph_break [] 2025-12-04T12:15:06.2436712Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.2437312Z Traceback (most recent call last): 2025-12-04T12:15:06.2437950Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.2438647Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.2439431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.2440302Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.2441211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.2442138Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.2442991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 
2025-12-04T12:15:06.2443806Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.2444626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.2445629Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.2446620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.2447452Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.2448220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.2448979Z return self._compile_to_module() 2025-12-04T12:15:06.2449721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.2450591Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.2451428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.2452234Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.2453005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.2453912Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.2454883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.2455751Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.2456600Z File "/tmp/tmpywvjwlis/ym/cymldqugyfrcg7f2sj4ovu6wlfsq6vwwkh3mt4z4kht7mfzuiybj.py", line 48, in 2025-12-04T12:15:06.2457734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 
2025-12-04T12:15:06.2458474Z kernel.precompile( 2025-12-04T12:15:06.2459243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.2460098Z self._precompile_worker() 2025-12-04T12:15:06.2460931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.2461866Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.2462782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2463724Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2464532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2465436Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2466415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2467354Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2468071Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.2468755Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2469301Z ^ 2025-12-04T12:15:06.2469892Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.2470502Z 2025-12-04T12:15:06.2471434Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2472284Z 2025-12-04T12:15:06.2472288Z 2025-12-04T12:15:06.2472533Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2473539Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.2474324Z 2025-12-04T12:15:06.2474598Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2475248Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2475720Z frames [('total', 1)] 2025-12-04T12:15:06.2476012Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.2476467Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2477068Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.2477535Z graph_break [] 2025-12-04T12:15:06.2477901Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2478377Z frames [('total', 1)] 2025-12-04T12:15:06.2478682Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.2479222Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.2479834Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2480311Z graph_break [] 2025-12-04T12:15:06.2480606Z =================================== FAILURES =================================== 2025-12-04T12:15:06.2481242Z _ TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda _ 2025-12-04T12:15:06.2481896Z Traceback (most recent call last): 2025-12-04T12:15:06.2482556Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 206, in test_to_fp8_saturated 2025-12-04T12:15:06.2483249Z y_compiled = compiled_fp8_cast(x, dst_dtype) 2025-12-04T12:15:06.2484030Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.2484914Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.2485809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.2486714Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.2487554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.2488363Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.2489173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.2490179Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.2491180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.2491995Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.2492800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.2493556Z return self._compile_to_module() 2025-12-04T12:15:06.2494295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.2495074Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.2495893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.2496772Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.2497533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:06.2498393Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.2499365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.2500226Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.2500987Z File "/tmp/tmp9ugzbsiq/44/c444otvirizvptr6l2p2f2pqnptejoxwcpdzsujgljt4ih5n5inl.py", line 48, in 2025-12-04T12:15:06.2502075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.2502800Z kernel.precompile( 2025-12-04T12:15:06.2503556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.2504361Z self._precompile_worker() 2025-12-04T12:15:06.2505185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.2506108Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.2507067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2507271Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2507725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2507988Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2508433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2509341Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2509888Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.2510203Z def 
triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2510312Z ^ 2025-12-04T12:15:06.2511246Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.2511262Z 2025-12-04T12:15:06.2511993Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2512047Z 2025-12-04T12:15:06.2512052Z 2025-12-04T12:15:06.2512271Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2512915Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.2512935Z 2025-12-04T12:15:06.2513204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2513429Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2513551Z frames [('total', 1)] 2025-12-04T12:15:06.2513668Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.2513943Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2514182Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.2514284Z graph_break [] 2025-12-04T12:15:06.2514507Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2514625Z frames [('total', 1)] 2025-12-04T12:15:06.2514742Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.2514977Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.2515221Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2515321Z graph_break [] 2025-12-04T12:15:06.2515550Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T12:15:06.2515654Z frames [('total', 1)] 2025-12-04T12:15:06.2515768Z stats [('calls_captured', 8)] 2025-12-04T12:15:06.2516005Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('not_ok', 1)] 2025-12-04T12:15:06.2516241Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2516345Z graph_break [] 2025-12-04T12:15:06.2517015Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e162f70cb76e49ff.xml - 2025-12-04T12:15:06.2517193Z =========================== short test summary info ============================ 2025-12-04T12:15:06.2518009Z FAILED [0.7100s] inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.2518324Z def triton_poi_fused__to_copy_clamp_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2518415Z ^ 2025-12-04T12:15:06.2518887Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.2518893Z 2025-12-04T12:15:06.2519641Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2519648Z 2025-12-04T12:15:06.2519653Z 2025-12-04T12:15:06.2519889Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2520535Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.2520542Z 2025-12-04T12:15:06.2520858Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2521041Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:06.2521246Z ================== 1 failed, 187 deselected, 2 rerun in 5.25s ================== 2025-12-04T12:15:06.2521362Z Got exit code 1 2025-12-04T12:15:06.2521919Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda 2025-12-04T12:15:06.2522339Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:06.2522829Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e70a5c274fb86b8e.xml 2025-12-04T12:15:06.2523030Z ============================= test session starts ============================== 2025-12-04T12:15:06.2523398Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.2523513Z cachedir: .pytest_cache 2025-12-04T12:15:06.2524033Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.2524175Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.2524284Z configfile: pytest.ini 2025-12-04T12:15:06.2524903Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.2525189Z collecting ... collected 188 items / 57 deselected / 131 selected 2025-12-04T12:15:06.2525341Z stepcurrent: skipping 57 already run items. 
2025-12-04T12:15:06.2525472Z Running 131 items in this shard 2025-12-04T12:15:06.2525480Z 2025-12-04T12:15:06.2525991Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_cuda PASSED [4.0439s] [ 0%] 2025-12-04T12:15:06.2526500Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.7977s] [ 1%] 2025-12-04T12:15:06.2527018Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 PASSED [0.1583s] [ 2%] 2025-12-04T12:15:06.2527546Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 PASSED [0.1683s] [ 3%] 2025-12-04T12:15:06.2528650Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.2529416Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2529988Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2530555Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2531048Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2531496Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2532131Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = 
tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2532671Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2533181Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.2533722Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.2534240Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.2534788Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.2535353Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.2535721Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2537636Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.2538177Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2539083Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2539611Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2540450Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.2541179Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.2542040Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2542565Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2543408Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.2544050Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.2544943Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.2545760Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.2546656Z E1204 
12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.2547357Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.2548215Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.2548933Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.2549841Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2550206Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2550917Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.2551298Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2551836Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2552884Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2554130Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2555022Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2555725Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2556612Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2557399Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, 
codegen_fns=codegen_fns, 2025-12-04T12:15:06.2558014Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.2558796Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2559336Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2559903Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2560408Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2560839Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2561474Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2562002Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2562419Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.2563290Z E1204 12:10:05.915000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2563426Z ('RERUN', {'yellow': True}) [0.2462s] [ 3%] 2025-12-04T12:15:06.2564530Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.2565293Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2566707Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2568141Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2569527Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2569980Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2570633Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2571366Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2571885Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.2572396Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.2572916Z E1204 12:10:06.391000 126075 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.2573462Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.2574019Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.2574384Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2576217Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.2576828Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2577793Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2578313Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2579150Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.2579921Z E1204 12:10:06.391000 126075 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.2580771Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2581293Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2582132Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.2582833Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.2583718Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.2584530Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.2585507Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.2586212Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.2587075Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.2587766Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.2588668Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2589036Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2589721Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.2590103Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2590639Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2591693Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2592368Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2593421Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2594131Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2595064Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2595858Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2596477Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.2597284Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2597826Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2598388Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2598964Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2599406Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2600054Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2600581Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2600998Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T12:15:06.2601835Z E1204 12:10:06.391000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2601970Z ('RERUN', {'yellow': True}) [0.4453s] [ 3%] 2025-12-04T12:15:06.2603087Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.2603846Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2604404Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2604962Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2605456Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2605903Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2606497Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2607070Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2607580Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.2608087Z E1204 12:10:06.806000 126075 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.2608637Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.2609182Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.2609734Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.2610098Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2611937Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.2612471Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2613369Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2613880Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2614715Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.2615435Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.2616361Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2616886Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2617731Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.2618378Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.2619253Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.2620064Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.2620962Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.2621657Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), 
self.shape) 2025-12-04T12:15:06.2622517Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.2623855Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.2625526Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2625902Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2626579Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.2627005Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2629000Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2630201Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2630879Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2631778Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:06.2632476Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2633360Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2634146Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2634767Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.2635544Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2636091Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2636648Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2637154Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2637585Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2638229Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2638761Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = 
tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2639192Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.2640074Z E1204 12:10:06.806000 126075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2640183Z FAILED [0.4130s] [ 3%] 2025-12-04T12:15:06.2640189Z 2025-12-04T12:15:06.2640351Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.2640667Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.2640794Z Traceback (most recent call last): 2025-12-04T12:15:06.2641188Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.2641318Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.2641859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.2642110Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.2642627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.2642834Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.2643346Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.2643504Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.2644069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.2644395Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.2644931Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.2645079Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.2645560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.2645700Z return self._compile_to_module() 2025-12-04T12:15:06.2646188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.2646367Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.2646883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.2647020Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.2647535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.2647774Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.2648378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.2648510Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.2649016Z File "/tmp/tmplmuksdhk/my/cmyqtfcskab3ydlogxb3r6dtgztlq5pbmlcnzdf5yowooyb3qrwb.py", line 51, in 2025-12-04T12:15:06.2649494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.2649609Z kernel.precompile( 2025-12-04T12:15:06.2650168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.2650338Z self._precompile_worker() 2025-12-04T12:15:06.2650935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 
463, in _precompile_worker 2025-12-04T12:15:06.2651134Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.2651730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2651962Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2652433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2652681Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2653144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2653486Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2653724Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.2654093Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2654221Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2654364Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2654492Z xmask = xindex < xnumel 2025-12-04T12:15:06.2654591Z x0 = xindex 2025-12-04T12:15:06.2654780Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2654903Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2654999Z ^ 2025-12-04T12:15:06.2655402Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2655409Z 2025-12-04T12:15:06.2656155Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2656162Z 2025-12-04T12:15:06.2656168Z 2025-12-04T12:15:06.2656484Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2657113Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T12:15:06.2657118Z 2025-12-04T12:15:06.2657395Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2657639Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2657748Z frames [('total', 1)] 2025-12-04T12:15:06.2657869Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.2658109Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.2658579Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2658697Z graph_break [] 2025-12-04T12:15:06.2659019Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.2659147Z Traceback (most recent call last): 2025-12-04T12:15:06.2659536Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.2659663Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.2660156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.2660418Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.2660930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.2661139Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.2661649Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.2661848Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.2662401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.2662724Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.2663253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.2663440Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.2663919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.2664054Z return self._compile_to_module() 2025-12-04T12:15:06.2664539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.2664720Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.2665242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.2665406Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.2665917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.2666151Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.2666740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.2666882Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.2667364Z File "/tmp/tmpvltzk6_5/yb/cyboltm5bweutm2o3lswgzf32bdwzhpvb32eanp77hnagze5ck47.py", line 51, in 2025-12-04T12:15:06.2667843Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.2667991Z kernel.precompile( 2025-12-04T12:15:06.2668550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.2668684Z self._precompile_worker() 2025-12-04T12:15:06.2669283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.2669481Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.2670079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2670279Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2670743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2671168Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2671618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2671969Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2672202Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.2672538Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2672667Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2672808Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2672934Z xmask = xindex < xnumel 2025-12-04T12:15:06.2673033Z x0 = xindex 2025-12-04T12:15:06.2673203Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2673340Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2673433Z ^ 
2025-12-04T12:15:06.2673840Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2673932Z 2025-12-04T12:15:06.2674653Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2674662Z 2025-12-04T12:15:06.2674667Z 2025-12-04T12:15:06.2674889Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2675578Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T12:15:06.2675585Z 2025-12-04T12:15:06.2675859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2676103Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2676211Z frames [('total', 1)] 2025-12-04T12:15:06.2676333Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.2676576Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.2677045Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2677210Z graph_break [] 2025-12-04T12:15:06.2677432Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2677539Z frames [('total', 1)] 2025-12-04T12:15:06.2677670Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.2677893Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.2678351Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2678465Z graph_break [] 2025-12-04T12:15:06.2678612Z =================================== FAILURES =================================== 
2025-12-04T12:15:06.2678927Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.2679106Z Traceback (most recent call last): 2025-12-04T12:15:06.2679487Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.2679633Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.2680120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.2680368Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.2680897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.2681094Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.2683546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.2683705Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.2684312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.2684651Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.2685931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.2686893Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.2687440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.2687563Z return self._compile_to_module() 2025-12-04T12:15:06.2688064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.2689404Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.2689986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.2690227Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.2691729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.2691988Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.2692580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.2692754Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.2693248Z File "/tmp/tmpz_ft4e9s/zc/czcpsk3mcmiadknd77dy3sn35d6awyvzophsapvkjkr5udk3zhyb.py", line 51, in 2025-12-04T12:15:06.2693714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.2693828Z kernel.precompile( 2025-12-04T12:15:06.2694402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.2694526Z self._precompile_worker() 2025-12-04T12:15:06.2695139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.2695354Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.2695957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2696172Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2696701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2696964Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.2697406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2697780Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2698027Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.2698351Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2698476Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2698630Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2698745Z xmask = xindex < xnumel 2025-12-04T12:15:06.2698854Z x0 = xindex 2025-12-04T12:15:06.2699023Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2699143Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2699250Z ^ 2025-12-04T12:15:06.2699639Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2699646Z 2025-12-04T12:15:06.2700365Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2700386Z 2025-12-04T12:15:06.2700391Z 2025-12-04T12:15:06.2700611Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2701238Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T12:15:06.2701247Z 2025-12-04T12:15:06.2701532Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2701757Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2701882Z frames [('total', 1)] 2025-12-04T12:15:06.2702005Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.2702228Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.2702712Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2702851Z graph_break [] 2025-12-04T12:15:06.2703073Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2703191Z frames [('total', 1)] 2025-12-04T12:15:06.2703307Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.2703528Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.2704001Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2704134Z graph_break [] 2025-12-04T12:15:06.2704366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2704471Z frames [('total', 1)] 2025-12-04T12:15:06.2704586Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.2704817Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.2705279Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), 
('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2705381Z graph_break [] 2025-12-04T12:15:06.2706046Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e70a5c274fb86b8e.xml - 2025-12-04T12:15:06.2706253Z =========================== short test summary info ============================ 2025-12-04T12:15:06.2707033Z FAILED [0.4130s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.2707361Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2707488Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2707643Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2707754Z xmask = xindex < xnumel 2025-12-04T12:15:06.2707848Z x0 = xindex 2025-12-04T12:15:06.2708046Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2708202Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2708309Z ^ 2025-12-04T12:15:06.2708703Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2708712Z 2025-12-04T12:15:06.2709423Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2709432Z 2025-12-04T12:15:06.2709451Z 2025-12-04T12:15:06.2709671Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2710298Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T12:15:06.2710304Z 2025-12-04T12:15:06.2710636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2710825Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.2711049Z ============= 1 failed, 4 passed, 57 deselected, 2 rerun in 6.33s ============== 2025-12-04T12:15:06.2711169Z Got exit code 1 2025-12-04T12:15:06.2711283Z Retrying single test... 2025-12-04T12:15:06.2711768Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0c17434f07767682.xml 2025-12-04T12:15:06.2711938Z ============================= test session starts ============================== 2025-12-04T12:15:06.2712297Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.2712425Z cachedir: .pytest_cache 2025-12-04T12:15:06.2712949Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.2713078Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.2713204Z configfile: pytest.ini 2025-12-04T12:15:06.2713828Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.2714071Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.2714779Z stepcurrent: skipping 61 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T12:15:06.2714899Z Running 1 items in this shard 2025-12-04T12:15:06.2714935Z 2025-12-04T12:15:06.2716047Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.2716814Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2717378Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2717974Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2718483Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2718917Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2719514Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2720061Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2720599Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.2721132Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T12:15:06.2721642Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.2722190Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.2722756Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.2723123Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2724949Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.2725492Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2726381Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2726885Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2727751Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.2728473Z E1204 12:10:23.845000 126302 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.2729330Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2729886Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2730726Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.2731375Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.2732276Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.2733091Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.2733944Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.2734687Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.2735550Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.2736233Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.2737202Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2737567Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2738504Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.2738885Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2739628Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2741173Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2742782Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2743745Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2745043Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2745939Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2747737Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2748350Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.2749128Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2749677Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2751062Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2751669Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2752161Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2752870Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2753396Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2753869Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T12:15:06.2754699Z E1204 12:10:23.845000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2754836Z ('RERUN', {'yellow': True}) [3.4912s] [100%] 2025-12-04T12:15:06.2756200Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.2757017Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2757634Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2758253Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2758843Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2759335Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2759987Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2760529Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2761039Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.2761702Z E1204 12:10:24.323000 126302 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.2762210Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.2762758Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.2763354Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.2763714Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2765532Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.2766101Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2766981Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2767488Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2768349Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.2769076Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.2769935Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2770458Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2771483Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.2772139Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.2773012Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.2773836Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.2774712Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.2775412Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), 
self.shape) 2025-12-04T12:15:06.2776426Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.2777129Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.2778070Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2778435Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2779116Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.2779497Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2780033Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2781125Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2781756Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2782665Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:06.2783396Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2784285Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2785068Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2785680Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.2786451Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2786998Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2787572Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2788067Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2788501Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2789110Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2789641Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = 
tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2790121Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.2790942Z E1204 12:10:24.323000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2791081Z ('RERUN', {'yellow': True}) [0.4372s] [100%] 2025-12-04T12:15:06.2792186Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.2792982Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2793542Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2794097Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2794810Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2795306Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2796022Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2796563Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2797244Z E1204 12:10:24.759000 126302 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.2797862Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.2798437Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.2798987Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.2799721Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.2800088Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2802127Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.2802664Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2803858Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2804422Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2805432Z E1204 
12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.2806157Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.2807087Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2807755Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2808602Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.2809254Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.2810124Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.2811068Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.2812161Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.2812858Z E1204 12:10:24.759000 126302 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.2813885Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.2814576Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.2815479Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2815846Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2816605Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.2816973Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2817504Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2818559Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2819187Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2820095Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2820818Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2821874Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2822660Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2823319Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.2824093Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2824638Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2825244Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2825743Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2826174Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2826780Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 
2025-12-04T12:15:06.2827306Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2827764Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.2828588Z E1204 12:10:24.759000 126302 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2828698Z FAILED [0.4331s] [100%] 2025-12-04T12:15:06.2828717Z 2025-12-04T12:15:06.2828865Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.2829184Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.2829324Z Traceback (most recent call last): 2025-12-04T12:15:06.2829702Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.2829835Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.2830341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.2830601Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.2831132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.2831332Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.2831843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.2832009Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.2832545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.2832867Z return scheme.codegen_and_compile(gm, 
example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.2833402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.2833555Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.2834083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.2834216Z return self._compile_to_module() 2025-12-04T12:15:06.2834702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.2834883Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.2835434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.2835583Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.2836080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.2836315Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.2836918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.2837049Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.2837579Z File "/tmp/tmpb3b7m996/i3/ci3i5t3cwbigr6gq4kn5uylj6623h24vvwgdqbszvaaqeooimkcw.py", line 51, in 2025-12-04T12:15:06.2838058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.2838177Z kernel.precompile( 2025-12-04T12:15:06.2838747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.2838870Z self._precompile_worker() 2025-12-04T12:15:06.2839469Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.2839671Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.2840300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2840522Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2840975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2841225Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2841685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2842020Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2842253Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.2842590Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2842716Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2842875Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2842988Z xmask = xindex < xnumel 2025-12-04T12:15:06.2843088Z x0 = xindex 2025-12-04T12:15:06.2843275Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2843397Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2843489Z ^ 2025-12-04T12:15:06.2843892Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2843901Z 2025-12-04T12:15:06.2844618Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2844625Z 2025-12-04T12:15:06.2844629Z 2025-12-04T12:15:06.2844862Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2845485Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T12:15:06.2845523Z 2025-12-04T12:15:06.2845807Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2846036Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2846142Z frames [('total', 1)] 2025-12-04T12:15:06.2846273Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.2846734Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2847005Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.2847120Z graph_break [] 2025-12-04T12:15:06.2847436Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.2847573Z Traceback (most recent call last): 2025-12-04T12:15:06.2847949Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.2848079Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.2848584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.2848866Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.2849375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.2849585Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.2850094Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.2850260Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.2850794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.2851144Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.2851686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.2851837Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.2852331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.2852457Z return self._compile_to_module() 2025-12-04T12:15:06.2852947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.2853129Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.2853647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.2853781Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.2854295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.2854527Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.2855122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.2855250Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.2855755Z File "/tmp/tmpcj5k8xzb/23/c23abf7ytetlyzxb3s7egzkdo7cpfa2bemghxqitrhte6qiyfdju.py", line 51, in 2025-12-04T12:15:06.2856238Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.2856451Z kernel.precompile( 2025-12-04T12:15:06.2857026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.2857147Z self._precompile_worker() 2025-12-04T12:15:06.2857801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.2857998Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.2858595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2858796Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2859297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2859546Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2860002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2860339Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2860575Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.2860918Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2861078Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2861219Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2861344Z xmask = xindex < xnumel 2025-12-04T12:15:06.2861441Z x0 = xindex 2025-12-04T12:15:06.2861624Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2861749Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2861841Z ^ 
2025-12-04T12:15:06.2862243Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2862248Z 2025-12-04T12:15:06.2862961Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2862968Z 2025-12-04T12:15:06.2862972Z 2025-12-04T12:15:06.2863237Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2863859Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T12:15:06.2863867Z 2025-12-04T12:15:06.2864134Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2864371Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2864480Z frames [('total', 1)] 2025-12-04T12:15:06.2864610Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.2865071Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2865296Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.2865410Z graph_break [] 2025-12-04T12:15:06.2865638Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2865744Z frames [('total', 1)] 2025-12-04T12:15:06.2865876Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.2866096Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.2866560Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2866672Z graph_break [] 2025-12-04T12:15:06.2866821Z =================================== FAILURES =================================== 
2025-12-04T12:15:06.2867149Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.2867272Z Traceback (most recent call last): 2025-12-04T12:15:06.2867645Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.2867788Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.2868276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.2868573Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.2869168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.2869395Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.2869929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.2870120Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.2870653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.2871166Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.2871688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.2871855Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.2872336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.2872533Z return self._compile_to_module() 2025-12-04T12:15:06.2873038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.2873206Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.2873737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.2873869Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.2874368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.2874619Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.2875254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.2875388Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.2875906Z File "/tmp/tmpcrwducs3/3i/c3i6ijzztcvlfmucnu3llmrfhfa3cmsb3qtgms2uj7mn7z3wxqq6.py", line 51, in 2025-12-04T12:15:06.2876372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.2876500Z kernel.precompile( 2025-12-04T12:15:06.2877055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.2877174Z self._precompile_worker() 2025-12-04T12:15:06.2877779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.2877960Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.2878573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2878775Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2879227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2879489Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.2879936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2880272Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2880523Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.2880844Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2880984Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2881168Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2881282Z xmask = xindex < xnumel 2025-12-04T12:15:06.2881395Z x0 = xindex 2025-12-04T12:15:06.2881567Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2881686Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2881794Z ^ 2025-12-04T12:15:06.2882185Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2882234Z 2025-12-04T12:15:06.2882966Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2882973Z 2025-12-04T12:15:06.2882977Z 2025-12-04T12:15:06.2883198Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2883822Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T12:15:06.2883841Z 2025-12-04T12:15:06.2884110Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2884366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2884491Z frames [('total', 1)] 2025-12-04T12:15:06.2884608Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.2885078Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2885317Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.2885418Z graph_break [] 2025-12-04T12:15:06.2885638Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2885756Z frames [('total', 1)] 2025-12-04T12:15:06.2885870Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.2886129Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.2886595Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2886697Z graph_break [] 2025-12-04T12:15:06.2886944Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.2887049Z frames [('total', 1)] 2025-12-04T12:15:06.2887163Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.2887400Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.2887865Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), 
('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.2887984Z graph_break [] 2025-12-04T12:15:06.2888630Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0c17434f07767682.xml - 2025-12-04T12:15:06.2888808Z =========================== short test summary info ============================ 2025-12-04T12:15:06.2889596Z FAILED [0.4331s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.2889922Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2890063Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2890205Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2890320Z xmask = xindex < xnumel 2025-12-04T12:15:06.2890431Z x0 = xindex 2025-12-04T12:15:06.2890604Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2890727Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2890835Z ^ 2025-12-04T12:15:06.2891226Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2891231Z 2025-12-04T12:15:06.2892010Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.2892016Z 2025-12-04T12:15:06.2892023Z 2025-12-04T12:15:06.2892246Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.2892866Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T12:15:06.2892902Z 2025-12-04T12:15:06.2893189Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.2893375Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.2893598Z ================== 1 failed, 187 deselected, 2 rerun in 4.41s ================== 2025-12-04T12:15:06.2893704Z Got exit code 1 2025-12-04T12:15:06.2893814Z Retrying single test... 2025-12-04T12:15:06.2894307Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3815c1aa47a06d85.xml 2025-12-04T12:15:06.2894478Z ============================= test session starts ============================== 2025-12-04T12:15:06.2894867Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.2894994Z cachedir: .pytest_cache 2025-12-04T12:15:06.2895517Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.2895664Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.2895777Z configfile: pytest.ini 2025-12-04T12:15:06.2896433Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.2896680Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.2897426Z stepcurrent: skipping 61 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T12:15:06.2897564Z Running 1 items in this shard 2025-12-04T12:15:06.2897569Z 2025-12-04T12:15:06.2898660Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.2899428Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2899995Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2900559Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2901073Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2901512Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2902110Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2902653Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2903162Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.2903683Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T12:15:06.2904190Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.2904783Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.2905331Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.2905694Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2907546Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.2908083Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2908993Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2909505Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2910348Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.2911052Z E1204 12:10:43.467000 126499 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.2911937Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2912465Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2913309Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.2913958Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.2914836Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.2915666Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.2916508Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.2917206Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.2918063Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.2918779Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.2919682Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2920048Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2920771Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.2921134Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2921670Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2922725Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2923387Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2924292Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.2924973Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2925906Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2926675Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2927291Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.2928067Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2928608Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2929182Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2929679Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2930127Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2930721Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2931243Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2931680Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T12:15:06.2932503Z E1204 12:10:43.467000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2932690Z ('RERUN', {'yellow': True}) [3.4816s] [100%] 2025-12-04T12:15:06.2933767Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.2934528Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2935123Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2935684Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2936197Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2936692Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2937328Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2937868Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2938379Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.2938905Z E1204 12:10:43.934000 126499 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.2939410Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.2940023Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.2940573Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.2940939Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2942767Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.2943348Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2944228Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2944732Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2945577Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.2946284Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.2947168Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2947694Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2948534Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.2949226Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.2950090Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.2950924Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.2951805Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.2952503Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), 
self.shape) 2025-12-04T12:15:06.2953358Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.2954073Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.2954970Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2955336Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2956032Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.2956392Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2956924Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2957977Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.2958609Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.2959511Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:06.2960190Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.2961094Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.2961896Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.2962513Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.2963313Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2963853Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2964426Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2964921Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2965393Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2965986Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2966512Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = 
tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2966940Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.2967760Z E1204 12:10:43.934000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2967941Z ('RERUN', {'yellow': True}) [0.4281s] [100%] 2025-12-04T12:15:06.2969036Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.2969799Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.2970478Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.2971215Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.2971730Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.2972166Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.2972770Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.2973309Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.2973818Z E1204 12:10:44.367000 126499 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.2974344Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.2974844Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.2975495Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.2976047Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.2976472Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2978429Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.2978972Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.2979908Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2980416Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2981263Z E1204 
12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.2981969Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.2982869Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.2983390Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.2984238Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.2984892Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.2985769Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.2986612Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.2987458Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.2988155Z E1204 12:10:44.367000 126499 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.2995819Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.2996640Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.2997544Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.2997917Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2998650Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.2999014Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.2999550Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3000609Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3001303Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3002214Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3002893Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3003793Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3004597Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3005213Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3005990Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3006539Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3007114Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3007611Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3008046Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3008658Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 
2025-12-04T12:15:06.3009186Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3009618Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.3010438Z E1204 12:10:44.367000 126499 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3010545Z FAILED [0.4306s] [100%] 2025-12-04T12:15:06.3010568Z 2025-12-04T12:15:06.3010754Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.3011074Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.3011216Z Traceback (most recent call last): 2025-12-04T12:15:06.3011588Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3011751Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3012256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3012507Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3013035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3013227Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3013743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3013938Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3014476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3014799Z return scheme.codegen_and_compile(gm, 
example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3015341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3015489Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3015988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3016114Z return self._compile_to_module() 2025-12-04T12:15:06.3016752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3016936Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3017454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3017596Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3018088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3018321Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3018915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3019043Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3019540Z File "/tmp/tmp326ewhxr/ei/ceilwcfm3zbt52h2etvmwfnzahy2fjygyu5fyod7cxvfuxbjibsb.py", line 51, in 2025-12-04T12:15:06.3020017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3020129Z kernel.precompile( 2025-12-04T12:15:06.3020698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3020816Z self._precompile_worker() 2025-12-04T12:15:06.3021407Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3021604Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3022195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3022405Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3022855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3023131Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3023582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3023915Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3024157Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3024581Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3024705Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3024854Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3024961Z xmask = xindex < xnumel 2025-12-04T12:15:06.3025056Z x0 = xindex 2025-12-04T12:15:06.3025232Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3025351Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3025442Z ^ 2025-12-04T12:15:06.3025840Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3025878Z 2025-12-04T12:15:06.3026596Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3026603Z 2025-12-04T12:15:06.3026608Z 2025-12-04T12:15:06.3026841Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3027464Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T12:15:06.3027470Z 2025-12-04T12:15:06.3027747Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3027971Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3028076Z frames [('total', 1)] 2025-12-04T12:15:06.3028228Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3028694Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3028919Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3029027Z graph_break [] 2025-12-04T12:15:06.3029340Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.3029476Z Traceback (most recent call last): 2025-12-04T12:15:06.3029848Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3029974Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3030469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3030714Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3031234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3031439Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3031953Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3032109Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3032643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3032962Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3033494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3033639Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3034131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3034286Z return self._compile_to_module() 2025-12-04T12:15:06.3034771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3034949Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3035461Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3035623Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3036129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3036357Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3036953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3037080Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3037584Z File "/tmp/tmpkual5uit/bk/cbkktboxumkejm2d2l6vpn5ws4em3tvuhn4q72hetkujcqovsh66.py", line 51, in 2025-12-04T12:15:06.3038087Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3038199Z kernel.precompile( 2025-12-04T12:15:06.3038761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3038882Z self._precompile_worker() 2025-12-04T12:15:06.3039474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3039664Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3040262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3040493Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3040952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3041202Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3041651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3041987Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3042218Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3042547Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3042669Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3042814Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3042926Z xmask = xindex < xnumel 2025-12-04T12:15:06.3043021Z x0 = xindex 2025-12-04T12:15:06.3043202Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3043322Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3043416Z ^ 
2025-12-04T12:15:06.3043817Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3043823Z 2025-12-04T12:15:06.3044530Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3044539Z 2025-12-04T12:15:06.3044544Z 2025-12-04T12:15:06.3044772Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3045394Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T12:15:06.3045399Z 2025-12-04T12:15:06.3045666Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3045933Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3046040Z frames [('total', 1)] 2025-12-04T12:15:06.3046167Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3046634Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3046853Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3047010Z graph_break [] 2025-12-04T12:15:06.3047226Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3047328Z frames [('total', 1)] 2025-12-04T12:15:06.3047448Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3047664Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3048131Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3048236Z graph_break [] 2025-12-04T12:15:06.3048385Z =================================== FAILURES =================================== 
2025-12-04T12:15:06.3048729Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.3048854Z Traceback (most recent call last): 2025-12-04T12:15:06.3049230Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3049369Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3049857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3050111Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3050623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3050816Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3051370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3051524Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3052050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3052377Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3052895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3053053Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3053531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3053649Z return self._compile_to_module() 2025-12-04T12:15:06.3054147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3054308Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3054834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3054963Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3055454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3055698Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3056361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3056495Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3057001Z File "/tmp/tmp3vcz5kjy/ma/cmaqxfqbp3fmiawf67x57kyjg6syy2ickuczmsulf2afa5dlc5rr.py", line 51, in 2025-12-04T12:15:06.3057504Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3057630Z kernel.precompile( 2025-12-04T12:15:06.3058181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3058297Z self._precompile_worker() 2025-12-04T12:15:06.3058903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3059113Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3059715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3059912Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3060361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3060617Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.3061089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3061423Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3061659Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3061980Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3062111Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3062249Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3062359Z xmask = xindex < xnumel 2025-12-04T12:15:06.3062467Z x0 = xindex 2025-12-04T12:15:06.3062636Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3062757Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3062889Z ^ 2025-12-04T12:15:06.3063283Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3063292Z 2025-12-04T12:15:06.3064018Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3064024Z 2025-12-04T12:15:06.3064031Z 2025-12-04T12:15:06.3064249Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3064872Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T12:15:06.3064886Z 2025-12-04T12:15:06.3065155Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3065379Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3065496Z frames [('total', 1)] 2025-12-04T12:15:06.3065618Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3066082Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3066319Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3066421Z graph_break [] 2025-12-04T12:15:06.3066649Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3066755Z frames [('total', 1)] 2025-12-04T12:15:06.3066872Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3067103Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3067564Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3067667Z graph_break [] 2025-12-04T12:15:06.3067895Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3068001Z frames [('total', 1)] 2025-12-04T12:15:06.3068158Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3068387Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3068849Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), 
('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3068956Z graph_break [] 2025-12-04T12:15:06.3069604Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3815c1aa47a06d85.xml - 2025-12-04T12:15:06.3069819Z =========================== short test summary info ============================ 2025-12-04T12:15:06.3070597Z FAILED [0.4306s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3070921Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3071321Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3071466Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3071665Z xmask = xindex < xnumel 2025-12-04T12:15:06.3071766Z x0 = xindex 2025-12-04T12:15:06.3071937Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3072056Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3072159Z ^ 2025-12-04T12:15:06.3072553Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3072560Z 2025-12-04T12:15:06.3073280Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3073285Z 2025-12-04T12:15:06.3073290Z 2025-12-04T12:15:06.3073507Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3074170Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T12:15:06.3074177Z 2025-12-04T12:15:06.3074461Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3074641Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.3074854Z ================== 1 failed, 187 deselected, 2 rerun in 4.38s ================== 2025-12-04T12:15:06.3074956Z Got exit code 1 2025-12-04T12:15:06.3075495Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 2025-12-04T12:15:06.3075913Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:06.3076380Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-69850f25ab7699fd.xml 2025-12-04T12:15:06.3076557Z ============================= test session starts ============================== 2025-12-04T12:15:06.3076916Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.3077031Z cachedir: .pytest_cache 2025-12-04T12:15:06.3077559Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.3077683Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.3077795Z configfile: pytest.ini 2025-12-04T12:15:06.3078398Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.3078625Z 
collecting ... collected 188 items / 62 deselected / 126 selected 2025-12-04T12:15:06.3078776Z stepcurrent: skipping 62 already run items. 2025-12-04T12:15:06.3078889Z Running 126 items in this shard 2025-12-04T12:15:06.3078894Z 2025-12-04T12:15:06.3080058Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3080835Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3081374Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3081985Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3082483Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3082916Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3083530Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3084086Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3084603Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.3085110Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = 
tmp0.to(tl.float8e5) 2025-12-04T12:15:06.3085624Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3086170Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3086748Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3087124Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3088923Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3089474Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3090341Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3090859Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3091689Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3092398Z E1204 
12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3093265Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3093826Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3094683Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3095315Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3096228Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3097112Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3097957Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3098699Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.3099547Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3100244Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3101160Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3101538Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3102213Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3102573Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3103111Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3104151Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3104794Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3105676Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3106367Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = 
src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3107241Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3108047Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3108665Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3109427Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3110013Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3110569Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3111073Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3111504Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3112097Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3112674Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3113088Z E1204 12:11:03.467000 126696 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.3113923Z E1204 12:11:03.467000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3114058Z ('RERUN', {'yellow': True}) [3.4849s] [ 0%] 2025-12-04T12:15:06.3115203Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3115976Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3116518Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3117089Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3117585Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3118011Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3118621Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3119146Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3119660Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = 
tmp1.to(tl.float32) 2025-12-04T12:15:06.3120173Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.3120678Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3121222Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3121825Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3122199Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3123998Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3124568Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3125437Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3125987Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3126818Z E1204 12:11:03.946000 126696 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3127526Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3128391Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3128928Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3129791Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3130421Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3131307Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3132123Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3132966Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3133672Z E1204 12:11:03.946000 126696 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.3134515Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3135209Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3136124Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3136562Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3137241Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3137639Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3138187Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3139217Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3139860Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3140787Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3141477Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3142357Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3143163Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3143790Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3144547Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3145103Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3145662Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3146168Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3146604Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3147200Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 
2025-12-04T12:15:06.3147734Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3148149Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.3149036Z E1204 12:11:03.946000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3149219Z ('RERUN', {'yellow': True}) [0.4389s] [ 0%] 2025-12-04T12:15:06.3150414Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3151175Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3151718Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3152314Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3152807Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3153237Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3153839Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3154393Z E1204 12:11:04.391000 126696 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3154915Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.3155422Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.3155938Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3156482Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3157056Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3157446Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3159256Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3159808Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3160680Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3161202Z 
E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3162042Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3162754Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3163622Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3164168Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3165032Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3165671Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3166589Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3167411Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3168256Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3168997Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.3169844Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3170545Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3171673Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3172059Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3172741Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3173104Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3173657Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3174698Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3175349Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3176242Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3176995Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3177890Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3178661Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3179352Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3180111Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3180672Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3181274Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3181783Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3182218Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3182818Z 
E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3183404Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3183822Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.3184663Z E1204 12:11:04.391000 126696 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3184773Z FAILED [0.4426s] [ 0%] 2025-12-04T12:15:06.3184780Z 2025-12-04T12:15:06.3184929Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.3185310Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.3185446Z Traceback (most recent call last): 2025-12-04T12:15:06.3185838Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3185974Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3186469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3186738Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3187254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3187447Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3187971Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3188120Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3188671Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3188998Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3189516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3189680Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3190158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3190294Z return self._compile_to_module() 2025-12-04T12:15:06.3190778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3190940Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3191517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3191651Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3192150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3192396Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3192982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3193154Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3193664Z File "/tmp/tmp80bbx3c6/wi/cwivymddcbyswrvb4lnwapcjxwlo2mbmdks5ttplf6lzedjniysb.py", line 51, in 2025-12-04T12:15:06.3194128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3194251Z kernel.precompile( 2025-12-04T12:15:06.3194809Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3194941Z self._precompile_worker() 2025-12-04T12:15:06.3195597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3195781Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3196390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3196590Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3197037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3197295Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3197772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3198131Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3198368Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3198689Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3198832Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3198975Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3199096Z xmask = xindex < xnumel 2025-12-04T12:15:06.3199193Z x0 = xindex 2025-12-04T12:15:06.3199360Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3199492Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3199585Z ^ 2025-12-04T12:15:06.3199974Z type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3199980Z 2025-12-04T12:15:06.3200724Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3200736Z 2025-12-04T12:15:06.3200741Z 2025-12-04T12:15:06.3201054Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3201712Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T12:15:06.3201720Z 2025-12-04T12:15:06.3201988Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3202213Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3202333Z frames [('total', 1)] 2025-12-04T12:15:06.3202451Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3202933Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3203203Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3203309Z graph_break [] 2025-12-04T12:15:06.3203652Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.3203778Z Traceback (most recent call last): 2025-12-04T12:15:06.3204149Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3204326Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3204823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3205087Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3205604Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3205798Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3206327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3206506Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3207055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3207374Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3207897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3208059Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3208539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3208662Z return self._compile_to_module() 2025-12-04T12:15:06.3209196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3209364Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3209896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3210025Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3210521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3210767Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3211351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 
2025-12-04T12:15:06.3211491Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3211965Z File "/tmp/tmp_gicfj3x/mb/cmbbr5crlh2ptxfo7qok6n3up7mmeimuqe4xybef6pb52lexcftt.py", line 51, in 2025-12-04T12:15:06.3212434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3212559Z kernel.precompile( 2025-12-04T12:15:06.3213119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3213237Z self._precompile_worker() 2025-12-04T12:15:06.3213845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3214028Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3214632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3214830Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3215281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3215575Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3216022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3216437Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3216673Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3217031Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3217170Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3217310Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3217422Z xmask = xindex < 
xnumel 2025-12-04T12:15:06.3217532Z x0 = xindex 2025-12-04T12:15:06.3217702Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3217822Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3217928Z ^ 2025-12-04T12:15:06.3218321Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3218355Z 2025-12-04T12:15:06.3219079Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3219086Z 2025-12-04T12:15:06.3219091Z 2025-12-04T12:15:06.3219312Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3219965Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T12:15:06.3219971Z 2025-12-04T12:15:06.3220240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3220463Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3220582Z frames [('total', 1)] 2025-12-04T12:15:06.3220729Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3221202Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3221441Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3221543Z graph_break [] 2025-12-04T12:15:06.3221774Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3221883Z frames [('total', 1)] 2025-12-04T12:15:06.3222000Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3222240Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3222703Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), 
('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3222804Z graph_break [] 2025-12-04T12:15:06.3222964Z =================================== FAILURES =================================== 2025-12-04T12:15:06.3223292Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.3223435Z Traceback (most recent call last): 2025-12-04T12:15:06.3223813Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3223942Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3224448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3224703Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3225215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3225423Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3225930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3226094Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3226729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3227053Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3227585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3227765Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3228259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3228383Z return self._compile_to_module() 
2025-12-04T12:15:06.3228869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3229048Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3229570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3229734Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3230244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3230473Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3231072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3231201Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3231678Z File "/tmp/tmp4u_71oi1/nx/cnxgc4rmxwyznuxvync4cnctmjmzcj4itoxroyo7obcx3jrk2mci.py", line 51, in 2025-12-04T12:15:06.3232155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3232269Z kernel.precompile( 2025-12-04T12:15:06.3232866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3232989Z self._precompile_worker() 2025-12-04T12:15:06.3233585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3233778Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3234377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3234577Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3235043Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3235291Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3235754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3236093Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3236331Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3236670Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3236800Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3236959Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3237072Z xmask = xindex < xnumel 2025-12-04T12:15:06.3237171Z x0 = xindex 2025-12-04T12:15:06.3237355Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3237478Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3237573Z ^ 2025-12-04T12:15:06.3237976Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3237982Z 2025-12-04T12:15:06.3238731Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3238740Z 2025-12-04T12:15:06.3238745Z 2025-12-04T12:15:06.3238980Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3239617Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T12:15:06.3239670Z 2025-12-04T12:15:06.3239939Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3240178Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3240286Z frames [('total', 1)] 2025-12-04T12:15:06.3240419Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3240891Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3241118Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3241268Z graph_break [] 2025-12-04T12:15:06.3241491Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3241597Z frames [('total', 1)] 2025-12-04T12:15:06.3241730Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3241950Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3242430Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3242534Z graph_break [] 2025-12-04T12:15:06.3242754Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3242876Z frames [('total', 1)] 2025-12-04T12:15:06.3242996Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3243218Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3243740Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), 
('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3243846Z graph_break [] 2025-12-04T12:15:06.3244509Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-69850f25ab7699fd.xml - 2025-12-04T12:15:06.3244684Z =========================== short test summary info ============================ 2025-12-04T12:15:06.3245479Z FAILED [0.4426s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3245815Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3245941Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3246082Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3246205Z xmask = xindex < xnumel 2025-12-04T12:15:06.3246302Z x0 = xindex 2025-12-04T12:15:06.3246486Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3246609Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3246698Z ^ 2025-12-04T12:15:06.3247101Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3247108Z 2025-12-04T12:15:06.3247817Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3247825Z 2025-12-04T12:15:06.3247829Z 2025-12-04T12:15:06.3248058Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3248697Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T12:15:06.3248703Z 2025-12-04T12:15:06.3248971Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3249201Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.3249407Z ================== 1 failed, 62 deselected, 2 rerun in 4.41s =================== 2025-12-04T12:15:06.3249518Z Got exit code 1 2025-12-04T12:15:06.3249628Z Retrying single test... 2025-12-04T12:15:06.3250104Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-da23a1d59c747be6.xml 2025-12-04T12:15:06.3250316Z ============================= test session starts ============================== 2025-12-04T12:15:06.3250671Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.3250780Z cachedir: .pytest_cache 2025-12-04T12:15:06.3251313Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.3251442Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.3251555Z configfile: pytest.ini 2025-12-04T12:15:06.3252161Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.3252416Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.3253148Z stepcurrent: skipping 62 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T12:15:06.3253268Z Running 1 items in this shard 2025-12-04T12:15:06.3253273Z 2025-12-04T12:15:06.3254399Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3255197Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3255745Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3256415Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3256914Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3257357Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3257954Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3258484Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3259002Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.3259518Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T12:15:06.3260033Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3260580Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3261124Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3261504Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3263350Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3263930Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3264797Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3265324Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3266157Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3266909Z E1204 12:11:23.511000 
126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3267766Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3268271Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3269158Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3269791Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3270671Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3271690Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3272553Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3273254Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.3274100Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3274801Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3275682Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3276061Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3276822Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3277200Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3277738Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3278828Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3279471Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3280366Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3281113Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3281994Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3282778Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3283388Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3284191Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3284757Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3285316Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3285893Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3286370Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3286960Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3287508Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3287923Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T12:15:06.3288766Z E1204 12:11:23.511000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3288906Z ('RERUN', {'yellow': True}) [3.5066s] [100%] 2025-12-04T12:15:06.3290032Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3290858Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3291408Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3291986Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3292479Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3292955Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3293546Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3294075Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3294600Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.3295147Z E1204 12:11:23.992000 126893 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.3295663Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3296207Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3296818Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3297198Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3299048Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3299603Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3300465Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3300989Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3301832Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3302563Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3303427Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3303930Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3305051Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3305697Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3306585Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3307443Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3308299Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3309001Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), 
self.shape) 2025-12-04T12:15:06.3309875Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3310576Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3311462Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3311839Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3312550Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3312928Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3313458Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3314501Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3315145Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3316043Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:06.3316732Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3317624Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3318409Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3319020Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3319816Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3320376Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3320933Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3321471Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3321900Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3322503Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3323037Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = 
tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3323454Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.3324323Z E1204 12:11:23.992000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3324473Z ('RERUN', {'yellow': True}) [0.4415s] [100%] 2025-12-04T12:15:06.3325609Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3326399Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3326947Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3327524Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3328017Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3328469Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3329068Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3329591Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3330116Z E1204 12:11:24.435000 126893 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.3330629Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.3331149Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3331698Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3332259Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3332621Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3334457Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3335042Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3335904Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3336501Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3337339Z 
E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3338103Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3338956Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3339463Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3340370Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3341016Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3341908Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3342731Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3343592Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3344292Z E1204 12:11:24.435000 126893 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.3345145Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3345829Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3346713Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3347090Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3347814Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3348191Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3348726Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3349764Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3350441Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3351341Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3352037Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3352950Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3353737Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3354355Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3355147Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3355703Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3356268Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3356781Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3357213Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3357817Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 
2025-12-04T12:15:06.3358344Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3358757Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.3359592Z E1204 12:11:24.435000 126893 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3359700Z FAILED [0.4396s] [100%] 2025-12-04T12:15:06.3359707Z 2025-12-04T12:15:06.3359869Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.3360197Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.3360324Z Traceback (most recent call last): 2025-12-04T12:15:06.3360711Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3360840Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3361369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3361636Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3362151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3362357Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3362905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3363055Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3363605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3363925Z return 
scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3364461Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3364612Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3365129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3365269Z return self._compile_to_module() 2025-12-04T12:15:06.3365751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3365919Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3366446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3366578Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3367086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3367352Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3367945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3368089Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3368588Z File "/tmp/tmpei49bwpg/td/ctdmtmrtmfcu6i67yfe6smnv2h6pms7leuoa24eqbt25ywc6tajy.py", line 51, in 2025-12-04T12:15:06.3369069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3369185Z kernel.precompile( 2025-12-04T12:15:06.3369742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3369875Z self._precompile_worker() 2025-12-04T12:15:06.3370476Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3370659Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3371456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3371660Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3372127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3372378Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3372821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3373173Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3373407Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3373828Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3373958Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3374104Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3374233Z xmask = xindex < xnumel 2025-12-04T12:15:06.3374333Z x0 = xindex 2025-12-04T12:15:06.3374505Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3374645Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3374784Z ^ 2025-12-04T12:15:06.3375172Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3375193Z 2025-12-04T12:15:06.3375909Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3375916Z 2025-12-04T12:15:06.3375922Z 2025-12-04T12:15:06.3376139Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3376860Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T12:15:06.3376917Z 2025-12-04T12:15:06.3377191Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3377431Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3377541Z frames [('total', 1)] 2025-12-04T12:15:06.3377659Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3378141Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3378366Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3378467Z graph_break [] 2025-12-04T12:15:06.3378809Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.3378935Z Traceback (most recent call last): 2025-12-04T12:15:06.3379368Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3379501Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3379991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3380254Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3380774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3380968Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3381489Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3381635Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3382189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3382509Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3383032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3383196Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3383676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3383816Z return self._compile_to_module() 2025-12-04T12:15:06.3384304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3384469Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3385001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3385172Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3385671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3385918Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3386503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3386675Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3387152Z File "/tmp/tmps3_nni3i/ey/ceyh6hqunnyf3nragc52kdfbngbubqol4izvzo6wt7jyh543kmdz.py", line 51, in 2025-12-04T12:15:06.3387616Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3387740Z kernel.precompile( 2025-12-04T12:15:06.3388295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3388428Z self._precompile_worker() 2025-12-04T12:15:06.3389024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3389258Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3389867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3390069Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3390528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3390774Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3391216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3391596Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3391830Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3392152Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3392291Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3392432Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3392557Z xmask = xindex < xnumel 2025-12-04T12:15:06.3392654Z x0 = xindex 2025-12-04T12:15:06.3392825Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3392960Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3393053Z ^ 
2025-12-04T12:15:06.3393444Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3393450Z 2025-12-04T12:15:06.3394185Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3394192Z 2025-12-04T12:15:06.3394196Z 2025-12-04T12:15:06.3394413Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3395066Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T12:15:06.3395072Z 2025-12-04T12:15:06.3395345Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3395567Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3395687Z frames [('total', 1)] 2025-12-04T12:15:06.3395804Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3396282Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3396504Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3396607Z graph_break [] 2025-12-04T12:15:06.3396871Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3396980Z frames [('total', 1)] 2025-12-04T12:15:06.3397095Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3397326Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3397788Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3397935Z graph_break [] 2025-12-04T12:15:06.3398084Z =================================== FAILURES =================================== 
2025-12-04T12:15:06.3398408Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.3398543Z Traceback (most recent call last): 2025-12-04T12:15:06.3398917Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3399047Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3399550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3399835Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3400359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3400557Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3401066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3401227Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3401762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3402098Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3402660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3402815Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3403311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3403436Z return self._compile_to_module() 2025-12-04T12:15:06.3403928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3404110Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3404629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3404774Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3405276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3405511Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3406115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3406246Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3406761Z File "/tmp/tmpb81ohzt6/ej/cejcyxr2mvh6skru7zv4rh45gmze7p7vd3efomayfjzvtgszzv5v.py", line 51, in 2025-12-04T12:15:06.3407229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3407345Z kernel.precompile( 2025-12-04T12:15:06.3407918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3408040Z self._precompile_worker() 2025-12-04T12:15:06.3408679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3408873Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3409469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3409685Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3410174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3410423Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.3410884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3411223Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3411469Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3411797Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3411955Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3412108Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3412219Z xmask = xindex < xnumel 2025-12-04T12:15:06.3412315Z x0 = xindex 2025-12-04T12:15:06.3412497Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3412621Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3412714Z ^ 2025-12-04T12:15:06.3413117Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3413122Z 2025-12-04T12:15:06.3413834Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3413839Z 2025-12-04T12:15:06.3413844Z 2025-12-04T12:15:06.3414110Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3414754Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T12:15:06.3414762Z 2025-12-04T12:15:06.3415047Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3415272Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3415384Z frames [('total', 1)] 2025-12-04T12:15:06.3415518Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3415985Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3416208Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3416402Z graph_break [] 2025-12-04T12:15:06.3416626Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3416746Z frames [('total', 1)] 2025-12-04T12:15:06.3416865Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3417081Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3417559Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3417659Z graph_break [] 2025-12-04T12:15:06.3417875Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3417996Z frames [('total', 1)] 2025-12-04T12:15:06.3418112Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3418346Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3418811Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), 
('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3418911Z graph_break [] 2025-12-04T12:15:06.3419624Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-da23a1d59c747be6.xml - 2025-12-04T12:15:06.3419803Z =========================== short test summary info ============================ 2025-12-04T12:15:06.3420605Z FAILED [0.4396s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3420940Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3421112Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3421263Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3421373Z xmask = xindex < xnumel 2025-12-04T12:15:06.3421468Z x0 = xindex 2025-12-04T12:15:06.3421654Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3421775Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3421866Z ^ 2025-12-04T12:15:06.3422276Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3422282Z 2025-12-04T12:15:06.3422994Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3423029Z 2025-12-04T12:15:06.3423034Z 2025-12-04T12:15:06.3423270Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3423908Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T12:15:06.3423914Z 2025-12-04T12:15:06.3424197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3424380Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.3424583Z ================== 1 failed, 187 deselected, 2 rerun in 4.43s ================== 2025-12-04T12:15:06.3424697Z Got exit code 1 2025-12-04T12:15:06.3424918Z Retrying single test... 2025-12-04T12:15:06.3425391Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-36993cd4956a89fe.xml 2025-12-04T12:15:06.3425572Z ============================= test session starts ============================== 2025-12-04T12:15:06.3425924Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.3426050Z cachedir: .pytest_cache 2025-12-04T12:15:06.3426572Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.3426697Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.3426821Z configfile: pytest.ini 2025-12-04T12:15:06.3427410Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.3427636Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.3428369Z stepcurrent: skipping 62 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T12:15:06.3428492Z Running 1 items in this shard 2025-12-04T12:15:06.3428497Z 2025-12-04T12:15:06.3429627Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3430395Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3430953Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3431549Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3432044Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3432492Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3433138Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3433675Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3434180Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.3434691Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T12:15:06.3435207Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3435783Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3436341Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3436708Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3438561Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3439098Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3439962Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3440489Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3441324Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3442053Z E1204 12:11:43.589000 
127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3442906Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3443424Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3444267Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3444901Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3445812Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3446630Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3447519Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3448213Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.3449071Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3449752Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3450664Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3451040Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3451720Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3452093Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3452654Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3453708Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3454333Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3455225Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3455916Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3456891Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3457684Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3458299Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3459072Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3459618Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3460216Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3460724Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3461157Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3461791Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3462314Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3462730Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T12:15:06.3463576Z E1204 12:11:43.589000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3463742Z ('RERUN', {'yellow': True}) [3.4997s] [100%] 2025-12-04T12:15:06.3464864Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3465625Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3466179Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3466768Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3467261Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3467708Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3468305Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3468845Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3469353Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.3469869Z E1204 12:11:44.061000 127090 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.3470390Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3471105Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3471665Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3472032Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3473909Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3474450Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3475313Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3475875Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3476709Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3477433Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3478326Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3478845Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3479688Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3480317Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3481242Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3482065Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3482924Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3483619Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), 
self.shape) 2025-12-04T12:15:06.3484482Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3485163Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3486059Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3486423Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3487103Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3487474Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3488073Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3489131Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3489758Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3490671Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:06.3491374Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3492265Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3493082Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3493696Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3494472Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3495016Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3495607Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3496122Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3496618Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3497234Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3497759Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = 
tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3498195Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.3499024Z E1204 12:11:44.061000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3499164Z ('RERUN', {'yellow': True}) [0.4324s] [100%] 2025-12-04T12:15:06.3500302Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3501063Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3501618Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3502223Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3502718Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3503166Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3503760Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3504329Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3504839Z E1204 12:11:44.495000 127090 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.3505355Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.3505879Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3506457Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3507022Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3507387Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3509244Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp16', 'out_ptr0': '*fp16', 'out_ptr1': '*fp16', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3509784Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3510662Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3511170Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3512006Z 
E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3512729Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3513582Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3514104Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3514946Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3515593Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3516494Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3517311Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3518172Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3518906Z E1204 12:11:44.495000 127090 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.3519764Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3520447Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3521377Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3521740Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3522418Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3522788Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3523351Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3524404Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3525037Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3525940Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3526617Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3527510Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3528289Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3528903Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3529672Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3530214Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3530826Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3531323Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3531753Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3532389Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 
2025-12-04T12:15:06.3532913Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3533345Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.3534169Z E1204 12:11:44.495000 127090 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3534308Z FAILED [0.4307s] [100%] 2025-12-04T12:15:06.3534315Z 2025-12-04T12:15:06.3534476Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.3534801Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.3534942Z Traceback (most recent call last): 2025-12-04T12:15:06.3535318Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3535447Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3535951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3536203Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3536839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3537053Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3537567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3537727Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3538261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3538583Z return 
scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3539116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3539265Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3539760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3539886Z return self._compile_to_module() 2025-12-04T12:15:06.3540374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3540554Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3541072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3541205Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3541720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3541953Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3542551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3542682Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3543207Z File "/tmp/tmpi56d1yz9/pl/cpljnzs7kv6cv43347rm2j3rcqe4fqv6i4pzmm4f2gftqkzl35sj.py", line 51, in <module> 2025-12-04T12:15:06.3543692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3543805Z kernel.precompile( 2025-12-04T12:15:06.3544372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3544524Z self._precompile_worker() 2025-12-04T12:15:06.3545124Z
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3545323Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3545916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3546121Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3546584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3546864Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3547322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3547662Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3547897Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3548231Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3548359Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3548499Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3548621Z xmask = xindex < xnumel 2025-12-04T12:15:06.3548746Z x0 = xindex 2025-12-04T12:15:06.3548932Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3549054Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3549148Z ^ 2025-12-04T12:15:06.3549546Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3549552Z 2025-12-04T12:15:06.3550274Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3550283Z 2025-12-04T12:15:06.3550287Z 2025-12-04T12:15:06.3550517Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3551158Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T12:15:06.3551164Z 2025-12-04T12:15:06.3551434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3551672Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3551780Z frames [('total', 1)] 2025-12-04T12:15:06.3551912Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3552375Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3552596Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3552715Z graph_break [] 2025-12-04T12:15:06.3553040Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.3553164Z Traceback (most recent call last): 2025-12-04T12:15:06.3553551Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3553680Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3554188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3554468Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3554984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3555192Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3555704Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3555880Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3556428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3556749Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3557286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3557438Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3557920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3558090Z return self._compile_to_module() 2025-12-04T12:15:06.3558577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3558760Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3559275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3559407Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3559922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3560154Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3560770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3560915Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3561411Z File "/tmp/tmp1fmpx0jd/ou/couh4d6bv2nmxvvtt5hffvzref3f5ysgl5sitr3f63rm4wyjo3i5.py", line 51, in <module> 2025-12-04T12:15:06.3561886Z File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3562003Z kernel.precompile( 2025-12-04T12:15:06.3562558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3562692Z self._precompile_worker() 2025-12-04T12:15:06.3563291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3563489Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3564089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3564294Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3564757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3565005Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3565448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3565798Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3566030Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3566369Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3566526Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3566669Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3566801Z xmask = xindex < xnumel 2025-12-04T12:15:06.3566897Z x0 = xindex 2025-12-04T12:15:06.3567070Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3567204Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3567296Z ^ 
2025-12-04T12:15:06.3567726Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3567732Z 2025-12-04T12:15:06.3568450Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3568456Z 2025-12-04T12:15:06.3568461Z 2025-12-04T12:15:06.3568680Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3569341Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T12:15:06.3569347Z 2025-12-04T12:15:06.3569650Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3569882Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3569988Z frames [('total', 1)] 2025-12-04T12:15:06.3570108Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3570592Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3570815Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3570928Z graph_break [] 2025-12-04T12:15:06.3571345Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3571451Z frames [('total', 1)] 2025-12-04T12:15:06.3571584Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3571878Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3572346Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3572466Z graph_break [] 2025-12-04T12:15:06.3572617Z =================================== FAILURES =================================== 
2025-12-04T12:15:06.3572956Z _ TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 _ 2025-12-04T12:15:06.3573086Z Traceback (most recent call last): 2025-12-04T12:15:06.3573463Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3573607Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3574100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3574353Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3574889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3575088Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3575615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3575765Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3576361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3576705Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3577233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3577398Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3578125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3578252Z return self._compile_to_module() 2025-12-04T12:15:06.3578759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3578926Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3579445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3579644Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3580140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3580391Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3580978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3581113Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3581634Z File "/tmp/tmpk0y73hyi/4w/c4wnjsxmagfq62yledwacpfirhrnhebcisqz5akbiqfawjcrs4yk.py", line 51, in <module> 2025-12-04T12:15:06.3582159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3582274Z kernel.precompile( 2025-12-04T12:15:06.3582846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3582968Z self._precompile_worker() 2025-12-04T12:15:06.3583584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3583767Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3584399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3584616Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3585070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3585337Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.3585781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3586120Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3586367Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3586692Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3586816Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3586965Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3587077Z xmask = xindex < xnumel 2025-12-04T12:15:06.3587188Z x0 = xindex 2025-12-04T12:15:06.3587357Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3587480Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3587589Z ^ 2025-12-04T12:15:06.3587976Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3587981Z 2025-12-04T12:15:06.3588695Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3588715Z 2025-12-04T12:15:06.3588719Z 2025-12-04T12:15:06.3588937Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3589581Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T12:15:06.3589588Z 2025-12-04T12:15:06.3589907Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3590132Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3590250Z frames [('total', 1)] 2025-12-04T12:15:06.3590366Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3590833Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3591651Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3591753Z graph_break [] 2025-12-04T12:15:06.3591973Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3592094Z frames [('total', 1)] 2025-12-04T12:15:06.3592213Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3592430Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3592908Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3593013Z graph_break [] 2025-12-04T12:15:06.3593244Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3593386Z frames [('total', 1)] 2025-12-04T12:15:06.3593502Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3593735Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3594193Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), 
('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3594296Z graph_break [] 2025-12-04T12:15:06.3594957Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-36993cd4956a89fe.xml - 2025-12-04T12:15:06.3595132Z =========================== short test summary info ============================ 2025-12-04T12:15:06.3595971Z FAILED [0.4307s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3596294Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3596423Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3596575Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3596683Z xmask = xindex < xnumel 2025-12-04T12:15:06.3596793Z x0 = xindex 2025-12-04T12:15:06.3596966Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2025-12-04T12:15:06.3597085Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3597192Z ^ 2025-12-04T12:15:06.3597581Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3597587Z 2025-12-04T12:15:06.3598299Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3598307Z 2025-12-04T12:15:06.3598326Z 2025-12-04T12:15:06.3598547Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3599187Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T12:15:06.3599192Z 2025-12-04T12:15:06.3599472Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3599656Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.3599859Z ================== 1 failed, 187 deselected, 2 rerun in 4.41s ================== 2025-12-04T12:15:06.3599973Z Got exit code 1 2025-12-04T12:15:06.3600526Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 2025-12-04T12:15:06.3600951Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:06.3601463Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-78153e5fcd212bc6.xml 2025-12-04T12:15:06.3601635Z ============================= test session starts ============================== 2025-12-04T12:15:06.3602004Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.3602118Z cachedir: .pytest_cache 2025-12-04T12:15:06.3602683Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.3602810Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.3602921Z configfile: pytest.ini 2025-12-04T12:15:06.3603523Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 
2025-12-04T12:15:06.3603752Z collecting ... collected 188 items / 63 deselected / 125 selected 2025-12-04T12:15:06.3603899Z stepcurrent: skipping 63 already run items. 2025-12-04T12:15:06.3604031Z Running 125 items in this shard 2025-12-04T12:15:06.3604036Z 2025-12-04T12:15:06.3605163Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3605942Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3606490Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3607069Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3607598Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3608031Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3608592Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3609120Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3609646Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.3610157Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.3610665Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3611225Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3611774Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3612149Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3613957Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3614539Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3615412Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3615951Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3616879Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3617592Z 
E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3618463Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3619015Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3619876Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3620514Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3621410Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3622242Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3623082Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3623793Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.3624640Z E1204 12:12:03.577000 127287 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3625338Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3626217Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3626582Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3627279Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3627639Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3628185Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3629264Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3629906Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3630800Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3631523Z E1204 12:12:03.577000 127287 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3632427Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3633197Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3633929Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3634685Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3635246Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3635808Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3636336Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3636790Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3637334Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3637876Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3638293Z E1204 
12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.3639117Z E1204 12:12:03.577000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3639272Z ('RERUN', {'yellow': True}) [3.4739s] [ 0%] 2025-12-04T12:15:06.3640360Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3641144Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3641690Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3642266Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3642760Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3643229Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3643788Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3644312Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3644867Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = 
tmp1.to(tl.float32) 2025-12-04T12:15:06.3645379Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.3645883Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3646449Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3647027Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3647473Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3649327Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3649930Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3650802Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3651309Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3652161Z E1204 12:12:04.057000 127287 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3652870Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3653742Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3654252Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3655110Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3655758Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3656695Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3657572Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3658420Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3659168Z E1204 12:12:04.057000 127287 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.3660010Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3660714Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3661599Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3662000Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3662693Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3663061Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3663608Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3664685Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3665338Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3666234Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3666914Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3667815Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3668596Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3669234Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3669995Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3670559Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3671298Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3671873Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3672324Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3672869Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3673407Z 
E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3673872Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.3674696Z E1204 12:12:04.057000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3674851Z ('RERUN', {'yellow': True}) [0.4403s] [ 0%] 2025-12-04T12:15:06.3675939Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3676755Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3677299Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3677873Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3678368Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3678847Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3679406Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3679931Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = 
tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3680454Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.3680964Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.3681468Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3682036Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3682579Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3682952Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3684746Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3685355Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3686225Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3686736Z E1204 12:12:04.494000 127287 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3687613Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3688327Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3689194Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3689699Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3690585Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3691219Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3692086Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3692948Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3693795Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3694499Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.3695344Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3696045Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3697001Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3697370Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3698061Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3698422Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3698965Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3700077Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3700721Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = 
triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3701620Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3702330Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3703223Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3703997Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3704667Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3705426Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3705991Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3706550Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3707083Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3707531Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3708074Z 
E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3708610Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3709029Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.3709851Z E1204 12:12:04.494000 127287 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3709971Z FAILED [0.4341s] [ 0%] 2025-12-04T12:15:06.3709977Z 2025-12-04T12:15:06.3710127Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.3710458Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.3710587Z Traceback (most recent call last): 2025-12-04T12:15:06.3710960Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3711102Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3711589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3711854Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3712366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3712562Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3713086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3713271Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3713807Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3714141Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3714663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3714861Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3715344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3715468Z return self._compile_to_module() 2025-12-04T12:15:06.3715966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3716133Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3716664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3716831Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3717325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3717572Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3718155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3718282Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3718793Z File "/tmp/tmp8ydd984h/5u/c5uv3ao4fcezwyqjnn4pj3wcxvmr7ou5xmuxgqjmmc7p3pbqvpwc.py", line 51, in 2025-12-04T12:15:06.3719288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3719419Z kernel.precompile( 2025-12-04T12:15:06.3719971Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3720094Z self._precompile_worker() 2025-12-04T12:15:06.3720702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3720885Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3721488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3721691Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3722145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3722403Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3722849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3723185Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3723430Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3723747Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3723888Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3724029Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3724140Z xmask = xindex < xnumel 2025-12-04T12:15:06.3724250Z x0 = xindex 2025-12-04T12:15:06.3724375Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3724496Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3724603Z ^ 2025-12-04T12:15:06.3724994Z type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3725043Z 2025-12-04T12:15:06.3725770Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3725779Z 2025-12-04T12:15:06.3725784Z 2025-12-04T12:15:06.3726002Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3726662Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T12:15:06.3726681Z 2025-12-04T12:15:06.3726951Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3727178Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3727297Z frames [('total', 1)] 2025-12-04T12:15:06.3727415Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3727884Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3728120Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3728257Z graph_break [] 2025-12-04T12:15:06.3728570Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.3728707Z Traceback (most recent call last): 2025-12-04T12:15:06.3729083Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3729225Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3729717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3729964Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3730488Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3730733Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3731255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3731406Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3731940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3732276Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3732793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3732949Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3733442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3733568Z return self._compile_to_module() 2025-12-04T12:15:06.3734070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3734240Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3734755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3734903Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3735403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3735652Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3736240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3736448Z exec(code, mod.__dict__, mod.__dict__) 
2025-12-04T12:15:06.3737013Z File "/tmp/tmphn5b1r1s/vx/cvxtly3y3idm74gvnjkxho3zm6igvoxdg74xq27ji7sczms6fsqj.py", line 51, in 2025-12-04T12:15:06.3737479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3737594Z kernel.precompile( 2025-12-04T12:15:06.3738163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3738313Z self._precompile_worker() 2025-12-04T12:15:06.3738922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3739102Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3739698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3745263Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3745831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3746081Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3746649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3746995Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3747247Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3747569Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3747697Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3747851Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3747964Z xmask = xindex < xnumel 2025-12-04T12:15:06.3748057Z x0 = xindex 
2025-12-04T12:15:06.3748196Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3748356Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3748465Z ^ 2025-12-04T12:15:06.3748857Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3748867Z 2025-12-04T12:15:06.3749582Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3749591Z 2025-12-04T12:15:06.3749596Z 2025-12-04T12:15:06.3749828Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3750450Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T12:15:06.3750457Z 2025-12-04T12:15:06.3750741Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3750970Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3751079Z frames [('total', 1)] 2025-12-04T12:15:06.3751215Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3751686Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3751925Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3752026Z graph_break [] 2025-12-04T12:15:06.3752250Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3752369Z frames [('total', 1)] 2025-12-04T12:15:06.3752486Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3752706Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3753180Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 
2025-12-04T12:15:06.3753280Z graph_break [] 2025-12-04T12:15:06.3753433Z =================================== FAILURES =================================== 2025-12-04T12:15:06.3753801Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.3753931Z Traceback (most recent call last): 2025-12-04T12:15:06.3754318Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3754447Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3754940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3755242Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3755757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3755963Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3756480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3756633Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3757178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3757540Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3758055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3758221Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3758702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3758837Z return self._compile_to_module() 2025-12-04T12:15:06.3759321Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3759515Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3760054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3760185Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3760693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3760923Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3761507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3761643Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3762118Z File "/tmp/tmp0_xwpdro/v2/cv2vr5qulpwknx2x7xlkexi6uhlcg3xeczvkdxgb3gweew2ujjyt.py", line 51, in 2025-12-04T12:15:06.3762583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3762717Z kernel.precompile( 2025-12-04T12:15:06.3763267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3763394Z self._precompile_worker() 2025-12-04T12:15:06.3763989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3764170Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3764779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3764976Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3765437Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3765686Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3766169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3766517Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3766752Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3767075Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3767244Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3767387Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3767511Z xmask = xindex < xnumel 2025-12-04T12:15:06.3767608Z x0 = xindex 2025-12-04T12:15:06.3767733Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3767861Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3767953Z ^ 2025-12-04T12:15:06.3768346Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3768355Z 2025-12-04T12:15:06.3769076Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3769113Z 2025-12-04T12:15:06.3769118Z 2025-12-04T12:15:06.3769336Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3769971Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T12:15:06.3769977Z 2025-12-04T12:15:06.3770245Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3770478Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3770580Z frames [('total', 1)] 2025-12-04T12:15:06.3770694Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3771474Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3771700Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3771804Z graph_break [] 2025-12-04T12:15:06.3772037Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3772141Z frames [('total', 1)] 2025-12-04T12:15:06.3772258Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3772496Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3772962Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3773079Z graph_break [] 2025-12-04T12:15:06.3773297Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3773403Z frames [('total', 1)] 2025-12-04T12:15:06.3773533Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3773758Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3774216Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), 
('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3774332Z graph_break [] 2025-12-04T12:15:06.3775064Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-78153e5fcd212bc6.xml - 2025-12-04T12:15:06.3775286Z =========================== short test summary info ============================ 2025-12-04T12:15:06.3776060Z FAILED [0.4341s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3776453Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3776596Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3776738Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3776866Z xmask = xindex < xnumel 2025-12-04T12:15:06.3777044Z x0 = xindex 2025-12-04T12:15:06.3777168Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3777300Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3777393Z ^ 2025-12-04T12:15:06.3777783Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3777788Z 2025-12-04T12:15:06.3778558Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3778565Z 2025-12-04T12:15:06.3778570Z 2025-12-04T12:15:06.3778787Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3779419Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T12:15:06.3779426Z 2025-12-04T12:15:06.3779702Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3779885Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.3780167Z ================== 1 failed, 63 deselected, 2 rerun in 4.39s =================== 2025-12-04T12:15:06.3780270Z Got exit code 1 2025-12-04T12:15:06.3780393Z Retrying single test... 2025-12-04T12:15:06.3780866Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-04b538cf09549803.xml 2025-12-04T12:15:06.3781038Z ============================= test session starts ============================== 2025-12-04T12:15:06.3781404Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.3781516Z cachedir: .pytest_cache 2025-12-04T12:15:06.3782036Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.3782207Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.3782322Z configfile: pytest.ini 2025-12-04T12:15:06.3782924Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.3783150Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.3783853Z stepcurrent: skipping 63 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T12:15:06.3783981Z Running 1 items in this shard 2025-12-04T12:15:06.3783987Z 2025-12-04T12:15:06.3785075Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3785850Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3786394Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3786954Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3787458Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3787894Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3788448Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3789002Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3789523Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.3790028Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T12:15:06.3790531Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3791140Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3791684Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3792057Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3793857Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3794441Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3795306Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3795844Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3796690Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3797396Z E1204 12:12:23.133000 127484 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3798262Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3798761Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3799618Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3800255Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3801129Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3801961Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3802804Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3803549Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.3804394Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3805127Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3806010Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3806373Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3807075Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3807469Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3808014Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3809055Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3809688Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3810607Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3811290Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3812178Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3812947Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3813568Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3814328Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3814884Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3815441Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3815936Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3816448Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3816989Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3817574Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3817995Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T12:15:06.3818813Z E1204 12:12:23.133000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3818997Z ('RERUN', {'yellow': True}) [3.4962s] [100%] 2025-12-04T12:15:06.3820079Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3820852Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3821396Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3821986Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3822498Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3822934Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3823492Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3824013Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3824566Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.3825081Z E1204 12:12:23.598000 127484 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.3825586Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3826148Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3826692Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3827068Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3828870Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3829419Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3830285Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3830792Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3831671Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3832384Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3833298Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3833804Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3834663Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3835303Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3836201Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3837031Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3837878Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3838622Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), 
self.shape) 2025-12-04T12:15:06.3839467Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3840162Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3841040Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3841402Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3842098Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3842459Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3843004Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3844042Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3844683Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3845610Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:06.3846287Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3847178Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3848054Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3848677Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3849442Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3849998Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3850601Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3851100Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3851546Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3852082Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3852649Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = 
tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3853067Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.3853890Z E1204 12:12:23.598000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3854041Z ('RERUN', {'yellow': True}) [0.4248s] [100%] 2025-12-04T12:15:06.3855135Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3855903Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3856550Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3857114Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3857626Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3858058Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3858609Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3859130Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3859693Z E1204 12:12:24.027000 127484 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.3860204Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.3860707Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3861302Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3861846Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3862219Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3864024Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3864603Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3865471Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3865976Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3866854Z E1204 
12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3867567Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3868431Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3868939Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3869791Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3870427Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3871482Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3872316Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3873160Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3873947Z E1204 12:12:24.027000 127484 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.3874791Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3875490Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3876424Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3876783Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3877481Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3877842Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3878429Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3879471Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3880115Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3881004Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3881728Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3882624Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3883397Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3884020Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3884782Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3885334Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3885894Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3886389Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3886839Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3887378Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3887915Z 
E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3888381Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.3889207Z E1204 12:12:24.027000 127484 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3889326Z FAILED [0.4280s] [100%] 2025-12-04T12:15:06.3889362Z 2025-12-04T12:15:06.3889511Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.3889838Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.3889966Z Traceback (most recent call last): 2025-12-04T12:15:06.3890336Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3890478Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3890975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3891226Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3891784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3891979Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3892503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3892653Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3893188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3893523Z return scheme.codegen_and_compile(gm, example_inputs, 
inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3894076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3894238Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3894725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3894848Z return self._compile_to_module() 2025-12-04T12:15:06.3895342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3895509Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3896030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3896174Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3896731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3896982Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3897568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3897699Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3898220Z File "/tmp/tmpazmxrjke/mg/cmgd3spxrrgsie7ojrcnagmaoswnkj4hm7jd2ehgjvt343c6nvvt.py", line 51, in 2025-12-04T12:15:06.3898680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3898814Z kernel.precompile( 2025-12-04T12:15:06.3899373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3899490Z self._precompile_worker() 2025-12-04T12:15:06.3900100Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3900283Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3900919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3901133Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3901585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3901878Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3902319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3902653Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3902895Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3903214Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3903360Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3903500Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3903641Z xmask = xindex < xnumel 2025-12-04T12:15:06.3903750Z x0 = xindex 2025-12-04T12:15:06.3903873Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3903992Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3904102Z ^ 2025-12-04T12:15:06.3904492Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3904499Z 2025-12-04T12:15:06.3905226Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3905232Z 2025-12-04T12:15:06.3905237Z 2025-12-04T12:15:06.3905456Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3906125Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T12:15:06.3906133Z 2025-12-04T12:15:06.3906423Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3906648Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3906768Z frames [('total', 1)] 2025-12-04T12:15:06.3906885Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3907358Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3907594Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3907696Z graph_break [] 2025-12-04T12:15:06.3908009Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.3908144Z Traceback (most recent call last): 2025-12-04T12:15:06.3908521Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3908664Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3909152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3909403Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3909927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3910122Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3910629Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3910786Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3911317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3911682Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3912203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3912354Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3912848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3913003Z return self._compile_to_module() 2025-12-04T12:15:06.3913495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3913659Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3914186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3914328Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3914828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3915075Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3915693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3915825Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3916335Z File "/tmp/tmprua1fz53/cc/ccce4o7a6hxb477ydo4e2rhu72dcnnp3fbfszb2hh4c3kyeqpjt3.py", line 51, in 2025-12-04T12:15:06.3916800Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3916914Z kernel.precompile( 2025-12-04T12:15:06.3917482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3917604Z self._precompile_worker() 2025-12-04T12:15:06.3918244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3918432Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3919026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3919242Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3919697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3919960Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3920406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3920744Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3920995Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3921319Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3921448Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3921607Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3921720Z xmask = xindex < xnumel 2025-12-04T12:15:06.3921832Z x0 = xindex 2025-12-04T12:15:06.3921962Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3922085Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3922195Z ^ 
2025-12-04T12:15:06.3922584Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3922590Z 2025-12-04T12:15:06.3923302Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3923309Z 2025-12-04T12:15:06.3923331Z 2025-12-04T12:15:06.3923581Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3924208Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T12:15:06.3924216Z 2025-12-04T12:15:06.3924502Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3924759Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3924871Z frames [('total', 1)] 2025-12-04T12:15:06.3925003Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3925472Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3925716Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3925819Z graph_break [] 2025-12-04T12:15:06.3926044Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3926168Z frames [('total', 1)] 2025-12-04T12:15:06.3926282Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3926534Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3927013Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3927147Z graph_break [] 2025-12-04T12:15:06.3927312Z =================================== FAILURES =================================== 
2025-12-04T12:15:06.3927625Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.3927748Z Traceback (most recent call last): 2025-12-04T12:15:06.3928136Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.3928263Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.3928804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.3929068Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.3929586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.3929795Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.3930305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.3930454Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.3931000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.3931320Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.3931855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.3932008Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.3932489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.3932627Z return self._compile_to_module() 2025-12-04T12:15:06.3933113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.3933280Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.3933810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.3933937Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.3934448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.3934682Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.3935321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.3935467Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.3935965Z File "/tmp/tmpvv7ocoss/2c/c2c2lga7tiaygn4ikuv35kyz2khvqjwp443pfkaqsspj6zi4d3yo.py", line 51, in 2025-12-04T12:15:06.3936519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.3936679Z kernel.precompile( 2025-12-04T12:15:06.3937236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.3937368Z self._precompile_worker() 2025-12-04T12:15:06.3937963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.3938147Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.3938756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3938987Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3939449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3939701Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.3940143Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3940492Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3940725Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3941058Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3941215Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3941357Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3941483Z xmask = xindex < xnumel 2025-12-04T12:15:06.3941579Z x0 = xindex 2025-12-04T12:15:06.3941703Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3941835Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3941928Z ^ 2025-12-04T12:15:06.3942315Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3942321Z 2025-12-04T12:15:06.3943045Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3943051Z 2025-12-04T12:15:06.3943056Z 2025-12-04T12:15:06.3943271Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3943913Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T12:15:06.3943919Z 2025-12-04T12:15:06.3944190Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3944425Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3944530Z frames [('total', 1)] 2025-12-04T12:15:06.3944648Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3945128Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3945353Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3945454Z graph_break [] 2025-12-04T12:15:06.3945687Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3945790Z frames [('total', 1)] 2025-12-04T12:15:06.3945918Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3946174Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3946635Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3946750Z graph_break [] 2025-12-04T12:15:06.3946965Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.3947071Z frames [('total', 1)] 2025-12-04T12:15:06.3947202Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.3947454Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.3947911Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), 
('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.3948023Z graph_break [] 2025-12-04T12:15:06.3948671Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-04b538cf09549803.xml - 2025-12-04T12:15:06.3948862Z =========================== short test summary info ============================ 2025-12-04T12:15:06.3949632Z FAILED [0.4280s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.3949993Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3950132Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3950275Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3950398Z xmask = xindex < xnumel 2025-12-04T12:15:06.3950493Z x0 = xindex 2025-12-04T12:15:06.3950616Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3950748Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3950839Z ^ 2025-12-04T12:15:06.3951226Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3951232Z 2025-12-04T12:15:06.3951987Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.3951996Z 2025-12-04T12:15:06.3952001Z 2025-12-04T12:15:06.3952219Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.3952855Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T12:15:06.3952863Z 2025-12-04T12:15:06.3953130Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.3953328Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.3953532Z ================== 1 failed, 187 deselected, 2 rerun in 4.39s ================== 2025-12-04T12:15:06.3953635Z Got exit code 1 2025-12-04T12:15:06.3953759Z Retrying single test... 2025-12-04T12:15:06.3954235Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d91f9f6b0d5ec125.xml 2025-12-04T12:15:06.3954402Z ============================= test session starts ============================== 2025-12-04T12:15:06.3954771Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.3954882Z cachedir: .pytest_cache 2025-12-04T12:15:06.3955416Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.3955543Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.3955652Z configfile: pytest.ini 2025-12-04T12:15:06.3956257Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.3956481Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.3957217Z stepcurrent: skipping 63 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T12:15:06.3957347Z Running 1 items in this shard 2025-12-04T12:15:06.3957352Z 2025-12-04T12:15:06.3958441Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3959245Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3959791Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3960364Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3960862Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3961326Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3961880Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3962408Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3962930Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.3963440Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T12:15:06.3963971Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.3964529Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.3965077Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.3965452Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3967297Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.3967981Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3968851Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3969359Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3970199Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.3970909Z E1204 12:12:42.869000 127681 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.3972028Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.3972542Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.3973399Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.3974082Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.3974957Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.3975793Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.3976755Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.3977467Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.3978310Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.3979061Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.3979950Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3980316Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3981012Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.3981371Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.3981916Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.3982959Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.3983607Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.3987074Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.3987758Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T12:15:06.3990541Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.3991337Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.3991951Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.3992748Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3993302Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.3993883Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.3994381Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.3994827Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.3995375Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.3995911Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.3996330Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T12:15:06.3997150Z E1204 12:12:42.869000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.3997346Z ('RERUN', {'yellow': True}) [3.4692s] [100%] 2025-12-04T12:15:06.3998440Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.3999209Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.3999754Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4000330Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4000830Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4001263Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4001820Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4002344Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4002935Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.4003445Z E1204 12:12:43.354000 127681 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.4003950Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.4004586Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.4005135Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.4005513Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4007318Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4007875Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4008739Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4009247Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4010092Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.4010803Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.4011714Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4012220Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4013082Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.4013725Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.4014593Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.4015429Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.4016274Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.4017155Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), 
self.shape) 2025-12-04T12:15:06.4017999Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.4018782Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.4019682Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4020049Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4020749Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.4021107Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4021660Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4022727Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4023481Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4024381Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:06.4025064Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4025957Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4026800Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4027431Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.4028194Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4028759Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4029320Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4029820Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4030268Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4030817Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4031414Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = 
tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4031832Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4032659Z E1204 12:12:43.354000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4032872Z ('RERUN', {'yellow': True}) [0.4420s] [100%] 2025-12-04T12:15:06.4033992Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4034771Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4035316Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4035891Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4036386Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4036832Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4037388Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4037919Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4038439Z E1204 12:12:43.796000 127681 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.4038948Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.4039452Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.4040064Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.4040609Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.4040982Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4042785Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 512}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4043342Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4044205Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4044710Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4045624Z E1204 
12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.4046333Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.4047271Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4047777Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4048631Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.4049270Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.4050139Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.4050974Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.4051813Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.4052522Z E1204 12:12:43.796000 127681 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.4053366Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.4054067Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.4054984Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4055350Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4056043Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.4056507Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4057053Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4058093Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4058741Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4059627Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4060417Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4061309Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4062151Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4062774Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.4063535Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4064090Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4064650Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4065149Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4065597Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4066141Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4066677Z 
E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4067096Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4067917Z E1204 12:12:43.796000 127681 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4068042Z FAILED [0.4395s] [100%] 2025-12-04T12:15:06.4068049Z 2025-12-04T12:15:06.4068233Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.4068565Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.4068693Z Traceback (most recent call last): 2025-12-04T12:15:06.4069070Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.4069216Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.4069709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4069961Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4070486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4070681Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4071424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4071579Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4072113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4072452Z return scheme.codegen_and_compile(gm, example_inputs, 
inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4072973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4073234Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4073714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4073840Z return self._compile_to_module() 2025-12-04T12:15:06.4074344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4074613Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4075136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4075281Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4075779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4076035Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4076625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4076754Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4077273Z File "/tmp/tmpt6w16nnv/du/cduc2masbadbpu7gctirhfo35eloemknautmizriuqaxr6z5sq6a.py", line 51, in 2025-12-04T12:15:06.4077743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4077871Z kernel.precompile( 2025-12-04T12:15:06.4078426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4078547Z self._precompile_worker() 2025-12-04T12:15:06.4079156Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4079340Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4079936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4080149Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4080597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4080904Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4081353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4081689Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4081936Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4082259Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4082399Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4082540Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4082651Z xmask = xindex < xnumel 2025-12-04T12:15:06.4082762Z x0 = xindex 2025-12-04T12:15:06.4082887Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4083009Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4083120Z ^ 2025-12-04T12:15:06.4083512Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4083519Z 2025-12-04T12:15:06.4084245Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4084252Z 2025-12-04T12:15:06.4084257Z 2025-12-04T12:15:06.4084472Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4085153Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T12:15:06.4085160Z 2025-12-04T12:15:06.4085441Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4085668Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4085822Z frames [('total', 1)] 2025-12-04T12:15:06.4085944Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4086443Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4086681Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4086780Z graph_break [] 2025-12-04T12:15:06.4087120Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.4087246Z Traceback (most recent call last): 2025-12-04T12:15:06.4087636Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.4087767Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.4088257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4088523Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4089042Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4089242Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4089772Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4089923Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4090471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4090796Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4091315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4091483Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4091966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4092154Z return self._compile_to_module() 2025-12-04T12:15:06.4092648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4092820Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4093350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4093485Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4093986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4094237Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4094823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4094969Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4095479Z File "/tmp/tmp76qdwytu/tr/ctrjd2h5qp4dv7rwfmx4kjgko243xboyncysg57ozmyhlbe6n3fc.py", line 51, in 2025-12-04T12:15:06.4095946Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4096076Z kernel.precompile( 2025-12-04T12:15:06.4096713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4096901Z self._precompile_worker() 2025-12-04T12:15:06.4097501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4097680Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4098288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4098526Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4099008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4099273Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4099716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4100068Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4100305Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4100628Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4100771Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4100917Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4101028Z xmask = xindex < xnumel 2025-12-04T12:15:06.4101144Z x0 = xindex 2025-12-04T12:15:06.4101272Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4101414Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4101510Z ^ 
2025-12-04T12:15:06.4101897Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4101903Z 2025-12-04T12:15:06.4102628Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4102637Z 2025-12-04T12:15:06.4102641Z 2025-12-04T12:15:06.4102858Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4103495Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T12:15:06.4103501Z 2025-12-04T12:15:06.4103772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4104038Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4104162Z frames [('total', 1)] 2025-12-04T12:15:06.4104279Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4104758Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4104980Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4105080Z graph_break [] 2025-12-04T12:15:06.4105313Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4105417Z frames [('total', 1)] 2025-12-04T12:15:06.4105533Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4105766Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4106228Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4106342Z graph_break [] 2025-12-04T12:15:06.4106496Z =================================== FAILURES =================================== 
2025-12-04T12:15:06.4106809Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.4106947Z Traceback (most recent call last): 2025-12-04T12:15:06.4107320Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.4107447Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.4107993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4108241Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4108774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4108969Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4109520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4109722Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4110257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4110578Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4111109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4111261Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4111754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4111878Z return self._compile_to_module() 2025-12-04T12:15:06.4112363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4112547Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4113063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4113209Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4113705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4113938Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4114540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4114668Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4115169Z File "/tmp/tmptja5smb2/mb/cmbn3dj5wnso24f63dqvjd4pgdznbtmuljhs5h6bljqnndm2i7vy.py", line 51, in 2025-12-04T12:15:06.4115683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4115799Z kernel.precompile( 2025-12-04T12:15:06.4116369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4116488Z self._precompile_worker() 2025-12-04T12:15:06.4117085Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4117283Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4117876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4118086Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4118537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4118789Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.4119251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4119586Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4119817Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4120150Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4120307Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4120461Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4120573Z xmask = xindex < xnumel 2025-12-04T12:15:06.4120669Z x0 = xindex 2025-12-04T12:15:06.4120809Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4120930Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4121054Z ^ 2025-12-04T12:15:06.4121457Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4121493Z 2025-12-04T12:15:06.4122205Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4122211Z 2025-12-04T12:15:06.4122216Z 2025-12-04T12:15:06.4122447Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4123075Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T12:15:06.4123081Z 2025-12-04T12:15:06.4123367Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4123620Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4123766Z frames [('total', 1)] 2025-12-04T12:15:06.4123943Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4124435Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4124657Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4124771Z graph_break [] 2025-12-04T12:15:06.4124989Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4125099Z frames [('total', 1)] 2025-12-04T12:15:06.4125228Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4125451Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4125925Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4126024Z graph_break [] 2025-12-04T12:15:06.4126239Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4126359Z frames [('total', 1)] 2025-12-04T12:15:06.4126474Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4126758Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4127232Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), 
('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4127332Z graph_break [] 2025-12-04T12:15:06.4127997Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d91f9f6b0d5ec125.xml - 2025-12-04T12:15:06.4128177Z =========================== short test summary info ============================ 2025-12-04T12:15:06.4128940Z FAILED [0.4395s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4129276Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4129404Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4129558Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4129672Z xmask = xindex < xnumel 2025-12-04T12:15:06.4129768Z x0 = xindex 2025-12-04T12:15:06.4129907Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4130026Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4130118Z ^ 2025-12-04T12:15:06.4130522Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4130528Z 2025-12-04T12:15:06.4131280Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4131286Z 2025-12-04T12:15:06.4131291Z 2025-12-04T12:15:06.4131521Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4132147Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T12:15:06.4132191Z 2025-12-04T12:15:06.4132494Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4132690Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.4132893Z ================== 1 failed, 187 deselected, 2 rerun in 4.39s ================== 2025-12-04T12:15:06.4133011Z Got exit code 1 2025-12-04T12:15:06.4133549Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 2025-12-04T12:15:06.4133959Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:06.4134442Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ebbb316cfb6210df.xml 2025-12-04T12:15:06.4134613Z ============================= test session starts ============================== 2025-12-04T12:15:06.4134983Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.4135102Z cachedir: .pytest_cache 2025-12-04T12:15:06.4135623Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.4135766Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.4135878Z configfile: pytest.ini 2025-12-04T12:15:06.4136633Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.4136888Z collecting ... collected 188 items / 64 deselected / 124 selected 2025-12-04T12:15:06.4137033Z stepcurrent: skipping 64 already run items. 
2025-12-04T12:15:06.4137166Z Running 124 items in this shard 2025-12-04T12:15:06.4137172Z 2025-12-04T12:15:06.4138346Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4139119Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4139681Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4140243Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4140750Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4141183Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4141730Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4142266Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4142770Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.4143292Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.4143837Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = 
tmp3.to(tl.float32) 2025-12-04T12:15:06.4144392Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.4144981Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.4145377Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4147193Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4147731Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4148612Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4149119Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4149964Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.4150674Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, 
_semantic=_semantic) 2025-12-04T12:15:06.4151529Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4152085Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4152930Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.4153582Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.4154455Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.4155289Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.4156146Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.4156839Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.4157696Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.4158414Z E1204 
12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.4159308Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4159737Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4160429Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.4160791Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4161323Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4162377Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4163008Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4163920Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4164602Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4165507Z E1204 12:13:02.522000 127878 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4166278Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4166932Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.4167703Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4168244Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4168822Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4169316Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4169765Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4170317Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4170841Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4171484Z E1204 12:13:02.522000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4172316Z E1204 12:13:02.522000 127878 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4172559Z ('RERUN', {'yellow': True}) [3.4763s] [ 0%] 2025-12-04T12:15:06.4173669Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4174534Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4175096Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4175656Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4176169Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4176658Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4177201Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4177752Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4178258Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.4178785Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = 
tmp0.to(tl.float8e5) 2025-12-04T12:15:06.4179293Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.4179850Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.4180393Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.4180806Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4182624Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4183160Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4184034Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4184545Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4185393Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.4186105Z E1204 
12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.4186999Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4187517Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4188437Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.4189089Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.4189959Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.4190787Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.4191631Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.4192328Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.4193179Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.4193866Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.4194758Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4195160Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4195855Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.4196215Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4196752Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4197805Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4198431Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4199335Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4200012Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = 
src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4200939Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4201709Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4202358Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.4203162Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4203706Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4204280Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4204773Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4205218Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4205767Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4206290Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4206719Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
^ 2025-12-04T12:15:06.4207539Z E1204 12:13:02.997000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4207690Z ('RERUN', {'yellow': True}) [0.4351s] [ 0%] 2025-12-04T12:15:06.4208801Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4209600Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4210157Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4210720Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4211226Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4211656Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4212198Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4212741Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4213249Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.4213773Z E1204 12:13:03.432000 127878 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.4214276Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.4214900Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.4215443Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.4215860Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4217781Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4218321Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4219202Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4219719Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4220566Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.4221275Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.4222130Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4222655Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4223533Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.4224182Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.4225057Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.4225885Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.4226725Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.4227427Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), 
self.shape) 2025-12-04T12:15:06.4228285Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.4229002Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.4229891Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4230302Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4231028Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.4231391Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4231927Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4232981Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4233604Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4234509Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:06.4235190Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4236082Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4236851Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4237460Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.4238265Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4238812Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4239384Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4239880Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4240321Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4240861Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4241389Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = 
tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4241819Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4242639Z E1204 12:13:03.432000 127878 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4242789Z FAILED [0.4329s] [ 0%] 2025-12-04T12:15:06.4242796Z 2025-12-04T12:15:06.4242945Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.4243272Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.4243410Z Traceback (most recent call last): 2025-12-04T12:15:06.4243814Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.4243972Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.4244480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4244730Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4245257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4245454Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4245964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4246126Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4246659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4246998Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4247521Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4247676Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4248170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4248298Z return self._compile_to_module() 2025-12-04T12:15:06.4248781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4248964Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4249482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4249635Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4250166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4250402Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4250998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4251126Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4251600Z File "/tmp/tmp_c_zu4r5/uu/cuukewa6wbxqjvqj2skh52k5pc2mvc6crv56al2zfmkih5jomgrb.py", line 51, in 2025-12-04T12:15:06.4252067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4252183Z kernel.precompile( 2025-12-04T12:15:06.4252748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4252870Z self._precompile_worker() 2025-12-04T12:15:06.4253470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 
463, in _precompile_worker 2025-12-04T12:15:06.4253662Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4254258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4254467Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4254954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4255202Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4255660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4255994Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4256277Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4256706Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4256833Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4256992Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4257107Z xmask = xindex < xnumel 2025-12-04T12:15:06.4257218Z x0 = xindex 2025-12-04T12:15:06.4257356Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4257483Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4257578Z ^ 2025-12-04T12:15:06.4257981Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4257987Z 2025-12-04T12:15:06.4258708Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4258716Z 2025-12-04T12:15:06.4258721Z 2025-12-04T12:15:06.4258961Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4259602Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T12:15:06.4259608Z 2025-12-04T12:15:06.4259892Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4260120Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4260231Z frames [('total', 1)] 2025-12-04T12:15:06.4260367Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4260836Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4261059Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4261181Z graph_break [] 2025-12-04T12:15:06.4261506Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.4261767Z Traceback (most recent call last): 2025-12-04T12:15:06.4262146Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.4262274Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.4262782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4263035Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4263571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4263769Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4264282Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4264448Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4264986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4265310Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4265846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4265996Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4266525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4266651Z return self._compile_to_module() 2025-12-04T12:15:06.4267136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4267319Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4267904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4268057Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4268556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4268789Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4269393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4269527Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4270024Z File "/tmp/tmp7la5352y/lb/clb77x6yqs7hsb3hmtygkrjoinvimp542n54darrcwxcz5koa3yw.py", line 51, in 2025-12-04T12:15:06.4270501Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4270619Z kernel.precompile( 2025-12-04T12:15:06.4271367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4271488Z self._precompile_worker() 2025-12-04T12:15:06.4272087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4272284Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4272881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4273095Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4273546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4273791Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4274331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4274671Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4274901Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4275235Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4275360Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4275517Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4275627Z xmask = xindex < xnumel 2025-12-04T12:15:06.4275722Z x0 = xindex 2025-12-04T12:15:06.4275860Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4275979Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4276072Z ^ 
2025-12-04T12:15:06.4276475Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4276484Z 2025-12-04T12:15:06.4277200Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4277206Z 2025-12-04T12:15:06.4277211Z 2025-12-04T12:15:06.4277440Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4278076Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T12:15:06.4278151Z 2025-12-04T12:15:06.4278437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4278662Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4278767Z frames [('total', 1)] 2025-12-04T12:15:06.4278900Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4279366Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4279676Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4279794Z graph_break [] 2025-12-04T12:15:06.4280015Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4280121Z frames [('total', 1)] 2025-12-04T12:15:06.4280251Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4280470Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4280945Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4281049Z graph_break [] 2025-12-04T12:15:06.4281199Z =================================== FAILURES =================================== 
2025-12-04T12:15:06.4281534Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.4281659Z Traceback (most recent call last): 2025-12-04T12:15:06.4282036Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.4282182Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.4282670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4282931Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4283448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4283643Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4284167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4284314Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4284859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4285213Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4285735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4285898Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4286378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4286505Z return self._compile_to_module() 2025-12-04T12:15:06.4287003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4287167Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4287699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4287832Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4288333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4288578Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4289162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4289304Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4289841Z File "/tmp/tmpsn1tewe1/ke/ckevxfko2vbgshwtfltjb26qlatadjfaue2iv7hl7ulz465sdadk.py", line 51, in 2025-12-04T12:15:06.4290304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4290432Z kernel.precompile( 2025-12-04T12:15:06.4290988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4291134Z self._precompile_worker() 2025-12-04T12:15:06.4291776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4291956Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4292717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4292924Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4293381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4293641Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.4294084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4294433Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4294672Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4294996Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4295136Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4295278Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4295392Z xmask = xindex < xnumel 2025-12-04T12:15:06.4295506Z x0 = xindex 2025-12-04T12:15:06.4295629Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4295752Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4295858Z ^ 2025-12-04T12:15:06.4296247Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4296253Z 2025-12-04T12:15:06.4297043Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4297057Z 2025-12-04T12:15:06.4297062Z 2025-12-04T12:15:06.4297336Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4297992Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T12:15:06.4297998Z 2025-12-04T12:15:06.4298267Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4298491Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4298612Z frames [('total', 1)] 2025-12-04T12:15:06.4298730Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4299193Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4299431Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4299535Z graph_break [] 2025-12-04T12:15:06.4299771Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4299878Z frames [('total', 1)] 2025-12-04T12:15:06.4299994Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4300225Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4300691Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4300791Z graph_break [] 2025-12-04T12:15:06.4301053Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4301157Z frames [('total', 1)] 2025-12-04T12:15:06.4301284Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4301506Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4301965Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), 
('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4302114Z graph_break [] 2025-12-04T12:15:06.4302811Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ebbb316cfb6210df.xml - 2025-12-04T12:15:06.4302988Z =========================== short test summary info ============================ 2025-12-04T12:15:06.4303793Z FAILED [0.4329s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4304115Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4304257Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4304396Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4304507Z xmask = xindex < xnumel 2025-12-04T12:15:06.4304615Z x0 = xindex 2025-12-04T12:15:06.4304737Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4304856Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4304965Z ^ 2025-12-04T12:15:06.4305357Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4305363Z 2025-12-04T12:15:06.4306087Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4306093Z 2025-12-04T12:15:06.4306097Z 2025-12-04T12:15:06.4306314Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4306957Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T12:15:06.4306977Z 2025-12-04T12:15:06.4307245Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4307427Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.4307645Z ================== 1 failed, 64 deselected, 2 rerun in 4.39s =================== 2025-12-04T12:15:06.4307787Z Got exit code 1 2025-12-04T12:15:06.4307902Z Retrying single test... 2025-12-04T12:15:06.4308386Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6f1b13e751374b5d.xml 2025-12-04T12:15:06.4308552Z ============================= test session starts ============================== 2025-12-04T12:15:06.4308917Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.4309034Z cachedir: .pytest_cache 2025-12-04T12:15:06.4309554Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.4309692Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.4309803Z configfile: pytest.ini 2025-12-04T12:15:06.4310397Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.4310638Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.4311359Z stepcurrent: skipping 64 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T12:15:06.4311487Z Running 1 items in this shard 2025-12-04T12:15:06.4311492Z 2025-12-04T12:15:06.4312608Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4313429Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4313976Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4314625Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4315139Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4315571Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4316128Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4316651Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4317158Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.4317689Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T12:15:06.4318193Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.4318752Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.4319297Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.4319664Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4321547Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4322089Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4322969Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4323481Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4324336Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.4325056Z E1204 12:13:22.181000 
128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.4325927Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4326472Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4327319Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.4327978Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.4328918Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.4329752Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.4330604Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.4331318Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.4332172Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.4332856Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.4333867Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4334237Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4334932Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.4335296Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4335877Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4336996Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4337634Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4338545Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4339225Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4340123Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4340895Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4341564Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.4342326Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4342908Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4343518Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4344012Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4344457Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4345002Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4345531Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4345959Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T12:15:06.4346791Z E1204 12:13:22.181000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4346944Z ('RERUN', {'yellow': True}) [3.4575s] [100%] 2025-12-04T12:15:06.4348053Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4348831Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4349372Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4349972Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4350482Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4350913Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4351464Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4351990Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4352493Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.4353020Z E1204 12:13:22.655000 128075 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.4353527Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.4354091Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.4354638Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.4355036Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4356895Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4357457Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4358335Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4358845Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4359704Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.4360422Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.4361290Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4361793Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4362637Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.4363287Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.4364192Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.4365019Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.4365867Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.4366575Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), 
self.shape) 2025-12-04T12:15:06.4367423Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.4368109Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.4369003Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4369395Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4370084Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.4370445Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4371270Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4372328Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4372953Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4373866Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:06.4374547Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4375447Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4376215Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4376901Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.4377665Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4378209Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4378844Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4379343Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4379789Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4380334Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4380859Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = 
tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4381292Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4382122Z E1204 12:13:22.655000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4382272Z ('RERUN', {'yellow': True}) [0.4342s] [100%] 2025-12-04T12:15:06.4383396Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4384214Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4384758Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4385359Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4385893Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4386327Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4386883Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4387408Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4387916Z E1204 12:13:23.093000 128075 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.4388436Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.4388946Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.4389503Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.4390047Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.4390409Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4392267Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4392805Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4393682Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4394189Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4395037Z 
E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.4395753Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.4396617Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4397121Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4398003Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.4398648Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.4399576Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.4400409Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.4401248Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.4401957Z E1204 12:13:23.093000 128075 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.4402799Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.4403489Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.4404381Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4404749Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4405442Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.4405802Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4406384Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4407447Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4408076Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4408983Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4409663Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4410561Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4411331Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4411950Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.4412744Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4413285Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4413926Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4414418Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4414865Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4415406Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4415934Z 
E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4416425Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4417262Z E1204 12:13:23.093000 128075 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4417382Z FAILED [0.4356s] [100%] 2025-12-04T12:15:06.4417389Z 2025-12-04T12:15:06.4417535Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.4417866Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.4418011Z Traceback (most recent call last): 2025-12-04T12:15:06.4418388Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.4418537Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.4419028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4419279Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4419847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4420050Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4420563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4420724Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4421261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4421596Z return scheme.codegen_and_compile(gm, example_inputs, 
inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4422120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4422270Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4422764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4422893Z return self._compile_to_module() 2025-12-04T12:15:06.4423393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4423560Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4424076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4424265Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4424759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4424991Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4425593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4425776Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4426295Z File "/tmp/tmp_s67pl2n/37/c37zt7inya7rhhvfferv4dpwirmtgkdiincjagrzebkvrbfcgs5a.py", line 51, in 2025-12-04T12:15:06.4426760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4426874Z kernel.precompile( 2025-12-04T12:15:06.4427444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4427584Z self._precompile_worker() 2025-12-04T12:15:06.4428191Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4428373Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4428968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4429190Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4429645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4429893Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4430352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4430689Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4430936Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4431261Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4431386Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4431543Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4431658Z xmask = xindex < xnumel 2025-12-04T12:15:06.4431758Z x0 = xindex 2025-12-04T12:15:06.4431937Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4432064Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4432173Z ^ 2025-12-04T12:15:06.4432563Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4432569Z 2025-12-04T12:15:06.4433287Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4433298Z 2025-12-04T12:15:06.4433321Z 2025-12-04T12:15:06.4433626Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4434286Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T12:15:06.4434293Z 2025-12-04T12:15:06.4434582Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4434817Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4434926Z frames [('total', 1)] 2025-12-04T12:15:06.4435061Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4435531Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4435770Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4435874Z graph_break [] 2025-12-04T12:15:06.4436243Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.4436383Z Traceback (most recent call last): 2025-12-04T12:15:06.4436757Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.4436886Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.4437392Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4437710Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4438315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4438515Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4439027Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4439193Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4439730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4440066Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4440588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4440741Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4441242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4441368Z return self._compile_to_module() 2025-12-04T12:15:06.4441856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4442035Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4442555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4442700Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4443195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4443426Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4444072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4444201Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4444711Z File "/tmp/tmpk0rfnz8k/rp/crpzruxu65lnw77wuerkdeuslfqi2plfmve4o4xjes3d43w2maf3.py", line 51, in 2025-12-04T12:15:06.4445176Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4445291Z kernel.precompile( 2025-12-04T12:15:06.4445860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4445978Z self._precompile_worker() 2025-12-04T12:15:06.4446575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4446770Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4447372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4447582Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4448032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4448276Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4448770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4449105Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4449351Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4449672Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4449831Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4450018Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4450131Z xmask = xindex < xnumel 2025-12-04T12:15:06.4450228Z x0 = xindex 2025-12-04T12:15:06.4450368Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4450488Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4450583Z ^ 
2025-12-04T12:15:06.4450988Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4450997Z 2025-12-04T12:15:06.4451713Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4451719Z 2025-12-04T12:15:06.4451723Z 2025-12-04T12:15:06.4451957Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4452595Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T12:15:06.4452606Z 2025-12-04T12:15:06.4452887Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4453114Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4453219Z frames [('total', 1)] 2025-12-04T12:15:06.4453350Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4453815Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4454041Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4454154Z graph_break [] 2025-12-04T12:15:06.4454406Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4454526Z frames [('total', 1)] 2025-12-04T12:15:06.4454642Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4454864Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4455373Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4455476Z graph_break [] 2025-12-04T12:15:06.4455628Z =================================== FAILURES =================================== 
2025-12-04T12:15:06.4455965Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.4456091Z Traceback (most recent call last): 2025-12-04T12:15:06.4456585Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.4456717Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.4457209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4457474Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4457992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4458191Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4458719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4458872Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4459424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4459795Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4460317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4460478Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4460961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4461132Z return self._compile_to_module() 2025-12-04T12:15:06.4461724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4461891Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4462426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4462588Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4463083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4463330Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4463917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4464066Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4464563Z File "/tmp/tmptpjwmlj1/na/cnao57g2ulgri32xj4l73z4laikimden227m6rybxodp4tkm4j57.py", line 51, in 2025-12-04T12:15:06.4465026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4465158Z kernel.precompile( 2025-12-04T12:15:06.4465713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4465849Z self._precompile_worker() 2025-12-04T12:15:06.4466444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4466624Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4467230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4467471Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4467926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4468185Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.4468625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4468973Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4469206Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4469526Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4469667Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4469805Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4469917Z xmask = xindex < xnumel 2025-12-04T12:15:06.4470027Z x0 = xindex 2025-12-04T12:15:06.4470152Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4470286Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4470379Z ^ 2025-12-04T12:15:06.4470768Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4470774Z 2025-12-04T12:15:06.4471697Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4471801Z 2025-12-04T12:15:06.4471806Z 2025-12-04T12:15:06.4472027Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4472679Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T12:15:06.4472685Z 2025-12-04T12:15:06.4473004Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4473271Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4473393Z frames [('total', 1)] 2025-12-04T12:15:06.4473509Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4473989Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4474213Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4474317Z graph_break [] 2025-12-04T12:15:06.4474550Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4474655Z frames [('total', 1)] 2025-12-04T12:15:06.4474771Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4475005Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4475465Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4475581Z graph_break [] 2025-12-04T12:15:06.4475806Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4475912Z frames [('total', 1)] 2025-12-04T12:15:06.4476041Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4476264Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4476719Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), 
('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4476834Z graph_break [] 2025-12-04T12:15:06.4477482Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6f1b13e751374b5d.xml - 2025-12-04T12:15:06.4477670Z =========================== short test summary info ============================ 2025-12-04T12:15:06.4478462Z FAILED [0.4356s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4478833Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4478980Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4479122Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4479233Z xmask = xindex < xnumel 2025-12-04T12:15:06.4479344Z x0 = xindex 2025-12-04T12:15:06.4479469Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4479589Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4479697Z ^ 2025-12-04T12:15:06.4487345Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4487362Z 2025-12-04T12:15:06.4488180Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4488197Z 2025-12-04T12:15:06.4488202Z 2025-12-04T12:15:06.4488426Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4489077Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T12:15:06.4489096Z 2025-12-04T12:15:06.4489370Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4489556Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.4489863Z ================== 1 failed, 187 deselected, 2 rerun in 4.37s ================== 2025-12-04T12:15:06.4489966Z Got exit code 1 2025-12-04T12:15:06.4490077Z Retrying single test... 2025-12-04T12:15:06.4490564Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f2c09e3279cd971a.xml 2025-12-04T12:15:06.4490731Z ============================= test session starts ============================== 2025-12-04T12:15:06.4491135Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.4491286Z cachedir: .pytest_cache 2025-12-04T12:15:06.4491813Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.4491951Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.4492062Z configfile: pytest.ini 2025-12-04T12:15:06.4492654Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.4492892Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.4493617Z stepcurrent: skipping 64 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T12:15:06.4493748Z Running 1 items in this shard 2025-12-04T12:15:06.4493756Z 2025-12-04T12:15:06.4494878Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4495645Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4496209Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4496896Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4497401Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4497835Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4498432Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4498957Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4499462Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.4499984Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 
2025-12-04T12:15:06.4500486Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.4501043Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.4501591Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.4501956Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4503772Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4504342Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4505287Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4505793Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4506641Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.4507350Z E1204 12:13:41.755000 
128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.4508211Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4508719Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4509559Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.4510209Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.4511079Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.4511929Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.4512778Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.4513482Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.4514323Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.4515004Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.4515898Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4516259Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4516947Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.4517339Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4517871Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4518921Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4519612Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4520519Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4521192Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, 
options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4522089Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4522857Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4523481Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.4524238Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4524779Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4525347Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4525836Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4526319Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4526860Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4527379Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4527805Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 
2025-12-04T12:15:06.4528632Z E1204 12:13:41.755000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4528780Z ('RERUN', {'yellow': True}) [3.4628s] [100%] 2025-12-04T12:15:06.4529893Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4530653Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4531210Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4532174Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4532680Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4533110Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4533763Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4534285Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4534786Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.4535309Z E1204 12:13:42.229000 128272 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.4535805Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.4536421Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.4536999Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.4537384Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4539239Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4539817Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4540741Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4541252Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4542092Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.4542809Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.4543671Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4544182Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4545026Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.4545673Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.4546574Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.4547397Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.4548295Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.4549001Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), 
self.shape) 2025-12-04T12:15:06.4549842Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.4550531Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.4551420Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4551789Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4552475Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.4552837Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4553369Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4554417Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4555080Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4555981Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 
2025-12-04T12:15:06.4556655Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4557544Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4558313Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4558942Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.4559701Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4560242Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4560850Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4561343Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4561822Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4562394Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4562920Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = 
tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4563350Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4564168Z E1204 12:13:42.229000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4564315Z ('RERUN', {'yellow': True}) [0.4335s] [100%] 2025-12-04T12:15:06.4565433Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4566196Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4566747Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4567304Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4567812Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4568244Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4568839Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4569365Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4569866Z E1204 12:13:42.676000 128272 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float32) 2025-12-04T12:15:06.4570392Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp3 = tmp0.to(tl.float8e5) 2025-12-04T12:15:06.4570896Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp4 = tmp3.to(tl.float32) 2025-12-04T12:15:06.4571640Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (x0), tmp2, xmask) 2025-12-04T12:15:06.4572187Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr1 + (x0), tmp4, xmask) 2025-12-04T12:15:06.4572554Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4574376Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp32', 'out_ptr1': '*fp32', 'xnumel': 'i32', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]], (2,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4575001Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4575925Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4576543Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4577395Z 
E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1114, in to 2025-12-04T12:15:06.4578108Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return cast(self, dtype, fp_downcast_rounding, bitcast, _semantic=_semantic) 2025-12-04T12:15:06.4578975Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 43, in wrapper 2025-12-04T12:15:06.4579486Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return fn(*args, **kwargs) 2025-12-04T12:15:06.4580333Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 1978, in cast 2025-12-04T12:15:06.4580982Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return _semantic.cast(input, dtype, fp_downcast_rounding) 2025-12-04T12:15:06.4581852Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/semantic.py", line 827, in cast 2025-12-04T12:15:06.4582688Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] self.builder.create_fp_to_fp(input.handle, dst_ty.to_ir(self.builder), fp_downcast_rounding), dst_ty) 2025-12-04T12:15:06.4583592Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 712, in to_ir 2025-12-04T12:15:06.4584302Z E1204 12:13:42.676000 128272 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return builder.get_block_ty(self.element_ty.to_ir(builder), self.shape) 2025-12-04T12:15:06.4585149Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/language/core.py", line 574, in to_ir 2025-12-04T12:15:06.4585844Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] raise ValueError(f'type {self} not supported in this architecture. ' 2025-12-04T12:15:06.4586741Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError: type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4587104Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4587791Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] The above exception was the direct cause of the following exception: 2025-12-04T12:15:06.4588149Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4588725Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4589786Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4590464Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4591396Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] 
[0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4592078Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4592969Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4593750Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4594377Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 7:11: 2025-12-04T12:15:06.4595136Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4595679Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4596258Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4596747Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = xindex < xnumel 2025-12-04T12:15:06.4597189Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] x0 = xindex 2025-12-04T12:15:06.4597775Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4598300Z 
E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4598732Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4599556Z E1204 12:13:42.676000 128272 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4599673Z FAILED [0.4451s] [100%] 2025-12-04T12:15:06.4599680Z 2025-12-04T12:15:06.4599826Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.4600152Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.4600294Z Traceback (most recent call last): 2025-12-04T12:15:06.4600676Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.4600819Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.4601315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4601566Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4602123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4602324Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4602833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4602995Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4603559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4603925Z return scheme.codegen_and_compile(gm, example_inputs, 
inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4604448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4604600Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4605099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4605227Z return self._compile_to_module() 2025-12-04T12:15:06.4605727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4605893Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4606412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4606561Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4607060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4607293Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4607890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4608021Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4608533Z File "/tmp/tmpxvkbrrj1/3z/c3zzs6t6rgzarktvabvrrs5jnnzy7ol6rncfz5zgmc56h7mvt5lf.py", line 51, in 2025-12-04T12:15:06.4608997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4609111Z kernel.precompile( 2025-12-04T12:15:06.4609689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4609844Z self._precompile_worker() 2025-12-04T12:15:06.4610453Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4610633Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4611230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4611440Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4611890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4612134Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4612587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4612930Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4613173Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4613496Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4613623Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4613777Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4613885Z xmask = xindex < xnumel 2025-12-04T12:15:06.4614013Z x0 = xindex 2025-12-04T12:15:06.4614149Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4614265Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4614371Z ^ 2025-12-04T12:15:06.4614759Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4614766Z 2025-12-04T12:15:06.4615483Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4615550Z 2025-12-04T12:15:06.4615556Z 2025-12-04T12:15:06.4615785Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4616487Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T12:15:06.4616494Z 2025-12-04T12:15:06.4616776Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4617004Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4617110Z frames [('total', 1)] 2025-12-04T12:15:06.4617238Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4617697Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4617934Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4618033Z graph_break [] 2025-12-04T12:15:06.4618358Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.4618494Z Traceback (most recent call last): 2025-12-04T12:15:06.4618865Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.4618994Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.4619493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4619742Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4620268Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4620460Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4620969Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4621723Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4622267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4622598Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4623119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4623272Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4623763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4623884Z return self._compile_to_module() 2025-12-04T12:15:06.4624366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4624545Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4625061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4625200Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4625696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4625927Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4626557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4626683Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4627196Z File "/tmp/tmpul8swxe9/ce/cce4pvrydatjw3qpxy4xb24dbqsr55og3qzmdmbbyytgzqtvtg6l.py", line 51, in 2025-12-04T12:15:06.4627687Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4627799Z kernel.precompile( 2025-12-04T12:15:06.4628398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4628513Z self._precompile_worker() 2025-12-04T12:15:06.4629109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4629298Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4629889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4630097Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4630545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4630791Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4631249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4631581Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4631828Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4632144Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4632272Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4632420Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4632530Z xmask = xindex < xnumel 2025-12-04T12:15:06.4632625Z x0 = xindex 2025-12-04T12:15:06.4632755Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4632873Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4632963Z ^ 
2025-12-04T12:15:06.4633357Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4633401Z 2025-12-04T12:15:06.4634121Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4634127Z 2025-12-04T12:15:06.4634132Z 2025-12-04T12:15:06.4634360Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4634997Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T12:15:06.4635005Z 2025-12-04T12:15:06.4635281Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4635508Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4635613Z frames [('total', 1)] 2025-12-04T12:15:06.4635739Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4636212Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4636431Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4636540Z graph_break [] 2025-12-04T12:15:06.4636758Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4636872Z frames [('total', 1)] 2025-12-04T12:15:06.4636986Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4637270Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4637741Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4637838Z graph_break [] 2025-12-04T12:15:06.4637983Z =================================== FAILURES =================================== 
2025-12-04T12:15:06.4638318Z _ TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 _ 2025-12-04T12:15:06.4638473Z Traceback (most recent call last): 2025-12-04T12:15:06.4638892Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 156, in test_valid_cast 2025-12-04T12:15:06.4639018Z y0_fp8, y1_fp8 = compiled_fp8_cast(x) 2025-12-04T12:15:06.4639506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4639763Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4640277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4640470Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4640987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4641136Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4641685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4642007Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4642524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4642679Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4643157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4643295Z return self._compile_to_module() 2025-12-04T12:15:06.4643779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4643942Z mod = 
self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4644468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4644639Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4645134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4645378Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4645961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4646102Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4646603Z File "/tmp/tmpkbo8j2vw/qc/cqczjpqx6jq4biiqt3bcyhc7vnaq5gqca23io6r2sgd24l6qln7a.py", line 51, in 2025-12-04T12:15:06.4647061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4647187Z kernel.precompile( 2025-12-04T12:15:06.4647745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4647875Z self._precompile_worker() 2025-12-04T12:15:06.4648472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4648652Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4649254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4649489Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4649937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4650194Z module = src.make_ir(target, options, codegen_fns, 
module_map, context) 2025-12-04T12:15:06.4650636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4651047Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4651281Z torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4651595Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4651731Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4651868Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4651981Z xmask = xindex < xnumel 2025-12-04T12:15:06.4652091Z x0 = xindex 2025-12-04T12:15:06.4652212Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4652341Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4652433Z ^ 2025-12-04T12:15:06.4652819Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4652825Z 2025-12-04T12:15:06.4653554Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4653561Z 2025-12-04T12:15:06.4653566Z 2025-12-04T12:15:06.4653781Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4654428Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T12:15:06.4654434Z 2025-12-04T12:15:06.4654706Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4654927Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4655044Z frames [('total', 1)] 2025-12-04T12:15:06.4655158Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4655638Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4655865Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4655997Z graph_break [] 2025-12-04T12:15:06.4656230Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4656458Z frames [('total', 1)] 2025-12-04T12:15:06.4656579Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4656824Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4657288Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4657409Z graph_break [] 2025-12-04T12:15:06.4657629Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4657735Z frames [('total', 1)] 2025-12-04T12:15:06.4657864Z stats [('calls_captured', 4)] 2025-12-04T12:15:06.4658083Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4658545Z inductor [('pattern_matcher_nodes', 4), ('pattern_matcher_count', 2), 
('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4658663Z graph_break [] 2025-12-04T12:15:06.4659315Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f2c09e3279cd971a.xml - 2025-12-04T12:15:06.4659504Z =========================== short test summary info ============================ 2025-12-04T12:15:06.4660299Z FAILED [0.4451s] inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 - torch._inductor.exc.InductorError: CompilationError: at 7:11: 2025-12-04T12:15:06.4660664Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4660803Z xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4660945Z xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4661057Z xmask = xindex < xnumel 2025-12-04T12:15:06.4661203Z x0 = xindex 2025-12-04T12:15:06.4661329Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2025-12-04T12:15:06.4661455Z tmp1 = tmp0.to(tl.float8e4nv) 2025-12-04T12:15:06.4661598Z ^ 2025-12-04T12:15:06.4661988Z type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5') 2025-12-04T12:15:06.4661994Z 2025-12-04T12:15:06.4662723Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4662733Z 2025-12-04T12:15:06.4662737Z 2025-12-04T12:15:06.4662955Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4663603Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T12:15:06.4663610Z 2025-12-04T12:15:06.4663882Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4664069Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:06.4664296Z ================== 1 failed, 187 deselected, 2 rerun in 4.38s ================== 2025-12-04T12:15:06.4664399Z Got exit code 1 2025-12-04T12:15:06.4664955Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 2025-12-04T12:15:06.4665383Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:06.4665858Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c0e3bac2edd6805.xml 2025-12-04T12:15:06.4666044Z ============================= test session starts ============================== 2025-12-04T12:15:06.4666398Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.4666510Z cachedir: .pytest_cache 2025-12-04T12:15:06.4667056Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.4667218Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.4667347Z configfile: pytest.ini 2025-12-04T12:15:06.4667936Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.4668168Z collecting ... collected 188 items / 65 deselected / 123 selected 2025-12-04T12:15:06.4668334Z stepcurrent: skipping 65 already run items. 
2025-12-04T12:15:06.4668454Z Running 123 items in this shard 2025-12-04T12:15:06.4668459Z 2025-12-04T12:15:06.4669525Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4670258Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4670697Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:06.4671419Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4671980Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4672627Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:06.4673147Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T12:15:06.4673693Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T12:15:06.4674387Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T12:15:06.4675112Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T12:15:06.4675484Z E1204 12:14:00.809000 128469 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4677219Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4677772Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4678816Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4679442Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4680346Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4681082Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4681979Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4682747Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 
return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4683370Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.4684084Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4684463Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4685363Z E1204 12:14:00.809000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4685496Z ('RERUN', {'yellow': True}) [3.0549s] [ 0%] 2025-12-04T12:15:06.4686545Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4687299Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4687740Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:06.4688342Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4688903Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4689476Z E1204 12:14:01.319000 128469 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:06.4689999Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T12:15:06.4690561Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T12:15:06.4691080Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T12:15:06.4691812Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T12:15:06.4692185Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4693915Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4694465Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4695538Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4696181Z E1204 12:14:01.319000 128469 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4697135Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4697831Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4698710Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4699491Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4700109Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.4700826Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4701243Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4702290Z E1204 12:14:01.319000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4702488Z ('RERUN', {'yellow': True}) [0.2827s] [ 0%] 2025-12-04T12:15:06.4703559Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4704283Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4704734Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:06.4705281Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4705859Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4706432Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:06.4706954Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T12:15:06.4707518Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T12:15:06.4708039Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T12:15:06.4708775Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T12:15:06.4709140Z E1204 
12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4710928Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4711466Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4712523Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4713157Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4714061Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4714760Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4715681Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4716519Z E1204 12:14:01.602000 128469 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4717205Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.4717939Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4718307Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4719211Z E1204 12:14:01.602000 128469 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4719319Z FAILED [0.2805s] [ 0%] 2025-12-04T12:15:06.4719326Z 2025-12-04T12:15:06.4719474Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.4719782Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T12:15:06.4719914Z Traceback (most recent call last): 2025-12-04T12:15:06.4720366Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T12:15:06.4720487Z actual = torch.compile(f)(x) 2025-12-04T12:15:06.4720980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4721249Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4721768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4721969Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4722495Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4722646Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4723234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4723562Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4724085Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4724250Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4724733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4724868Z return self._compile_to_module() 2025-12-04T12:15:06.4725359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4725524Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4726070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4726208Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4726720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4726955Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4727544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4727725Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4728234Z File "/tmp/tmpvpk3crkk/dk/cdk5vi2ofixapffkl7vn54ayvwq6vxbrvzhgvnornrpgq27ef3tw.py", line 45, in 2025-12-04T12:15:06.4728701Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4728860Z kernel.precompile( 2025-12-04T12:15:06.4729471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4729607Z self._precompile_worker() 2025-12-04T12:15:06.4730210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4730392Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4731006Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4731211Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4731681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4731931Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4732379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4732727Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4732956Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.4733245Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4733352Z ^ 2025-12-04T12:15:06.4733815Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4733823Z 2025-12-04T12:15:06.4734550Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4734557Z 2025-12-04T12:15:06.4734561Z 2025-12-04T12:15:06.4734780Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4735393Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T12:15:06.4735399Z 2025-12-04T12:15:06.4735673Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4735898Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4736019Z frames [('total', 1)] 2025-12-04T12:15:06.4736140Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.4736456Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4736699Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4736803Z graph_break [] 2025-12-04T12:15:06.4737108Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T12:15:06.4737235Z Traceback (most recent call last): 2025-12-04T12:15:06.4737675Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T12:15:06.4737814Z actual = torch.compile(f)(x) 2025-12-04T12:15:06.4738310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4738562Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4739096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4739292Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4739862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4740010Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4740547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4740919Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4741472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4741639Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4742123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4742249Z return self._compile_to_module() 2025-12-04T12:15:06.4742749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4742916Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4743537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4743683Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4744184Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4744436Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4745022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4745151Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4745668Z File "/tmp/tmpjdeqtmqg/ru/cruj6wlj737fngl4mvq23ncz5u5wlnjubfm6kkwnyijtpdbpa3z7.py", line 45, in 2025-12-04T12:15:06.4746132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4746260Z kernel.precompile( 
2025-12-04T12:15:06.4746815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4746934Z self._precompile_worker() 2025-12-04T12:15:06.4747603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4747788Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4748384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4748597Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4749049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4749311Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4749761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4750103Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4750348Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.4750644Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4750736Z ^ 2025-12-04T12:15:06.4751212Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4751218Z 2025-12-04T12:15:06.4751931Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4751974Z 2025-12-04T12:15:06.4751979Z 2025-12-04T12:15:06.4752215Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4752766Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T12:15:06.4752771Z 2025-12-04T12:15:06.4753052Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4753309Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4753446Z frames [('total', 1)] 2025-12-04T12:15:06.4753580Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.4753820Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4754042Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4754156Z graph_break [] 2025-12-04T12:15:06.4754374Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4754492Z frames [('total', 1)] 2025-12-04T12:15:06.4754608Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.4754825Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4755076Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4755175Z graph_break [] 2025-12-04T12:15:06.4755322Z =================================== FAILURES =================================== 2025-12-04T12:15:06.4755624Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T12:15:06.4755752Z Traceback (most recent call last): 2025-12-04T12:15:06.4756197Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T12:15:06.4756320Z actual = torch.compile(f)(x) 2025-12-04T12:15:06.4756813Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4757079Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4757599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4757798Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4758329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4758477Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4759058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4759384Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4759905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4760071Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4760553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4760689Z return self._compile_to_module() 2025-12-04T12:15:06.4761171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4761339Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4761872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4762003Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4762502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:06.4762751Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4763336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4763510Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4764003Z File "/tmp/tmpe23i4x0q/m4/cm453kf6uooz34mn6h4mfgw3bzyev2ivt6ojijffnlcoepqgwz4c.py", line 45, in 2025-12-04T12:15:06.4764462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4764618Z kernel.precompile( 2025-12-04T12:15:06.4765207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4765338Z self._precompile_worker() 2025-12-04T12:15:06.4765935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4766115Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4766723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4766922Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4767371Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4767631Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4768084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4768431Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4768660Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.4768946Z def 
triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4769053Z ^ 2025-12-04T12:15:06.4769510Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4769519Z 2025-12-04T12:15:06.4770245Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4770251Z 2025-12-04T12:15:06.4770256Z 2025-12-04T12:15:06.4770473Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4771330Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T12:15:06.4771352Z 2025-12-04T12:15:06.4771629Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4771855Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4771975Z frames [('total', 1)] 2025-12-04T12:15:06.4772095Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.4772341Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4772581Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4772683Z graph_break [] 2025-12-04T12:15:06.4772906Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4773033Z frames [('total', 1)] 2025-12-04T12:15:06.4773150Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.4773384Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4773628Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4773730Z graph_break [] 2025-12-04T12:15:06.4773958Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4774065Z 
frames [('total', 1)] 2025-12-04T12:15:06.4774183Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.4774412Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4774693Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4774794Z graph_break [] 2025-12-04T12:15:06.4775460Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c0e3bac2edd6805.xml - 2025-12-04T12:15:06.4775636Z =========================== short test summary info ============================ 2025-12-04T12:15:06.4776493Z FAILED [0.2805s] inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.4776843Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4776935Z ^ 2025-12-04T12:15:06.4777409Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4777415Z 2025-12-04T12:15:06.4778127Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4778135Z 2025-12-04T12:15:06.4778140Z 2025-12-04T12:15:06.4778375Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4778934Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T12:15:06.4778942Z 2025-12-04T12:15:06.4779223Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4779408Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:06.4779611Z ================== 1 failed, 65 deselected, 2 rerun in 3.66s =================== 2025-12-04T12:15:06.4779728Z Got exit code 1 2025-12-04T12:15:06.4779839Z Retrying single test... 2025-12-04T12:15:06.4780307Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c004f486086cbb5.xml 2025-12-04T12:15:06.4780487Z ============================= test session starts ============================== 2025-12-04T12:15:06.4780841Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.4780965Z cachedir: .pytest_cache 2025-12-04T12:15:06.4781484Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.4781613Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.4781735Z configfile: pytest.ini 2025-12-04T12:15:06.4782362Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.4783098Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.4783834Z stepcurrent: skipping 65 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T12:15:06.4784103Z Running 1 items in this shard 2025-12-04T12:15:06.4784110Z 2025-12-04T12:15:06.4785851Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4787555Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4788011Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:06.4789102Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4789669Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4790312Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:06.4790835Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T12:15:06.4792274Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T12:15:06.4793805Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T12:15:06.4794543Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, 
tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T12:15:06.4794921Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4797098Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4797745Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4798952Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4799656Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4800856Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4801655Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4802743Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 
2025-12-04T12:15:06.4803517Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4804141Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.4804956Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4805398Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4806613Z E1204 12:14:20.164000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4806764Z ('RERUN', {'yellow': True}) [3.0389s] [100%] 2025-12-04T12:15:06.4808056Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4808944Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4809491Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:06.4810075Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4810767Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, 
XBLOCK)[:] 2025-12-04T12:15:06.4811404Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:06.4811989Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T12:15:06.4812707Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T12:15:06.4813337Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T12:15:06.4814238Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T12:15:06.4814605Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4816837Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4817381Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4818474Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4819120Z E1204 12:14:20.668000 
128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4820010Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4820707Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4821593Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4822382Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4822989Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.4823707Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4824129Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4825025Z E1204 12:14:20.668000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4825210Z ('RERUN', {'yellow': True}) [0.2795s] [100%] 2025-12-04T12:15:06.4826279Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4827013Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4827445Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:06.4827988Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4828564Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4829134Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:06.4829671Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T12:15:06.4830219Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T12:15:06.4830745Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T12:15:06.4831488Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T12:15:06.4831854Z E1204 
12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4833630Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4834171Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4835228Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4835867Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4836770Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4837450Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4838376Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4839237Z E1204 12:14:20.948000 128666 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4840095Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.4840972Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4841343Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4842445Z E1204 12:14:20.948000 128666 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4842558Z FAILED [0.2777s] [100%] 2025-12-04T12:15:06.4842565Z 2025-12-04T12:15:06.4842711Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.4843078Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T12:15:06.4843210Z Traceback (most recent call last): 2025-12-04T12:15:06.4843652Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T12:15:06.4843857Z actual = torch.compile(f)(x) 2025-12-04T12:15:06.4844347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4844613Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4845126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4845386Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4845934Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4846137Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4846741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4847066Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4847586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4847752Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4848337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4848462Z return self._compile_to_module() 2025-12-04T12:15:06.4848964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4849126Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4849661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4849797Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4850366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4850619Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4851210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4851441Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4851949Z File "/tmp/tmp5up7ijr4/e6/ce62zz5vmvcbyvithppljypvzklyai4oveyk3awto4eqffvev77d.py", line 45, in 2025-12-04T12:15:06.4852416Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4852593Z kernel.precompile( 2025-12-04T12:15:06.4853284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4853416Z self._precompile_worker() 2025-12-04T12:15:06.4854030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4854211Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4854822Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4855027Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4855565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4855829Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4856449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4856860Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4857091Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.4857475Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4857632Z ^ 2025-12-04T12:15:06.4858143Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4858154Z 2025-12-04T12:15:06.4858932Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4858940Z 2025-12-04T12:15:06.4858945Z 2025-12-04T12:15:06.4859164Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4859765Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T12:15:06.4859774Z 2025-12-04T12:15:06.4860056Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4860282Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4860402Z frames [('total', 1)] 2025-12-04T12:15:06.4860520Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.4860762Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4861001Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4861103Z graph_break [] 2025-12-04T12:15:06.4861393Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T12:15:06.4861533Z Traceback (most recent call last): 2025-12-04T12:15:06.4861973Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T12:15:06.4862111Z actual = torch.compile(f)(x) 2025-12-04T12:15:06.4862823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4863133Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4863718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4863917Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4864471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4864633Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4865170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4865502Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4866094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4866245Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4866744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4866868Z return self._compile_to_module() 2025-12-04T12:15:06.4867366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4867535Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4868052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4868199Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4868696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4868936Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4869535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4869663Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4870160Z File "/tmp/tmpbe076_mg/xb/cxbzh4vedyzoq2dfxsdpfge2fyym77n4croircujachkuzlmvsjc.py", line 45, in 2025-12-04T12:15:06.4870623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4870737Z kernel.precompile( 
2025-12-04T12:15:06.4871495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4871616Z self._precompile_worker() 2025-12-04T12:15:06.4872308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4872494Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4873091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4873303Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4873755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4874002Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4874460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4874800Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4875045Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.4875336Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4875426Z ^ 2025-12-04T12:15:06.4875899Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4875905Z 2025-12-04T12:15:06.4876615Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4876676Z 2025-12-04T12:15:06.4876682Z 2025-12-04T12:15:06.4876913Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4877468Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T12:15:06.4877473Z 2025-12-04T12:15:06.4877754Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4878020Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4878171Z frames [('total', 1)] 2025-12-04T12:15:06.4878301Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.4878543Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4878764Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4878880Z graph_break [] 2025-12-04T12:15:06.4879099Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4879207Z frames [('total', 1)] 2025-12-04T12:15:06.4879337Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.4879555Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4879802Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4879904Z graph_break [] 2025-12-04T12:15:06.4880053Z =================================== FAILURES =================================== 2025-12-04T12:15:06.4880360Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T12:15:06.4880490Z Traceback (most recent call last): 2025-12-04T12:15:06.4880927Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T12:15:06.4881060Z actual = torch.compile(f)(x) 2025-12-04T12:15:06.4881554Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4881826Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4882340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4882535Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4883058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4883209Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4883800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4884139Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4884656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4884817Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4885303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4885428Z return self._compile_to_module() 2025-12-04T12:15:06.4885925Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4886090Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4886625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4886755Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4887251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:06.4887500Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4888084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4888323Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4888837Z File "/tmp/tmpf7nuk2b2/hf/chfizvyhutfvs77r7rwygxw3wl3n7zeg7z5aci55mrorbhxncngz.py", line 45, in 2025-12-04T12:15:06.4889298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4889458Z kernel.precompile( 2025-12-04T12:15:06.4890047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4890169Z self._precompile_worker() 2025-12-04T12:15:06.4890787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4890970Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4891575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4891780Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4892232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4892496Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4892947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4893299Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4893528Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.4893815Z def 
triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4893926Z ^ 2025-12-04T12:15:06.4894380Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4894389Z 2025-12-04T12:15:06.4895102Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4895122Z 2025-12-04T12:15:06.4895127Z 2025-12-04T12:15:06.4895347Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4895935Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T12:15:06.4895941Z 2025-12-04T12:15:06.4896224Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4896519Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4896643Z frames [('total', 1)] 2025-12-04T12:15:06.4896762Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.4897005Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4897247Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4897351Z graph_break [] 2025-12-04T12:15:06.4897572Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4897692Z frames [('total', 1)] 2025-12-04T12:15:06.4897809Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.4898031Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4898289Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4898391Z graph_break [] 2025-12-04T12:15:06.4898627Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4898737Z 
frames [('total', 1)] 2025-12-04T12:15:06.4898853Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.4899085Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4899321Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4899461Z graph_break [] 2025-12-04T12:15:06.4900133Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c004f486086cbb5.xml - 2025-12-04T12:15:06.4900308Z =========================== short test summary info ============================ 2025-12-04T12:15:06.4901033Z FAILED [0.2777s] inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.4901379Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4901471Z ^ 2025-12-04T12:15:06.4901941Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4901947Z 2025-12-04T12:15:06.4902659Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4902668Z 2025-12-04T12:15:06.4902672Z 2025-12-04T12:15:06.4902903Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4903461Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T12:15:06.4903466Z 2025-12-04T12:15:06.4903737Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4903938Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:06.4904142Z ================== 1 failed, 187 deselected, 2 rerun in 3.64s ================== 2025-12-04T12:15:06.4904257Z Got exit code 1 2025-12-04T12:15:06.4904369Z Retrying single test... 2025-12-04T12:15:06.4904839Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5e05e7b060f911b0.xml 2025-12-04T12:15:06.4905021Z ============================= test session starts ============================== 2025-12-04T12:15:06.4905376Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.4905487Z cachedir: .pytest_cache 2025-12-04T12:15:06.4906020Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.4906150Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.4906275Z configfile: pytest.ini 2025-12-04T12:15:06.4906902Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.4907131Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T12:15:06.4907775Z stepcurrent: skipping 65 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T12:15:06.4907894Z Running 1 items in this shard 2025-12-04T12:15:06.4907899Z 2025-12-04T12:15:06.4908954Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4909683Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4910120Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:06.4910679Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4911237Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4911844Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:06.4912365Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T12:15:06.4912913Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T12:15:06.4913511Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T12:15:06.4914239Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, 
tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T12:15:06.4914615Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4916357Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4916914Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4917958Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4918604Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4919501Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4920182Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4921113Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 
2025-12-04T12:15:06.4921892Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4922516Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.4923233Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4923616Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4924519Z E1204 12:14:39.614000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4924654Z ('RERUN', {'yellow': True}) [3.0556s] [100%] 2025-12-04T12:15:06.4925699Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4926459Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4926902Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:06.4927478Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4928089Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, 
XBLOCK)[:] 2025-12-04T12:15:06.4928665Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:06.4929188Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T12:15:06.4929755Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T12:15:06.4930276Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T12:15:06.4931017Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T12:15:06.4931383Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4933113Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4933662Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4934760Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4935415Z E1204 12:14:40.122000 
128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4936364Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4937063Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4937943Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4938730Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4939340Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.4940068Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4940498Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4941397Z E1204 12:14:40.122000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. 
The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4941588Z ('RERUN', {'yellow': True}) [0.2822s] [100%] 2025-12-04T12:15:06.4942655Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Triton compilation failed: triton_poi_fused__to_copy_0 2025-12-04T12:15:06.4943374Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4943826Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xnumel = 1 2025-12-04T12:15:06.4944365Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xoffset = tl.program_id(0) * XBLOCK 2025-12-04T12:15:06.4944946Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] 2025-12-04T12:15:06.4945522Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] xmask = tl.full([XBLOCK], True, tl.int1)[:] 2025-12-04T12:15:06.4946067Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp0 = tl.load(in_ptr0 + (0)) 2025-12-04T12:15:06.4946618Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp1 = tl.broadcast_to(tmp0, [XBLOCK]) 2025-12-04T12:15:06.4947149Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tmp2 = tmp1.to(tl.float8e4nv) 2025-12-04T12:15:06.4947890Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] tl.store(out_ptr0 + (tl.full([XBLOCK], 0, tl.int32).broadcast_to(XBLOCK)), tmp2, None) 2025-12-04T12:15:06.4948252Z E1204 
12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] 2025-12-04T12:15:06.4950046Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] metadata: {'signature': {'in_ptr0': '*fp32', 'out_ptr0': '*fp8e4nv', 'xnumel': 'constexpr', 'XBLOCK': 'constexpr'}, 'device': 0, 'constants': {'xnumel': 1, 'XBLOCK': 1}, 'native_matmul': False, 'configs': [{(0,): [['tt.divisibility', 16]], (1,): [['tt.divisibility', 16]]}], 'enable_fp_fusion': True, 'device_type': 'cuda', 'num_warps': 1, 'num_stages': 1, 'debug': True, 'cc': 75} 2025-12-04T12:15:06.4950591Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] Traceback (most recent call last): 2025-12-04T12:15:06.4951644Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4952278Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4953169Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4953865Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4954787Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4955576Z E1204 12:14:40.401000 128863 
site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4956249Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] triton.compiler.errors.CompilationError: at 1:0: 2025-12-04T12:15:06.4956983Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4957355Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ^ 2025-12-04T12:15:06.4958244Z E1204 12:14:40.401000 128863 site-packages/torch/_inductor/runtime/triton_heuristics.py:810] [0/0] ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4958365Z FAILED [0.2767s] [100%] 2025-12-04T12:15:06.4958371Z 2025-12-04T12:15:06.4958516Z ==================================== RERUNS ==================================== 2025-12-04T12:15:06.4958825Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T12:15:06.4958955Z Traceback (most recent call last): 2025-12-04T12:15:06.4959395Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T12:15:06.4959533Z actual = torch.compile(f)(x) 2025-12-04T12:15:06.4960022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4960287Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4960803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4960999Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4961523Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4961674Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4962244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4962583Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4963107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4963270Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4963756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4963880Z return self._compile_to_module() 2025-12-04T12:15:06.4964376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4964542Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4965077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4965214Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4965711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4965959Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4966545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4966707Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4967223Z File "/tmp/tmplrulxepv/bw/cbw6pd55lttopoe2rolibfuhv7sndqskb5qwhnzhq5yl7an2klzj.py", line 45, in 2025-12-04T12:15:06.4967684Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4967816Z kernel.precompile( 2025-12-04T12:15:06.4968406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4968557Z self._precompile_worker() 2025-12-04T12:15:06.4969177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4969360Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4969968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4970171Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4970622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4970886Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4971509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4971854Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4972102Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.4972390Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4972498Z ^ 2025-12-04T12:15:06.4972955Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4972962Z 2025-12-04T12:15:06.4973674Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4973693Z 2025-12-04T12:15:06.4973697Z 2025-12-04T12:15:06.4973918Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4974474Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T12:15:06.4974558Z 2025-12-04T12:15:06.4974843Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4975067Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4975173Z frames [('total', 1)] 2025-12-04T12:15:06.4975306Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.4975548Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4975786Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4975889Z graph_break [] 2025-12-04T12:15:06.4976195Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T12:15:06.4976390Z Traceback (most recent call last): 2025-12-04T12:15:06.4976829Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T12:15:06.4976953Z actual = torch.compile(f)(x) 2025-12-04T12:15:06.4977466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4977717Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4978246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4978443Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4979009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4979177Z 
mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4979713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4980051Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4980684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4980839Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4981337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4981462Z return self._compile_to_module() 2025-12-04T12:15:06.4981951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.4982137Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.4982656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.4982802Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.4983300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 2025-12-04T12:15:06.4983546Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.4984149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.4984280Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.4984796Z File "/tmp/tmpfug849ze/7d/c7dq77infxlvw22t4ophndfmtu3hjgapvvilsj5yp43sqze7dum4.py", line 45, in 2025-12-04T12:15:06.4985266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.4985381Z kernel.precompile( 
2025-12-04T12:15:06.4985952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.4986074Z self._precompile_worker() 2025-12-04T12:15:06.4986670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.4986910Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.4987510Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.4987722Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.4988176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.4988427Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.4988884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.4989224Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.4989466Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.4989786Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.4989881Z ^ 2025-12-04T12:15:06.4990355Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.4990360Z 2025-12-04T12:15:06.4991072Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.4991114Z 2025-12-04T12:15:06.4991118Z 2025-12-04T12:15:06.4991349Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.4991903Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T12:15:06.4991908Z 2025-12-04T12:15:06.4992176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.4992448Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4992556Z frames [('total', 1)] 2025-12-04T12:15:06.4992717Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.4992957Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4993179Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4993295Z graph_break [] 2025-12-04T12:15:06.4993515Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.4993626Z frames [('total', 1)] 2025-12-04T12:15:06.4993754Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.4993971Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.4994206Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.4994322Z graph_break [] 2025-12-04T12:15:06.4994470Z =================================== FAILURES =================================== 2025-12-04T12:15:06.4994779Z _______ TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda ________ 2025-12-04T12:15:06.4994908Z Traceback (most recent call last): 2025-12-04T12:15:06.4995346Z File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 91, in test_xblock_for_small_numel 2025-12-04T12:15:06.4995487Z actual = torch.compile(f)(x) 2025-12-04T12:15:06.4995976Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:15:06.4996254Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:15:06.4996782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T12:15:06.4996976Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:15:06.4997501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T12:15:06.4997654Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:15:06.4998231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:15:06.4998568Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:15:06.4999087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile 2025-12-04T12:15:06.4999248Z compiled_module = graph.compile_to_module() 2025-12-04T12:15:06.4999734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2410, in compile_to_module 2025-12-04T12:15:06.4999857Z return self._compile_to_module() 2025-12-04T12:15:06.5000360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2420, in _compile_to_module 2025-12-04T12:15:06.5000525Z mod = self._compile_to_module_lines(wrapper_code) 2025-12-04T12:15:06.5001046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2495, in _compile_to_module_lines 2025-12-04T12:15:06.5001192Z mod = PyCodeCache.load_by_key_path( 2025-12-04T12:15:06.5001688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 3673, in load_by_key_path 
2025-12-04T12:15:06.5001935Z mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) 2025-12-04T12:15:06.5002519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module 2025-12-04T12:15:06.5002681Z exec(code, mod.__dict__, mod.__dict__) 2025-12-04T12:15:06.5003191Z File "/tmp/tmpyjm1cd0y/qq/cqq53c7t2khz6m3yi4fjlkv76anwivkukfxnsjohara7iurydepe.py", line 45, in 2025-12-04T12:15:06.5003654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/async_compile.py", line 477, in triton 2025-12-04T12:15:06.5003815Z kernel.precompile( 2025-12-04T12:15:06.5004403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 441, in precompile 2025-12-04T12:15:06.5004521Z self._precompile_worker() 2025-12-04T12:15:06.5005132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 463, in _precompile_worker 2025-12-04T12:15:06.5005310Z compile_results.append(self._precompile_config(c)) 2025-12-04T12:15:06.5005904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 808, in _precompile_config 2025-12-04T12:15:06.5006118Z binary = triton.compile(*compile_args, **compile_kwargs) 2025-12-04T12:15:06.5006567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 300, in compile 2025-12-04T12:15:06.5006825Z module = src.make_ir(target, options, codegen_fns, module_map, context) 2025-12-04T12:15:06.5007272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/compiler/compiler.py", line 80, in make_ir 2025-12-04T12:15:06.5007611Z return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns, 2025-12-04T12:15:06.5007855Z torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.5008144Z def 
triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.5008249Z ^ 2025-12-04T12:15:06.5008707Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.5008717Z 2025-12-04T12:15:06.5009425Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.5009432Z 2025-12-04T12:15:06.5009437Z 2025-12-04T12:15:06.5009668Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.5010263Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T12:15:06.5010272Z 2025-12-04T12:15:06.5010556Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.5010780Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.5010887Z frames [('total', 1)] 2025-12-04T12:15:06.5011017Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.5011255Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.5011489Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.5011589Z graph_break [] 2025-12-04T12:15:06.5011807Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.5011924Z frames [('total', 1)] 2025-12-04T12:15:06.5012040Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.5012260Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.5012517Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.5012620Z graph_break [] 2025-12-04T12:15:06.5012840Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:15:06.5012959Z 
frames [('total', 1)] 2025-12-04T12:15:06.5013075Z stats [('calls_captured', 1)] 2025-12-04T12:15:06.5013305Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:15:06.5013538Z inductor [('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T12:15:06.5013673Z graph_break [] 2025-12-04T12:15:06.5014337Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5e05e7b060f911b0.xml - 2025-12-04T12:15:06.5014512Z =========================== short test summary info ============================ 2025-12-04T12:15:06.5015229Z FAILED [0.2767s] inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda - torch._inductor.exc.InductorError: CompilationError: at 1:0: 2025-12-04T12:15:06.5015588Z def triton_poi_fused__to_copy_0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2025-12-04T12:15:06.5015680Z ^ 2025-12-04T12:15:06.5016151Z ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')") 2025-12-04T12:15:06.5016156Z 2025-12-04T12:15:06.5016949Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:15:06.5016958Z 2025-12-04T12:15:06.5016962Z 2025-12-04T12:15:06.5017193Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:06.5017749Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T12:15:06.5017755Z 2025-12-04T12:15:06.5018024Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:06.5018227Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:06.5018434Z ================== 1 failed, 187 deselected, 2 rerun in 3.66s ================== 2025-12-04T12:15:06.5018539Z Got exit code 1 2025-12-04T12:15:06.5019030Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda 2025-12-04T12:15:06.5019443Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:06.5019931Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4d9849432c7f5caf.xml 2025-12-04T12:15:06.5020099Z ============================= test session starts ============================== 2025-12-04T12:15:06.5020453Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:06.5020578Z cachedir: .pytest_cache 2025-12-04T12:15:06.5021146Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:06.5021292Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:06.5021403Z configfile: pytest.ini 2025-12-04T12:15:06.5021996Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:06.5022240Z collecting ... collected 188 items / 66 deselected / 122 selected 2025-12-04T12:15:06.5022389Z stepcurrent: skipping 66 already run items. 
2025-12-04T12:15:06.5022507Z Running 122 items in this shard 2025-12-04T12:15:06.5022513Z 2025-12-04T12:15:06.5022960Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e5m2_cuda PASSED [2.9121s] [ 0%] 2025-12-04T12:15:06.5023848Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes0_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 1%] 2025-12-04T12:15:06.5024746Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes1_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 2%] 2025-12-04T12:15:06.5025619Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes0_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 3%] 2025-12-04T12:15:06.5026553Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes1_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 4%] 2025-12-04T12:15:06.5027426Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes0_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 4%] 2025-12-04T12:15:06.5028365Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes1_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 5%] 2025-12-04T12:15:06.5029253Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes0_cuda SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 6%] 2025-12-04T12:15:06.5030118Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes1_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 7%] 2025-12-04T12:15:06.5030643Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fp8_max_autotune_cuda SKIPPED [0.0002s] (Not supported on non B200) [ 8%] 2025-12-04T12:15:06.5031261Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fusion_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 9%] 2025-12-04T12:15:06.5032223Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 9%] 2025-12-04T12:15:06.5033164Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 10%] 2025-12-04T12:15:06.5034101Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 11%] 2025-12-04T12:15:06.5035038Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 12%] 2025-12-04T12:15:06.5036005Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 13%] 2025-12-04T12:15:06.5036958Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda SKIPPED 
[0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 13%] 2025-12-04T12:15:06.5037877Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 14%] 2025-12-04T12:15:06.5038825Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 15%] 2025-12-04T12:15:06.5039740Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 16%] 2025-12-04T12:15:06.5040676Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 17%] 2025-12-04T12:15:06.5041614Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 18%] 2025-12-04T12:15:06.5042547Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 18%] 2025-12-04T12:15:06.5043540Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 19%] 2025-12-04T12:15:06.5044494Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 20%] 2025-12-04T12:15:06.5045415Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 21%] 2025-12-04T12:15:06.5046339Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 22%] 2025-12-04T12:15:06.5047273Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 22%] 2025-12-04T12:15:06.5048196Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 23%] 2025-12-04T12:15:06.5049130Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 24%] 2025-12-04T12:15:06.5050066Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 25%] 2025-12-04T12:15:06.5051023Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) 
[ 26%] 2025-12-04T12:15:06.5051946Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 27%] 2025-12-04T12:15:06.5052863Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 27%] 2025-12-04T12:15:06.5053784Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 28%] 2025-12-04T12:15:06.5054710Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 29%] 2025-12-04T12:15:06.5055658Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 30%] 2025-12-04T12:15:06.5056646Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 31%] 2025-12-04T12:15:06.5057650Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 31%] 2025-12-04T12:15:06.5058630Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and 
MI300+ and XPU devices) [ 32%] 2025-12-04T12:15:06.5059563Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 33%] 2025-12-04T12:15:06.5060570Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 34%] 2025-12-04T12:15:06.5061584Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 35%] 2025-12-04T12:15:06.5062590Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 36%] 2025-12-04T12:15:06.5063599Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 36%] 2025-12-04T12:15:06.5064571Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 37%] 2025-12-04T12:15:06.5065536Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 38%] 2025-12-04T12:15:06.5066547Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 39%] 2025-12-04T12:15:06.5067506Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 40%] 2025-12-04T12:15:06.5068494Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 40%] 2025-12-04T12:15:06.5069456Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 41%] 2025-12-04T12:15:06.5070439Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 42%] 2025-12-04T12:15:06.5071589Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 43%] 2025-12-04T12:15:06.5072572Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 44%] 2025-12-04T12:15:06.5073452Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_True_cuda SKIPPED [0.0002s] (FP8 is only supported on 
H100+, SM 8.9 and MI300+ and XPU devices) [ 45%] 2025-12-04T12:15:06.5074411Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_False_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 45%] 2025-12-04T12:15:06.5075262Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_True_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 46%] 2025-12-04T12:15:06.5075971Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_scaled_mm_preserves_strides_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 47%] 2025-12-04T12:15:06.5076950Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 48%] 2025-12-04T12:15:06.5077924Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 49%] 2025-12-04T12:15:06.5078888Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 50%] 2025-12-04T12:15:06.5079842Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 50%] 2025-12-04T12:15:06.5080802Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ 
and XPU devices) [ 51%] 2025-12-04T12:15:06.5081816Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 52%] 2025-12-04T12:15:06.5082775Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 53%] 2025-12-04T12:15:06.5083730Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 54%] 2025-12-04T12:15:06.5084675Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 54%] 2025-12-04T12:15:06.5085626Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 55%] 2025-12-04T12:15:06.5086560Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 56%] 2025-12-04T12:15:06.5087544Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 57%] 2025-12-04T12:15:06.5088492Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0004s] 
(FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 58%] 2025-12-04T12:15:06.5089521Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 59%] 2025-12-04T12:15:06.5090456Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 59%] 2025-12-04T12:15:06.5091413Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 60%] 2025-12-04T12:15:06.5092687Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 61%] 2025-12-04T12:15:06.5093652Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 62%] 2025-12-04T12:15:06.5094592Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 63%] 2025-12-04T12:15:06.5095582Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 63%] 2025-12-04T12:15:06.5096591Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 64%] 2025-12-04T12:15:06.5097667Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 65%] 2025-12-04T12:15:06.5098674Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 66%] 2025-12-04T12:15:06.5099647Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 67%] 2025-12-04T12:15:06.5100623Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 68%] 2025-12-04T12:15:06.5101579Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 68%] 2025-12-04T12:15:06.5102523Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 69%] 2025-12-04T12:15:06.5103500Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and 
XPU devices) [ 70%] 2025-12-04T12:15:06.5104452Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 71%] 2025-12-04T12:15:06.5105485Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 72%] 2025-12-04T12:15:06.5106614Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 72%] 2025-12-04T12:15:06.5107714Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 73%] 2025-12-04T12:15:06.5108827Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 74%] 2025-12-04T12:15:06.5109926Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 75%] 2025-12-04T12:15:06.5111025Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 76%] 
2025-12-04T12:15:06.5112092Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 77%] 2025-12-04T12:15:06.5113215Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 77%] 2025-12-04T12:15:06.5114287Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 78%] 2025-12-04T12:15:06.5115366Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 79%] 2025-12-04T12:15:06.5116458Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 80%] 2025-12-04T12:15:06.5117533Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 81%] 2025-12-04T12:15:06.5118612Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 
81%] 2025-12-04T12:15:06.5119732Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 82%] 2025-12-04T12:15:06.5120866Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 83%] 2025-12-04T12:15:06.5121985Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 84%] 2025-12-04T12:15:06.5123080Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 85%] 2025-12-04T12:15:06.5124144Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 86%] 2025-12-04T12:15:06.5125212Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 86%] 2025-12-04T12:15:06.5126267Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU 
devices) [ 87%] 2025-12-04T12:15:06.5127335Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 88%] 2025-12-04T12:15:06.5128439Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 89%] 2025-12-04T12:15:06.5129514Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 90%] 2025-12-04T12:15:06.5130576Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 90%] 2025-12-04T12:15:06.5131632Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 91%] 2025-12-04T12:15:06.5132647Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 92%] 2025-12-04T12:15:06.5133628Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_True_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 93%] 2025-12-04T12:15:06.5134628Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_False_cuda_bfloat16 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 94%] 2025-12-04T12:15:06.5135572Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_True_cuda_bfloat16 SKIPPED [0.0004s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 95%] 2025-12-04T12:15:06.5136704Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 95%] 2025-12-04T12:15:06.5137676Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_True_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 96%] 2025-12-04T12:15:06.5138631Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_False_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 97%] 2025-12-04T12:15:06.5139571Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_True_cuda_float32 SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 98%] 2025-12-04T12:15:06.5140292Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_input_dims_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [ 99%] 2025-12-04T12:15:06.5141065Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_scale_dims_rowwise_scaling_cuda SKIPPED [0.0002s] (FP8 is only supported on H100+, SM 8.9 and MI300+ and XPU devices) [100%] 2025-12-04T12:15:06.5141072Z 
2025-12-04T12:15:06.5141732Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4d9849432c7f5caf.xml - 2025-12-04T12:15:06.5141967Z ================ 1 passed, 121 skipped, 66 deselected in 3.35s ================= 2025-12-04T12:15:06.5157844Z The following tests failed consistently: ['test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda', 
'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32', 'test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda'] 2025-12-04T12:15:06.5157939Z 2025-12-04T12:15:06.5158395Z FINISHED PRINTING LOG FILE of inductor/test_fp8 1/1 
(test/test-reports/inductor.test_fp8_1.1_5b24deb545871ee8_.log) 2025-12-04T12:15:06.5158400Z 2025-12-04T12:15:06.5158720Z Finished inductor/test_fp8 1/1 ... [2025-12-04 12:15:04.795060][10933.17794738], took 32.13min 2025-12-04T12:15:06.5159434Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dff864e79f1bf91b.xml 2025-12-04T12:15:06.5160198Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-053a0e10a178eff6.xml 2025-12-04T12:15:06.5160893Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-966288eeb3fe785e.xml 2025-12-04T12:15:06.5161631Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47dd8058babbbd0d.xml 2025-12-04T12:15:06.5162330Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e92e228ccdafe934.xml 2025-12-04T12:15:06.5163021Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0328fb4bc2fb022d.xml 2025-12-04T12:15:06.5163730Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4ecceae3d20d3515.xml 2025-12-04T12:15:06.5164431Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-af3f0411f43ffff1.xml 2025-12-04T12:15:06.5165148Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5f646abfecfc34db.xml 2025-12-04T12:15:06.5165836Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f71fa45f6063b14.xml 2025-12-04T12:15:06.5166540Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3d881319a967678f.xml 2025-12-04T12:15:06.5167257Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7b45d70025cf6016.xml 2025-12-04T12:15:06.5167952Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b4a285d41fdad5fc.xml 2025-12-04T12:15:06.5168653Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9b24822b6f23300e.xml 2025-12-04T12:15:06.5169395Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-642548938a706c13.xml 2025-12-04T12:15:06.5170101Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3087aa3d89d0a96b.xml 2025-12-04T12:15:06.5170792Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5fb4c628c04a1cdc.xml 2025-12-04T12:15:06.5171747Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0d752e0bfa5071ea.xml 2025-12-04T12:15:06.5172462Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fedbc7df4b1c2869.xml 2025-12-04T12:15:06.5173151Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3cf6d62b643bfad7.xml 2025-12-04T12:15:06.5173862Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e2e744a24cd2751e.xml 2025-12-04T12:15:06.5174554Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6de5a411a3f65f82.xml 2025-12-04T12:15:06.5175259Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f3621583fff098.xml 2025-12-04T12:15:06.5175952Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-319cee3df6121e1a.xml 2025-12-04T12:15:06.5176700Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-452be63c68b4eb35.xml 2025-12-04T12:15:06.5177490Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5a49841d6a2b730b.xml 2025-12-04T12:15:06.5178182Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f1313a025d30dc09.xml 2025-12-04T12:15:06.5178883Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-03aedafc0832726c.xml 2025-12-04T12:15:06.5179572Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-89171bcc48f05a69.xml 2025-12-04T12:15:06.5180262Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6450e334481f0131.xml 2025-12-04T12:15:06.5180974Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f7999da795e3cf34.xml 2025-12-04T12:15:06.5181679Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ad7a38726bbc8b50.xml 2025-12-04T12:15:06.5182387Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b434424093647de3.xml 2025-12-04T12:15:06.5183080Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ccd966f4e119e833.xml 2025-12-04T12:15:06.5183819Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d16f18ba4de45d90.xml 2025-12-04T12:15:06.5184525Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4078dca354f1c797.xml 2025-12-04T12:15:06.5185221Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7591ded94ad5fda9.xml 2025-12-04T12:15:06.5186018Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4955a88ef6b89264.xml 2025-12-04T12:15:06.5186724Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2fae1650dec37ec0.xml 2025-12-04T12:15:06.5434234Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0893388d06071d35.xml 2025-12-04T12:15:06.5858983Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b62e3abe6013e6ef.xml 2025-12-04T12:15:06.6232208Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0fe50dbde6f69754.xml 2025-12-04T12:15:06.6550598Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1381d94cd6abec18.xml 2025-12-04T12:15:06.6851519Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2b9aebe063e8f7ef.xml 2025-12-04T12:15:06.7229676Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-649ba93d0ac5919c.xml 2025-12-04T12:15:06.7520729Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df60bd1ca7e6baab.xml 2025-12-04T12:15:06.7900531Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-927fdf8f8ff6280c.xml 2025-12-04T12:15:06.8240529Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f380c761dc75570.xml 2025-12-04T12:15:06.8592102Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db3aa4c2f1c0f2c1.xml 2025-12-04T12:15:06.8928127Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7c20e7902388541e.xml 2025-12-04T12:15:06.9248981Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43cf13c151388d8e.xml 2025-12-04T12:15:06.9615328Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-27661fe34019a4f8.xml 2025-12-04T12:15:06.9965411Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-63ef36c446edecf7.xml 2025-12-04T12:15:07.0296222Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-818cc5e6f257d295.xml 2025-12-04T12:15:07.0615558Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b552d5ebf2a766dc.xml 2025-12-04T12:15:07.1070189Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08c28ac73e77007a.xml 2025-12-04T12:15:07.1423173Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df1b42bf8f6cd06e.xml 2025-12-04T12:15:07.1753947Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-97d6c66aee44b097.xml 2025-12-04T12:15:07.2069015Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-232f2d4b09cdec77.xml 2025-12-04T12:15:07.2376703Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6add3d31a0a55a66.xml 2025-12-04T12:15:07.2682053Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fa52f41f0c0be4e5.xml 2025-12-04T12:15:07.2972832Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-38b24c1b21208356.xml 2025-12-04T12:15:07.3301206Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1ae24833396f782.xml 2025-12-04T12:15:07.3610568Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-80996ba6b8c32f81.xml 2025-12-04T12:15:07.3949454Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8b26ba548538abde.xml 2025-12-04T12:15:07.4411141Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d73817a3e5f02a06.xml 2025-12-04T12:15:07.4726515Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-90e37d7f0968dad1.xml 2025-12-04T12:15:07.5065459Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6ba281452d587f38.xml 2025-12-04T12:15:07.5397488Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85d1d6e9267cc116.xml 2025-12-04T12:15:07.5677019Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a610c26dd7fa0e9.xml 2025-12-04T12:15:07.6093904Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-269f6089cafc9f3b.xml 2025-12-04T12:15:07.6437017Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f11fe18ee197cc1f.xml 2025-12-04T12:15:07.6737533Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0b8acd36d7258295.xml 2025-12-04T12:15:07.7065512Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-babe12520ea62fea.xml 2025-12-04T12:15:07.7400551Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08a6bb29b776e6ca.xml 2025-12-04T12:15:07.8292760Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ef8db3fa00c6c1d7.xml 2025-12-04T12:15:07.8890874Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a3ac84fc91fa02b.xml 2025-12-04T12:15:07.9217786Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e162f70cb76e49ff.xml 2025-12-04T12:15:07.9532821Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e70a5c274fb86b8e.xml 2025-12-04T12:15:07.9884953Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0c17434f07767682.xml 2025-12-04T12:15:08.0184498Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3815c1aa47a06d85.xml 2025-12-04T12:15:08.0485855Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-69850f25ab7699fd.xml 2025-12-04T12:15:08.0813995Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-da23a1d59c747be6.xml 2025-12-04T12:15:08.1152659Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-36993cd4956a89fe.xml 2025-12-04T12:15:08.1476568Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-78153e5fcd212bc6.xml 2025-12-04T12:15:08.1821092Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-04b538cf09549803.xml 2025-12-04T12:15:08.2384722Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d91f9f6b0d5ec125.xml 2025-12-04T12:15:08.2709424Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ebbb316cfb6210df.xml 2025-12-04T12:15:08.3024354Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6f1b13e751374b5d.xml 2025-12-04T12:15:08.3312772Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f2c09e3279cd971a.xml 2025-12-04T12:15:08.3631810Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c0e3bac2edd6805.xml 2025-12-04T12:15:08.3959108Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c004f486086cbb5.xml 2025-12-04T12:15:08.4346055Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5e05e7b060f911b0.xml 2025-12-04T12:15:08.4646140Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4d9849432c7f5caf.xml 2025-12-04T12:15:08.9568734Z Uploading logs for 57119749248 to S3 2025-12-04T12:15:09.1308848Z Uploading artifacts took 0.63 seconds 2025-12-04T12:15:09.1313118Z inductor/test_fp8 1/1 failed! 2025-12-04T12:15:09.1313652Z Running dynamo/test_model_output 1/1 ... 
[2025-12-04 12:15:09.131154][10937.514047253] 2025-12-04T12:15:09.1314233Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:15:09.1318455Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_model_output.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:15:09.131618] 2025-12-04T12:15:45.3925080Z 2025-12-04T12:15:45.3926052Z PRINTING LOG FILE of dynamo/test_model_output 1/1 (test/test-reports/dynamo.test_model_output_1.1_9f288500c4a144e5_.log) 2025-12-04T12:15:45.3927496Z Test results will be stored in test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-d1e6e78eb0372411.xml 2025-12-04T12:15:45.3928602Z ============================= test session starts ============================== 2025-12-04T12:15:45.3929359Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:45.3929972Z cachedir: .pytest_cache 2025-12-04T12:15:45.3930680Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:45.3931688Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:45.3932047Z configfile: pytest.ini 2025-12-04T12:15:45.3932816Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:45.3933675Z collecting ... 
collected 18 items 2025-12-04T12:15:45.3934094Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T12:15:45.3940886Z Running 18 items in this shard: test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained, test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained_non_const_attr, test/dynamo/test_model_output.py::TestModelOutput::test_mo_assign, test/dynamo/test_model_output.py::TestModelOutput::test_mo_create, test/dynamo/test_model_output.py::TestModelOutput::test_mo_from_outside, test/dynamo/test_model_output.py::TestModelOutput::test_mo_getattr, test/dynamo/test_model_output.py::TestModelOutput::test_mo_getattr_missing, test/dynamo/test_model_output.py::TestModelOutput::test_mo_getitem, test/dynamo/test_model_output.py::TestModelOutput::test_mo_index, test/dynamo/test_model_output.py::TestModelOutput::test_mo_init, test/dynamo/test_model_output.py::TestModelOutput::test_mo_init2, test/dynamo/test_model_output.py::TestModelOutput::test_mo_init_with_disable, test/dynamo/test_model_output.py::TestModelOutput::test_mo_newkey, test/dynamo/test_model_output.py::TestModelOutput::test_mo_reconstruct_bytecode, test/dynamo/test_model_output.py::TestModelOutput::test_mo_tuple, test/dynamo/test_model_output.py::TestModelOutput::test_none, test/dynamo/test_model_output.py::TestModelOutput::test_reconstruction, test/dynamo/test_model_output.py::TestModelOutputBertCUDA::test_HF_bert_model_output_cuda 2025-12-04T12:15:45.3947510Z 2025-12-04T12:15:45.3947942Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained ('RERUN', {'yellow': True}) [0.0298s] [ 5%] 2025-12-04T12:15:45.3948944Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained ('RERUN', {'yellow': True}) [0.0017s] [ 5%] 2025-12-04T12:15:45.3949842Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained FAILED [0.0015s] [ 5%] 2025-12-04T12:15:45.3950307Z 2025-12-04T12:15:45.3950467Z ==================================== RERUNS 
==================================== 2025-12-04T12:15:45.3950994Z _______________________ TestHFPretrained.test_pretrained _______________________ 2025-12-04T12:15:45.3951508Z Traceback (most recent call last): 2025-12-04T12:15:45.3952223Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained 2025-12-04T12:15:45.3952867Z ref = fn(x, tmp) 2025-12-04T12:15:45.3953382Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn 2025-12-04T12:15:45.3954004Z return a + torch.ones(2) * tmp.max_length 2025-12-04T12:15:45.3954848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__ 2025-12-04T12:15:45.3955676Z return super().__getattribute__(key) 2025-12-04T12:15:45.3956213Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length' 2025-12-04T12:15:45.3956622Z 2025-12-04T12:15:45.3956856Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:45.3957637Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained 2025-12-04T12:15:45.3958224Z 2025-12-04T12:15:45.3958496Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:45.3959155Z _______________________ TestHFPretrained.test_pretrained _______________________ 2025-12-04T12:15:45.3959669Z Traceback (most recent call last): 2025-12-04T12:15:45.3960299Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained 2025-12-04T12:15:45.3960943Z ref = fn(x, tmp) 2025-12-04T12:15:45.3961455Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn 2025-12-04T12:15:45.3962096Z return a + torch.ones(2) * tmp.max_length 2025-12-04T12:15:45.3962926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__ 2025-12-04T12:15:45.3963752Z return 
super().__getattribute__(key) 2025-12-04T12:15:45.3964284Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length' 2025-12-04T12:15:45.3964692Z 2025-12-04T12:15:45.3964942Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:45.3965772Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained 2025-12-04T12:15:45.3966362Z 2025-12-04T12:15:45.3966629Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:45.3967193Z =================================== FAILURES =================================== 2025-12-04T12:15:45.3967721Z _______________________ TestHFPretrained.test_pretrained _______________________ 2025-12-04T12:15:45.3968240Z Traceback (most recent call last): 2025-12-04T12:15:45.3968885Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained 2025-12-04T12:15:45.3969517Z ref = fn(x, tmp) 2025-12-04T12:15:45.3970029Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn 2025-12-04T12:15:45.3970648Z return a + torch.ones(2) * tmp.max_length 2025-12-04T12:15:45.3971701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__ 2025-12-04T12:15:45.3972528Z return super().__getattribute__(key) 2025-12-04T12:15:45.3973065Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length' 2025-12-04T12:15:45.3973477Z 2025-12-04T12:15:45.3973709Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:45.3974505Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained 2025-12-04T12:15:45.3975078Z 2025-12-04T12:15:45.3975344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:45.3976567Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-d1e6e78eb0372411.xml - 2025-12-04T12:15:45.3977620Z =========================== short test summary info ============================ 2025-12-04T12:15:45.3978690Z FAILED [0.0015s] dynamo/test_model_output.py::TestHFPretrained::test_pretrained - AttributeError: 'PreTrainedConfig' object has no attribute 'max_length' 2025-12-04T12:15:45.3979482Z 2025-12-04T12:15:45.3979700Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:45.3980495Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained 2025-12-04T12:15:45.3981068Z 2025-12-04T12:15:45.3981351Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:45.3981942Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:45.3982423Z ========================== 1 failed, 2 rerun in 0.08s ========================== 2025-12-04T12:15:45.3982836Z Got exit code 1 2025-12-04T12:15:45.3983109Z Retrying single test... 
2025-12-04T12:15:45.3983825Z Test results will be stored in test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-0e0432c8246f889e.xml 2025-12-04T12:15:45.3984669Z ============================= test session starts ============================== 2025-12-04T12:15:45.3985337Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:45.3985936Z cachedir: .pytest_cache 2025-12-04T12:15:45.3986631Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:45.3987413Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:45.3987772Z configfile: pytest.ini 2025-12-04T12:15:45.3988600Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:45.3989546Z collecting ... collected 18 items / 17 deselected / 1 selected 2025-12-04T12:15:45.3990418Z stepcurrent: skipping 0 already run items. 
Running only test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained 2025-12-04T12:15:45.3991195Z Running 1 items in this shard 2025-12-04T12:15:45.3991458Z 2025-12-04T12:15:45.3991938Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained ('RERUN', {'yellow': True}) [0.0298s] [100%] 2025-12-04T12:15:45.3992936Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained ('RERUN', {'yellow': True}) [0.0018s] [100%] 2025-12-04T12:15:45.3993827Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained FAILED [0.0015s] [100%] 2025-12-04T12:15:45.3994294Z 2025-12-04T12:15:45.3994455Z ==================================== RERUNS ==================================== 2025-12-04T12:15:45.3994984Z _______________________ TestHFPretrained.test_pretrained _______________________ 2025-12-04T12:15:45.3995497Z Traceback (most recent call last): 2025-12-04T12:15:45.3996145Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained 2025-12-04T12:15:45.3996779Z ref = fn(x, tmp) 2025-12-04T12:15:45.3997295Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn 2025-12-04T12:15:45.3997917Z return a + torch.ones(2) * tmp.max_length 2025-12-04T12:15:45.3998763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__ 2025-12-04T12:15:45.3999580Z return super().__getattribute__(key) 2025-12-04T12:15:45.4000117Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length' 2025-12-04T12:15:45.4000529Z 2025-12-04T12:15:45.4000759Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:45.4001540Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained 2025-12-04T12:15:45.4002126Z 2025-12-04T12:15:45.4002393Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:45.4003049Z _______________________ 
TestHFPretrained.test_pretrained _______________________ 2025-12-04T12:15:45.4003560Z Traceback (most recent call last): 2025-12-04T12:15:45.4004192Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained 2025-12-04T12:15:45.4004892Z ref = fn(x, tmp) 2025-12-04T12:15:45.4005407Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn 2025-12-04T12:15:45.4006010Z return a + torch.ones(2) * tmp.max_length 2025-12-04T12:15:45.4006837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__ 2025-12-04T12:15:45.4007662Z return super().__getattribute__(key) 2025-12-04T12:15:45.4008200Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length' 2025-12-04T12:15:45.4008607Z 2025-12-04T12:15:45.4008826Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:45.4009613Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained 2025-12-04T12:15:45.4010203Z 2025-12-04T12:15:45.4010468Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:45.4011033Z =================================== FAILURES =================================== 2025-12-04T12:15:45.4011566Z _______________________ TestHFPretrained.test_pretrained _______________________ 2025-12-04T12:15:45.4012081Z Traceback (most recent call last): 2025-12-04T12:15:45.4012729Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained 2025-12-04T12:15:45.4013363Z ref = fn(x, tmp) 2025-12-04T12:15:45.4013871Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn 2025-12-04T12:15:45.4014525Z return a + torch.ones(2) * tmp.max_length 2025-12-04T12:15:45.4015351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__ 
2025-12-04T12:15:45.4016165Z return super().__getattribute__(key) 2025-12-04T12:15:45.4016783Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length' 2025-12-04T12:15:45.4017240Z 2025-12-04T12:15:45.4017469Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:45.4018306Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained 2025-12-04T12:15:45.4018899Z 2025-12-04T12:15:45.4019171Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:45.4020305Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-0e0432c8246f889e.xml - 2025-12-04T12:15:45.4021347Z =========================== short test summary info ============================ 2025-12-04T12:15:45.4022332Z FAILED [0.0015s] dynamo/test_model_output.py::TestHFPretrained::test_pretrained - AttributeError: 'PreTrainedConfig' object has no attribute 'max_length' 2025-12-04T12:15:45.4023126Z 2025-12-04T12:15:45.4023343Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:45.4024141Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained 2025-12-04T12:15:45.4024716Z 2025-12-04T12:15:45.4024998Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:45.4025591Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:15:45.4026106Z ================== 1 failed, 17 deselected, 2 rerun in 0.08s =================== 2025-12-04T12:15:45.4026549Z Got exit code 1 2025-12-04T12:15:45.4026827Z Retrying single test... 
2025-12-04T12:15:45.4027547Z Test results will be stored in test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-1d824658578ee605.xml 2025-12-04T12:15:45.4028392Z ============================= test session starts ============================== 2025-12-04T12:15:45.4029058Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:45.4029663Z cachedir: .pytest_cache 2025-12-04T12:15:45.4030412Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:45.4031201Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:45.4031562Z configfile: pytest.ini 2025-12-04T12:15:45.4032329Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:45.4033281Z collecting ... collected 18 items / 17 deselected / 1 selected 2025-12-04T12:15:45.4034153Z stepcurrent: skipping 0 already run items. 
Running only test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained 2025-12-04T12:15:45.4034931Z Running 1 items in this shard 2025-12-04T12:15:45.4035145Z 2025-12-04T12:15:45.4035571Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained ('RERUN', {'yellow': True}) [0.0298s] [100%] 2025-12-04T12:15:45.4036562Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained ('RERUN', {'yellow': True}) [0.0017s] [100%] 2025-12-04T12:15:45.4037462Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained FAILED [0.0015s] [100%] 2025-12-04T12:15:45.4037933Z 2025-12-04T12:15:45.4038095Z ==================================== RERUNS ==================================== 2025-12-04T12:15:45.4038626Z _______________________ TestHFPretrained.test_pretrained _______________________ 2025-12-04T12:15:45.4039140Z Traceback (most recent call last): 2025-12-04T12:15:45.4039789Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained 2025-12-04T12:15:45.4040475Z ref = fn(x, tmp) 2025-12-04T12:15:45.4040986Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn 2025-12-04T12:15:45.4041609Z return a + torch.ones(2) * tmp.max_length 2025-12-04T12:15:45.4042445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__ 2025-12-04T12:15:45.4043264Z return super().__getattribute__(key) 2025-12-04T12:15:45.4043838Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length' 2025-12-04T12:15:45.4044249Z 2025-12-04T12:15:45.4044520Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:45.4045301Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained 2025-12-04T12:15:45.4045887Z 2025-12-04T12:15:45.4046155Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:45.4046812Z _______________________ 
TestHFPretrained.test_pretrained _______________________ 2025-12-04T12:15:45.4047334Z Traceback (most recent call last): 2025-12-04T12:15:45.4047992Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained 2025-12-04T12:15:45.4048628Z ref = fn(x, tmp) 2025-12-04T12:15:45.4049149Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn 2025-12-04T12:15:45.4049771Z return a + torch.ones(2) * tmp.max_length 2025-12-04T12:15:45.4050602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__ 2025-12-04T12:15:45.4051437Z return super().__getattribute__(key) 2025-12-04T12:15:45.4051980Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length' 2025-12-04T12:15:45.4052392Z 2025-12-04T12:15:45.4052624Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:45.4053404Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained 2025-12-04T12:15:45.4053996Z 2025-12-04T12:15:45.4054265Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:45.4054832Z =================================== FAILURES =================================== 2025-12-04T12:15:45.4055379Z _______________________ TestHFPretrained.test_pretrained _______________________ 2025-12-04T12:15:45.4055888Z Traceback (most recent call last): 2025-12-04T12:15:45.4056694Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 45, in test_pretrained 2025-12-04T12:15:45.4057354Z ref = fn(x, tmp) 2025-12-04T12:15:45.4057858Z File "/var/lib/jenkins/workspace/test/dynamo/test_model_output.py", line 40, in fn 2025-12-04T12:15:45.4058484Z return a + torch.ones(2) * tmp.max_length 2025-12-04T12:15:45.4059317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/configuration_utils.py", line 198, in __getattribute__ 
2025-12-04T12:15:45.4060148Z return super().__getattribute__(key) 2025-12-04T12:15:45.4060675Z AttributeError: 'PreTrainedConfig' object has no attribute 'max_length' 2025-12-04T12:15:45.4061102Z 2025-12-04T12:15:45.4061321Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:45.4062114Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained 2025-12-04T12:15:45.4062689Z 2025-12-04T12:15:45.4062972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:45.4064096Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-1d824658578ee605.xml - 2025-12-04T12:15:45.4065125Z =========================== short test summary info ============================ 2025-12-04T12:15:45.4066111Z FAILED [0.0015s] dynamo/test_model_output.py::TestHFPretrained::test_pretrained - AttributeError: 'PreTrainedConfig' object has no attribute 'max_length' 2025-12-04T12:15:45.4066974Z 2025-12-04T12:15:45.4067208Z To execute this test, run the following from the base repo dir: 2025-12-04T12:15:45.4067986Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_model_output.py TestHFPretrained.test_pretrained 2025-12-04T12:15:45.4068573Z 2025-12-04T12:15:45.4068840Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:15:45.4069434Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:15:45.4069997Z ================== 1 failed, 17 deselected, 2 rerun in 0.08s =================== 2025-12-04T12:15:45.4070477Z Got exit code 1 2025-12-04T12:15:45.4071156Z FAILED CONSISTENTLY: test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained 2025-12-04T12:15:45.4072070Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:15:45.4073151Z Test results will be stored in test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-a0141b45c0b55065.xml 2025-12-04T12:15:45.4073999Z ============================= test session starts ============================== 2025-12-04T12:15:45.4074668Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:15:45.4075277Z cachedir: .pytest_cache 2025-12-04T12:15:45.4075975Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:15:45.4076763Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:15:45.4077121Z configfile: pytest.ini 2025-12-04T12:15:45.4077888Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:15:45.4078831Z collecting ... collected 18 items / 1 deselected / 17 selected 2025-12-04T12:15:45.4079328Z stepcurrent: skipping 1 already run items. 2025-12-04T12:15:45.4079716Z Running 17 items in this shard 2025-12-04T12:15:45.4079929Z 2025-12-04T12:15:45.4081695Z dynamo/test_model_output.py::TestHFPretrained::test_pretrained_non_const_attr SKIPPED [0.0008s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/169481 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) 
[ 5%] 2025-12-04T12:15:45.4083930Z dynamo/test_model_output.py::TestModelOutput::test_mo_assign PASSED [0.3789s] [ 11%] 2025-12-04T12:15:45.4084813Z dynamo/test_model_output.py::TestModelOutput::test_mo_create PASSED [0.0655s] [ 17%] 2025-12-04T12:15:45.4085626Z dynamo/test_model_output.py::TestModelOutput::test_mo_from_outside PASSED [0.0463s] [ 23%] 2025-12-04T12:15:45.4086427Z dynamo/test_model_output.py::TestModelOutput::test_mo_getattr PASSED [0.0475s] [ 29%] 2025-12-04T12:15:45.4087258Z dynamo/test_model_output.py::TestModelOutput::test_mo_getattr_missing PASSED [0.0444s] [ 35%] 2025-12-04T12:15:45.4088091Z dynamo/test_model_output.py::TestModelOutput::test_mo_getitem PASSED [0.0575s] [ 41%] 2025-12-04T12:15:45.4088867Z dynamo/test_model_output.py::TestModelOutput::test_mo_index PASSED [0.0676s] [ 47%] 2025-12-04T12:15:45.4089616Z dynamo/test_model_output.py::TestModelOutput::test_mo_init PASSED [0.0647s] [ 52%] 2025-12-04T12:15:45.4090379Z dynamo/test_model_output.py::TestModelOutput::test_mo_init2 PASSED [0.0671s] [ 58%] 2025-12-04T12:15:45.4091205Z dynamo/test_model_output.py::TestModelOutput::test_mo_init_with_disable PASSED [0.1121s] [ 64%] 2025-12-04T12:15:45.4092026Z dynamo/test_model_output.py::TestModelOutput::test_mo_newkey PASSED [0.0497s] [ 70%] 2025-12-04T12:15:45.4092868Z dynamo/test_model_output.py::TestModelOutput::test_mo_reconstruct_bytecode PASSED [0.0607s] [ 76%] 2025-12-04T12:15:45.4093704Z dynamo/test_model_output.py::TestModelOutput::test_mo_tuple PASSED [0.0580s] [ 82%] 2025-12-04T12:15:45.4094443Z dynamo/test_model_output.py::TestModelOutput::test_none PASSED [0.0702s] [ 88%] 2025-12-04T12:15:45.4095267Z dynamo/test_model_output.py::TestModelOutput::test_reconstruction PASSED [0.0616s] [ 94%] 2025-12-04T12:15:45.4096200Z dynamo/test_model_output.py::TestModelOutputBertCUDA::test_HF_bert_model_output_cuda PASSED [1.0125s] [100%] 2025-12-04T12:15:45.4096868Z 2025-12-04T12:15:45.4097586Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-a0141b45c0b55065.xml - 2025-12-04T12:15:45.4098726Z ================= 16 passed, 1 skipped, 1 deselected in 2.32s ================== 2025-12-04T12:15:45.4100259Z The following tests failed consistently: ['test/dynamo/test_model_output.py::TestHFPretrained::test_pretrained'] 2025-12-04T12:15:45.4100898Z 2025-12-04T12:15:45.4101425Z FINISHED PRINTING LOG FILE of dynamo/test_model_output 1/1 (test/test-reports/dynamo.test_model_output_1.1_9f288500c4a144e5_.log) 2025-12-04T12:15:45.4102097Z 2025-12-04T12:15:45.4102439Z Finished dynamo/test_model_output 1/1 ... [2025-12-04 12:15:45.392447][10973.775342194], took 0.60min 2025-12-04T12:15:45.4148121Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-d1e6e78eb0372411.xml 2025-12-04T12:15:45.4916816Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-0e0432c8246f889e.xml 2025-12-04T12:15:45.5206648Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-1d824658578ee605.xml 2025-12-04T12:15:45.5519792Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-a0141b45c0b55065.xml 2025-12-04T12:15:46.0352438Z Uploading logs for 57119749248 to S3 2025-12-04T12:15:46.2388718Z Uploading artifacts took 0.65 seconds 2025-12-04T12:15:46.2389140Z dynamo/test_model_output 1/1 failed! 2025-12-04T12:15:46.2393742Z Running inductor/test_triton_kernels 1/1 ... 
[2025-12-04 12:15:46.239183][10974.622077718] 2025-12-04T12:15:46.2394337Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:15:46.2398919Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_kernels.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:15:46.239654] 2025-12-04T12:18:45.3518341Z 2025-12-04T12:18:45.3522289Z inductor/test_triton_kernels 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_kernels_1.1_80e8269e9d3330b3_.log 2025-12-04T12:18:45.3732084Z Running 366 items in this shard: test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_False_autotune_False, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_False_autotune_True, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_True_autotune_False, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_True_autotune_True, test/inductor/test_triton_kernels.py::KernelTests::test_i64_input, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_inline_asm_quotes_double, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_inline_asm_quotes_single, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_with_docstring_quotes_double, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_with_docstring_quotes_single, test/inductor/test_triton_kernels.py::KernelTests::test_layout_constraint_needs_fixed_stride_order, test/inductor/test_triton_kernels.py::KernelTests::test_no_nan_kernels, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_False_tma_version_old, 
test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_inductor_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_inductor_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_inductor_tma_version_new, 
test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_inductor_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_dedup_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_dedup_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_False_tma_version_new, 
test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_triton_attrs_dict_equal_1_None_format, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_2_tdlp_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3_tdlp_0, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1_tdlp_0, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2_tdlp_0, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_3_tdlp_0, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_1, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_aot_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_caching, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_caching_duplicate, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_constants, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dependancies, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_16_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_16_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_4_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_4_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dtype_view_cfg_cpp_wrapper, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dtype_view_cfg_normal, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_emulate_precision_mm_kernels_do_not_change, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_emulate_precision_unaffected, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_0_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_0_dynamic_True, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_1_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_1_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_float_arg_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_float_arg_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_fallback, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float16, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float32, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float64, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_functionalize, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_global_constexpr, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_higher_order_func, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inputs_buffer_reuse, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_matmul_tracking, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multi_kernel_grad_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multi_kernel_grad_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_aot_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_mutation_not_mark_dirty, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_mutation_type, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_aot_eager, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_False_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_False_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_True_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_True_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_none_args, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_out_of_order, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_reinplace_inplaceable_pass, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_inductor_autotune_at_compile_time_True, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_slice_and_view_input, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_strided_input, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_strided_input_nonzero_offset, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_to_cpu, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_tracing_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_tracing_dynamic_True, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_various_args, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_constexpr_function, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_imported_symbol, 
test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_imported_symbol_with_custom_name, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_kernel_param, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::MutationTests::test_add_for_loop, test/inductor/test_triton_kernels.py::MutationTests::test_add_for_loop2, test/inductor/test_triton_kernels.py::MutationTests::test_add_kernel_on_device_tma_new_api, test/inductor/test_triton_kernels.py::MutationTests::test_add_kernel_on_device_tma_old_api, test/inductor/test_triton_kernels.py::MutationTests::test_add_nested_for_loop, test/inductor/test_triton_kernels.py::MutationTests::test_add_nested_for_loop_multi_return, test/inductor/test_triton_kernels.py::MutationTests::test_argmax, test/inductor/test_triton_kernels.py::MutationTests::test_branch_with_multiple_yield_args, test/inductor/test_triton_kernels.py::MutationTests::test_cumsum, test/inductor/test_triton_kernels.py::MutationTests::test_fn_call_multi_return, test/inductor/test_triton_kernels.py::MutationTests::test_fn_call_one_return, test/inductor/test_triton_kernels.py::MutationTests::test_for_loop_arg, test/inductor/test_triton_kernels.py::MutationTests::test_for_loop_arg_2, test/inductor/test_triton_kernels.py::MutationTests::test_get_tma_stores, 
test/inductor/test_triton_kernels.py::MutationTests::test_labels, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_4_times_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_2d_autotuned, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_with_block_ptr, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_with_import, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_atomic_add_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_cond_op_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_indirection_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_indirection_kernel1, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_inline_asm_kernel_is_pure_false, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_inline_asm_kernel_is_pure_true, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_kernel_with_block_ptr_2d, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_mul2_inplace_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_nested_cond_op_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_out_of_order_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_out_of_order_kernel_call, test/inductor/test_triton_kernels.py::MutationTests::test_reduce_sum, test/inductor/test_triton_kernels.py::MutationTests::test_triton_kernel_inference_mode, test/inductor/test_triton_kernels.py::MutationTests::test_while_loop, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_False_dynamic_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_False_dynamic_True, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_True_dynamic_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_True_dynamic_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_autotune_no_pre_or_post_hook_user_defined, test/inductor/test_triton_kernels.py::CustomOpTests::test_autotune_unbacked, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_meta, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_False_autotune_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_False_autotune_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_True_autotune_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_True_autotune_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_custom_op, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_mutable_custom_op, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_triton_kernel, test/inductor/test_triton_kernels.py::CustomOpTests::test_subclass, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_dynamic_grid_no_recompile, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_eager_autotune_at_compile_time_True, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_inductor_with_perf_model_False, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_aot_eager_autotune_at_compile_time_False, 
test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_eager, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_inductor, test/inductor/test_triton_kernels.py::CustomOpTests::test_wrap_triton_disabled_in_triton_op 2025-12-04T12:18:45.3945351Z 2025-12-04T12:18:45.3945762Z Finished inductor/test_triton_kernels 1/1 ... [2025-12-04 12:18:45.352389][11153.735282205], took 2.99min 2025-12-04T12:18:45.3947141Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-498ce8e3e7c25595.xml 2025-12-04T12:18:45.4664857Z Running inductor/test_loop_ordering 1/1 ... [2025-12-04 12:18:45.466178][11153.849072214] 2025-12-04T12:18:45.4665458Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:18:45.4668352Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_loop_ordering.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:18:45.466602] 2025-12-04T12:20:32.2915492Z 2025-12-04T12:20:32.2916758Z PRINTING LOG FILE of inductor/test_loop_ordering 1/1 (test/test-reports/inductor.test_loop_ordering_1.1_ca0aee6babe9c71a_.log) 2025-12-04T12:20:32.2918326Z W1204 12:18:54.697000 137513 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:20:32.2920105Z Test results will be stored in test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-264346bf50f4314b.xml 2025-12-04T12:20:32.2921310Z ============================= test session starts ============================== 2025-12-04T12:20:32.2922425Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:20:32.2923073Z cachedir: .pytest_cache 2025-12-04T12:20:32.2923903Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:20:32.2924690Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:20:32.2925054Z configfile: pytest.ini 2025-12-04T12:20:32.2925849Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:20:32.2926691Z collecting ... 
collected 53 items 2025-12-04T12:20:32.2927118Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T12:20:32.2948987Z Running 53 items in this shard: test/inductor/test_loop_ordering.py::ImplDetailTest::test_merge_loops_invalidate_pw_dep_cache, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_and_merge_loops, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_modular_indexing, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_twice, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_3dred_pw_2d_outer_red, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_apbt_realize, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_different_broadcast_shapes, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_different_reduction_order, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_for_reordering_reindex, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_cast_and_t, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_pattern_2, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fuse_reduction_with_tiled_pw, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fuse_with_scalar_shared_memory, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_interaction_with_multi_template, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_interaction_with_triton_template, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_keep_fake_dep, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_softmax, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_sum_fuse_with_pw, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red_2, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_sum_and_t, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_view, 
test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_coalescing, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps0, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps1, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps2, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps3, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_no_pointwise, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_pointwise, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads_split, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_tiling, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_zero, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_False, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_True, test/inductor/test_loop_ordering.py::TestTiling::test_3d_pointwise, test/inductor/test_loop_ordering.py::TestTiling::test_cat, test/inductor/test_loop_ordering.py::TestTiling::test_find_broadcast_var, test/inductor/test_loop_ordering.py::TestTiling::test_mutation_deps, test/inductor/test_loop_ordering.py::TestTiling::test_penalized_small_dim, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_cont, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_T, 
test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_cont, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_cont, test/inductor/test_loop_ordering.py::TestTiling::test_tiled_reduction, test/inductor/test_loop_ordering.py::TestIndexInversion::test_inversion_cases, test/inductor/test_loop_ordering.py::TestIndexInversion::test_original_complex_expression 2025-12-04T12:20:32.2970495Z 2025-12-04T12:20:32.2971441Z inductor/test_loop_ordering.py::ImplDetailTest::test_merge_loops_invalidate_pw_dep_cache PASSED [0.0475s] [ 1%] 2025-12-04T12:20:32.2972605Z inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_and_merge_loops PASSED [0.0100s] [ 3%] 2025-12-04T12:20:32.2973536Z inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_modular_indexing PASSED [0.0643s] [ 5%] 2025-12-04T12:20:32.2974520Z inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_twice PASSED [0.0134s] [ 7%] 2025-12-04T12:20:32.2975838Z inductor/test_loop_ordering.py::LoopOrderingTest::test_3dred_pw_2d_outer_red I1204 12:19:00.156000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.2978414Z I1204 12:19:00.156000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 1081600, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 266240), (SchedulerNode(name='op2'), 4160)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 0.003327334533093381), (SchedulerNode(name='op2'), 5.198960207958408e-05)]} 2025-12-04T12:20:32.2980305Z PASSED [5.2497s] [ 9%] 2025-12-04T12:20:32.2981500Z inductor/test_loop_ordering.py::LoopOrderingTest::test_apbt_realize I1204 12:19:00.965000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 
2025-12-04T12:20:32.2983883Z I1204 12:19:00.965000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 25165824, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op1_op0_op2), 6291456)], 'node_runtimes': [(FusedSchedulerNode(nodes=op1_op0_op2), 0.07862747450509898)]} 2025-12-04T12:20:32.2985355Z PASSED [0.9374s] [ 11%] 2025-12-04T12:20:32.2986426Z inductor/test_loop_ordering.py::LoopOrderingTest::test_different_broadcast_shapes I1204 12:19:01.552000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.2988821Z I1204 12:19:01.552000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 16785408, 'nodes_num_elem': [(SchedulerNode(name='op0'), 2098176), (SchedulerNode(name='op1'), 2098176)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.026221955608878224), (SchedulerNode(name='op1'), 0.026221955608878224)]} 2025-12-04T12:20:32.2990434Z PASSED [0.4485s] [ 13%] 2025-12-04T12:20:32.2991482Z inductor/test_loop_ordering.py::LoopOrderingTest::test_different_reduction_order I1204 12:19:01.970000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.2993955Z I1204 12:19:01.970000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 16789504, 'nodes_num_elem': [(SchedulerNode(name='op0'), 2099200), (SchedulerNode(name='op1'), 2098176)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.02623475304939012), (SchedulerNode(name='op1'), 0.026221955608878224)]} 2025-12-04T12:20:32.2995584Z PASSED [0.3668s] [ 15%] 2025-12-04T12:20:32.2996633Z inductor/test_loop_ordering.py::LoopOrderingTest::test_for_reordering_reindex W1204 12:19:02.964000 137513 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:20:32.2998065Z I1204 12:19:03.341000 137513 
site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3000289Z I1204 12:19:03.341000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 54400, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 12400), (ExternKernelSchedulerNode(name='op2'), 1200)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 0.00015496900619876023), (ExternKernelSchedulerNode(name='op2'), 0.0019654088050314465)]} 2025-12-04T12:20:32.3002147Z PASSED [1.4408s] [ 16%] 2025-12-04T12:20:32.3002857Z inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_cast_and_t SKIPPED [0.0003s] (FP8 requires H100+ and MI300+) [ 18%] 2025-12-04T12:20:32.3004016Z inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_pattern_2 SKIPPED [0.0003s] (FP8 requires H100+ and MI300+) [ 20%] 2025-12-04T12:20:32.3005528Z inductor/test_loop_ordering.py::LoopOrderingTest::test_fuse_reduction_with_tiled_pw I1204 12:19:04.677000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3008060Z I1204 12:19:04.677000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 241604, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op2_op3), 60200), (SchedulerNode(name='op1'), 201)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op2_op3), 0.0007523495300939811), (SchedulerNode(name='op1'), 2.511997600479904e-06)]} 2025-12-04T12:20:32.3009823Z PASSED [1.3263s] [ 22%] 2025-12-04T12:20:32.3010931Z inductor/test_loop_ordering.py::LoopOrderingTest::test_fuse_with_scalar_shared_memory I1204 12:19:05.376000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3013017Z I1204 12:19:05.376000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 104, 'nodes_num_elem': 
[(FusedSchedulerNode(nodes=op0_op1), 26)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 3.249350129974005e-07)]} 2025-12-04T12:20:32.3014323Z PASSED [0.6280s] [ 24%] 2025-12-04T12:20:32.3015127Z inductor/test_loop_ordering.py::LoopOrderingTest::test_interaction_with_multi_template SKIPPED [0.0003s] (Need big gpu for max-autotune) [ 26%] 2025-12-04T12:20:32.3016639Z inductor/test_loop_ordering.py::LoopOrderingTest::test_interaction_with_triton_template SKIPPED [0.0002s] (Need big gpu for max-autotune) [ 28%] 2025-12-04T12:20:32.3018213Z inductor/test_loop_ordering.py::LoopOrderingTest::test_keep_fake_dep I1204 12:19:07.029000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0_1] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3021069Z I1204 12:19:07.029000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0_1] [__inductor_metrics] {'num_bytes_accessed': 2068996, 'nodes_num_elem': [(SchedulerNode(name='op1'), 66560), (FusedSchedulerNode(nodes=op0_op3_op5_op2_op6), 446592), (SchedulerNode(name='op7'), 4097)], 'node_runtimes': [(SchedulerNode(name='op1'), 0.0008318336332733453), (FusedSchedulerNode(nodes=op0_op3_op5_op2_op6), 0.00558128374325135), (SchedulerNode(name='op7'), 5.120225954809038e-05)]} 2025-12-04T12:20:32.3023591Z I1204 12:19:08.524000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0_1] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3027740Z I1204 12:19:08.524000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0_1] [__inductor_metrics] {'num_bytes_accessed': 6542092, 'nodes_num_elem': [(SchedulerNode(name='op9'), 131072), (FusedSchedulerNode(nodes=op0_op1_op2_op10), 835649), (SchedulerNode(name='op3'), 305153), (SchedulerNode(name='op4'), 2112), (SchedulerNode(name='op5'), 65), (SchedulerNode(name='op7'), 32768), (FusedSchedulerNode(nodes=op6_op8), 328704)], 'node_runtimes': [(SchedulerNode(name='op9'), 0.0016380723855228953), (FusedSchedulerNode(nodes=op0_op1_op2_op10), 
0.010443523795240951), (SchedulerNode(name='op3'), 0.00381364977004599), (SchedulerNode(name='op4'), 2.639472105578884e-05), (SchedulerNode(name='op5'), 8.123375324935012e-07), (SchedulerNode(name='op7'), 0.0004095180963807238), (FusedSchedulerNode(nodes=op6_op8), 0.0041079784043191354)]} 2025-12-04T12:20:32.3031390Z PASSED [3.2679s] [ 30%] 2025-12-04T12:20:32.3032422Z inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_softmax I1204 12:19:09.876000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3034580Z I1204 12:19:09.876000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 536870912, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1_op2), 134217728)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1_op2), 1.6773861227754447)]} 2025-12-04T12:20:32.3035981Z PASSED [1.6114s] [ 32%] 2025-12-04T12:20:32.3037044Z inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_sum_fuse_with_pw I1204 12:19:11.094000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3039249Z I1204 12:19:11.094000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 536870912, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 134217728)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 1.6773861227754447)]} 2025-12-04T12:20:32.3040594Z PASSED [1.0912s] [ 33%] 2025-12-04T12:20:32.3041562Z inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red I1204 12:19:12.222000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3043681Z I1204 12:19:12.222000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 18432, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 4608)], 
'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 5.758848230353929e-05)]} 2025-12-04T12:20:32.3045017Z PASSED [0.9092s] [ 35%] 2025-12-04T12:20:32.3045975Z inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red_2 I1204 12:19:12.773000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3048063Z I1204 12:19:12.773000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 18432, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1_op2_op3), 4608)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1_op2_op3), 5.758848230353929e-05)]} 2025-12-04T12:20:32.3049474Z PASSED [0.5478s] [ 37%] 2025-12-04T12:20:32.3050429Z inductor/test_loop_ordering.py::LoopOrderingTest::test_sum_and_t I1204 12:19:13.579000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3052446Z I1204 12:19:13.579000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 16781312, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 4195328)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 0.05243111377724455)]} 2025-12-04T12:20:32.3053791Z PASSED [0.9096s] [ 39%] 2025-12-04T12:20:32.3054736Z inductor/test_loop_ordering.py::LoopOrderingTest::test_view I1204 12:19:14.015000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3056827Z I1204 12:19:14.015000 137513 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 1200, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 300)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 3.7492501499700058e-06)]} 2025-12-04T12:20:32.3058171Z PASSED [0.2883s] [ 41%] 2025-12-04T12:20:32.3058724Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_coalescing PASSED [0.0090s] [ 43%] 
2025-12-04T12:20:32.3060071Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:14.132000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3061478Z W1204 12:19:14.132000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3062876Z W1204 12:19:14.132000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3064170Z W1204 12:19:14.132000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3065447Z W1204 12:19:14.132000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3066767Z W1204 12:19:14.161000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3067767Z W1204 12:19:14.161000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3069149Z W1204 12:19:14.161000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3070485Z W1204 12:19:14.161000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3072481Z W1204 12:19:14.161000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3073636Z ('RERUN', {'yellow': True}) [0.1358s] [ 45%] 2025-12-04T12:20:32.3074807Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:14.265000 137513 
site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3076198Z W1204 12:19:14.265000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3078145Z W1204 12:19:14.265000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3080105Z W1204 12:19:14.265000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3081970Z W1204 12:19:14.265000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3083919Z W1204 12:19:14.290000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3085483Z W1204 12:19:14.290000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3087712Z W1204 12:19:14.290000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3090044Z W1204 12:19:14.290000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3092330Z W1204 12:19:14.290000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3094177Z ('RERUN', {'yellow': True}) [0.0967s] [ 45%] 2025-12-04T12:20:32.3096497Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:14.362000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3099575Z W1204 12:19:14.362000 137513 
site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3102613Z W1204 12:19:14.362000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3105547Z W1204 12:19:14.362000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3108059Z W1204 12:19:14.362000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3110784Z W1204 12:19:14.387000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3112734Z W1204 12:19:14.387000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3115157Z W1204 12:19:14.387000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3116658Z W1204 12:19:14.387000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3118146Z W1204 12:19:14.387000 137513 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3119499Z FAILED [0.0939s] [ 45%] 2025-12-04T12:20:32.3119710Z 2025-12-04T12:20:32.3119879Z ==================================== RERUNS ==================================== 2025-12-04T12:20:32.3120532Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________ 2025-12-04T12:20:32.3121213Z Traceback (most recent call last): 2025-12-04T12:20:32.3122107Z File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling 
2025-12-04T12:20:32.3122951Z out, code = run_and_get_code(torch.compile(forward), (permute)) 2025-12-04T12:20:32.3123786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T12:20:32.3124518Z result = fn(*args, **kwargs) 2025-12-04T12:20:32.3125227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:20:32.3126089Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:20:32.3126989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:20:32.3128000Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:20:32.3128830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:20:32.3129631Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:20:32.3130456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:20:32.3131459Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:20:32.3132431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T12:20:32.3133225Z _check_triton_bf16_support(graph) 2025-12-04T12:20:32.3134029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T12:20:32.3134845Z warn_and_skip(node.get_device()) 2025-12-04T12:20:32.3135568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T12:20:32.3136490Z raise SkipFrame("BF16 is not supported") 2025-12-04T12:20:32.3137033Z torch._inductor.exc.InductorError: 
SkipFrame: BF16 is not supported 2025-12-04T12:20:32.3137424Z 2025-12-04T12:20:32.3138228Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:20:32.3139172Z 2025-12-04T12:20:32.3139178Z 2025-12-04T12:20:32.3139399Z To execute this test, run the following from the base repo dir: 2025-12-04T12:20:32.3140287Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling 2025-12-04T12:20:32.3140961Z 2025-12-04T12:20:32.3141234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:20:32.3141877Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3142346Z frames [('total', 1)] 2025-12-04T12:20:32.3142649Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3143105Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3143816Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3144414Z graph_break [] 2025-12-04T12:20:32.3144797Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3145895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3146913Z warnings.warn( 2025-12-04T12:20:32.3147443Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________ 2025-12-04T12:20:32.3148174Z Traceback (most recent call last): 2025-12-04T12:20:32.3148893Z File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling 2025-12-04T12:20:32.3149794Z out, code = run_and_get_code(torch.compile(forward), (permute)) 2025-12-04T12:20:32.3150674Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T12:20:32.3151414Z result = fn(*args, **kwargs) 2025-12-04T12:20:32.3152287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:20:32.3153172Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:20:32.3154078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:20:32.3154913Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:20:32.3155771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:20:32.3156572Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:20:32.3157398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:20:32.3158383Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:20:32.3159379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T12:20:32.3160174Z _check_triton_bf16_support(graph) 2025-12-04T12:20:32.3160981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T12:20:32.3161785Z warn_and_skip(node.get_device()) 2025-12-04T12:20:32.3162520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T12:20:32.3163293Z raise SkipFrame("BF16 is not supported") 2025-12-04T12:20:32.3163811Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T12:20:32.3164206Z 2025-12-04T12:20:32.3164969Z Set TORCHDYNAMO_VERBOSE=1 for the 
internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:20:32.3165834Z 2025-12-04T12:20:32.3165839Z 2025-12-04T12:20:32.3166057Z To execute this test, run the following from the base repo dir: 2025-12-04T12:20:32.3166932Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling 2025-12-04T12:20:32.3167593Z 2025-12-04T12:20:32.3167878Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:20:32.3168510Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3168985Z frames [('total', 1)] 2025-12-04T12:20:32.3169291Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3169724Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3170451Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3171257Z graph_break [] 2025-12-04T12:20:32.3171644Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3172731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3173710Z warnings.warn( 2025-12-04T12:20:32.3174233Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3174700Z frames [('total', 1)] 2025-12-04T12:20:32.3175007Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3175463Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3176191Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3176907Z graph_break [] 2025-12-04T12:20:32.3177293Z ----------------------------- Captured stderr 
call ----------------------------- 2025-12-04T12:20:32.3178447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3179407Z warnings.warn( 2025-12-04T12:20:32.3179724Z =================================== FAILURES =================================== 2025-12-04T12:20:32.3180298Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________ 2025-12-04T12:20:32.3180838Z Traceback (most recent call last): 2025-12-04T12:20:32.3181648Z File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling 2025-12-04T12:20:32.3182497Z out, code = run_and_get_code(torch.compile(forward), (permute)) 2025-12-04T12:20:32.3183318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T12:20:32.3184061Z result = fn(*args, **kwargs) 2025-12-04T12:20:32.3184781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:20:32.3185660Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:20:32.3186550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:20:32.3187399Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:20:32.3188247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:20:32.3189034Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:20:32.3189864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:20:32.3190870Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 
2025-12-04T12:20:32.3191922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T12:20:32.3192708Z _check_triton_bf16_support(graph) 2025-12-04T12:20:32.3193520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T12:20:32.3194341Z warn_and_skip(node.get_device()) 2025-12-04T12:20:32.3195080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T12:20:32.3195843Z raise SkipFrame("BF16 is not supported") 2025-12-04T12:20:32.3196377Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T12:20:32.3196766Z 2025-12-04T12:20:32.3197491Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:20:32.3198340Z 2025-12-04T12:20:32.3198345Z 2025-12-04T12:20:32.3198585Z To execute this test, run the following from the base repo dir: 2025-12-04T12:20:32.3199455Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling 2025-12-04T12:20:32.3200125Z 2025-12-04T12:20:32.3200395Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:20:32.3201023Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3201532Z frames [('total', 1)] 2025-12-04T12:20:32.3201815Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3202261Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3202975Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3203556Z graph_break [] 2025-12-04T12:20:32.3203971Z ----------------------------- Captured stderr 
call ----------------------------- 2025-12-04T12:20:32.3205107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3206077Z warnings.warn( 2025-12-04T12:20:32.3206450Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3206915Z frames [('total', 1)] 2025-12-04T12:20:32.3207219Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3207658Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3208379Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3208979Z graph_break [] 2025-12-04T12:20:32.3209339Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3210433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3211404Z warnings.warn( 2025-12-04T12:20:32.3211788Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3212242Z frames [('total', 1)] 2025-12-04T12:20:32.3212541Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3212987Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3213716Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3214319Z graph_break [] 2025-12-04T12:20:32.3214697Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3215794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3216857Z warnings.warn( 2025-12-04T12:20:32.3217846Z - 
generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-264346bf50f4314b.xml - 2025-12-04T12:20:32.3218916Z =========================== short test summary info ============================ 2025-12-04T12:20:32.3219986Z FAILED [0.0939s] inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T12:20:32.3220840Z 2025-12-04T12:20:32.3221547Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:20:32.3222405Z 2025-12-04T12:20:32.3222409Z 2025-12-04T12:20:32.3222633Z To execute this test, run the following from the base repo dir: 2025-12-04T12:20:32.3223515Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling 2025-12-04T12:20:32.3224174Z 2025-12-04T12:20:32.3224458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:20:32.3225040Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:20:32.3225579Z ============== 1 failed, 19 passed, 4 skipped, 2 rerun in 19.61s =============== 2025-12-04T12:20:32.3226035Z Got exit code 1 2025-12-04T12:20:32.3226313Z Retrying single test... 
2025-12-04T12:20:32.3226944Z W1204 12:19:26.073000 138524 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:20:32.3228156Z Test results will be stored in test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-f4f4ac9590e83730.xml 2025-12-04T12:20:32.3229050Z ============================= test session starts ============================== 2025-12-04T12:20:32.3229704Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:20:32.3230354Z cachedir: .pytest_cache 2025-12-04T12:20:32.3231106Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:20:32.3231895Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:20:32.3232235Z configfile: pytest.ini 2025-12-04T12:20:32.3233008Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:20:32.3233948Z collecting ... collected 53 items / 52 deselected / 1 selected 2025-12-04T12:20:32.3234900Z stepcurrent: skipping 23 already run items. 
Running only test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling 2025-12-04T12:20:32.3235761Z Running 1 items in this shard 2025-12-04T12:20:32.3235989Z 2025-12-04T12:20:32.3236807Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:30.302000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3238214Z W1204 12:19:30.302000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3239607Z W1204 12:19:30.302000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3240901Z W1204 12:19:30.302000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3242190Z W1204 12:19:30.302000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3243514Z W1204 12:19:30.545000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3244513Z W1204 12:19:30.545000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3245931Z W1204 12:19:30.545000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3247233Z W1204 12:19:30.545000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3248516Z W1204 12:19:30.545000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3249562Z ('RERUN', {'yellow': 
True}) [4.4104s] [100%] 2025-12-04T12:20:32.3250632Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:30.896000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3252030Z W1204 12:19:30.896000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3253527Z W1204 12:19:30.896000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3254817Z W1204 12:19:30.896000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3256098Z W1204 12:19:30.896000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3257528Z W1204 12:19:30.922000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3258530Z W1204 12:19:30.922000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3259921Z W1204 12:19:30.922000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3261290Z W1204 12:19:30.922000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3262556Z W1204 12:19:30.922000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3263607Z ('RERUN', {'yellow': True}) [0.2820s] [100%] 2025-12-04T12:20:32.3264699Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:30.995000 138524 
site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3266109Z W1204 12:19:30.995000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3267501Z W1204 12:19:30.995000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3268813Z W1204 12:19:30.995000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3270099Z W1204 12:19:30.995000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3271620Z W1204 12:19:31.020000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3272610Z W1204 12:19:31.020000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3273999Z W1204 12:19:31.020000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3275293Z W1204 12:19:31.020000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3276718Z W1204 12:19:31.020000 138524 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3277753Z FAILED [0.0955s] [100%] 2025-12-04T12:20:32.3277936Z 2025-12-04T12:20:32.3278083Z ==================================== RERUNS ==================================== 2025-12-04T12:20:32.3278658Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________ 2025-12-04T12:20:32.3279206Z Traceback (most recent call last): 
2025-12-04T12:20:32.3279932Z File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling 2025-12-04T12:20:32.3280782Z out, code = run_and_get_code(torch.compile(forward), (permute)) 2025-12-04T12:20:32.3281620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T12:20:32.3282353Z result = fn(*args, **kwargs) 2025-12-04T12:20:32.3283064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:20:32.3283949Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:20:32.3284854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:20:32.3285700Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:20:32.3286522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:20:32.3287386Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:20:32.3288210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:20:32.3289197Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:20:32.3290246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T12:20:32.3291086Z _check_triton_bf16_support(graph) 2025-12-04T12:20:32.3291884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T12:20:32.3292690Z warn_and_skip(node.get_device()) 2025-12-04T12:20:32.3293427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 
2025-12-04T12:20:32.3294201Z raise SkipFrame("BF16 is not supported") 2025-12-04T12:20:32.3294716Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T12:20:32.3295121Z 2025-12-04T12:20:32.3295835Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:20:32.3296766Z 2025-12-04T12:20:32.3296771Z 2025-12-04T12:20:32.3296992Z To execute this test, run the following from the base repo dir: 2025-12-04T12:20:32.3297877Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling 2025-12-04T12:20:32.3298534Z 2025-12-04T12:20:32.3298817Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:20:32.3299440Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3299913Z frames [('total', 1)] 2025-12-04T12:20:32.3300216Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3300777Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3301504Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3301974Z graph_break [] 2025-12-04T12:20:32.3302360Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3303506Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3304481Z warnings.warn( 2025-12-04T12:20:32.3304926Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________ 2025-12-04T12:20:32.3305455Z Traceback (most recent call last): 2025-12-04T12:20:32.3306196Z File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in 
test_induced_fused_tiling 2025-12-04T12:20:32.3307049Z out, code = run_and_get_code(torch.compile(forward), (permute)) 2025-12-04T12:20:32.3307872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T12:20:32.3308590Z result = fn(*args, **kwargs) 2025-12-04T12:20:32.3309300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:20:32.3310184Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:20:32.3311087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:20:32.3311920Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:20:32.3312757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:20:32.3313554Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:20:32.3314400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:20:32.3315395Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:20:32.3316391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T12:20:32.3317216Z _check_triton_bf16_support(graph) 2025-12-04T12:20:32.3318047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T12:20:32.3318866Z warn_and_skip(node.get_device()) 2025-12-04T12:20:32.3319596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T12:20:32.3320363Z raise SkipFrame("BF16 is not supported") 2025-12-04T12:20:32.3320872Z 
torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T12:20:32.3321277Z 2025-12-04T12:20:32.3321987Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:20:32.3322829Z 2025-12-04T12:20:32.3322851Z 2025-12-04T12:20:32.3323072Z To execute this test, run the following from the base repo dir: 2025-12-04T12:20:32.3323950Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling 2025-12-04T12:20:32.3324611Z 2025-12-04T12:20:32.3324879Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:20:32.3325512Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3325981Z frames [('total', 1)] 2025-12-04T12:20:32.3326270Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3326842Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3327572Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3328048Z graph_break [] 2025-12-04T12:20:32.3328420Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3329514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3330490Z warnings.warn( 2025-12-04T12:20:32.3330904Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3331378Z frames [('total', 1)] 2025-12-04T12:20:32.3331682Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3332130Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3332837Z inductor [('pattern_matcher_count', 1), 
('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3333433Z graph_break [] 2025-12-04T12:20:32.3333814Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3334888Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3335853Z warnings.warn( 2025-12-04T12:20:32.3336170Z =================================== FAILURES =================================== 2025-12-04T12:20:32.3336835Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________ 2025-12-04T12:20:32.3337384Z Traceback (most recent call last): 2025-12-04T12:20:32.3338123Z File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling 2025-12-04T12:20:32.3338969Z out, code = run_and_get_code(torch.compile(forward), (permute)) 2025-12-04T12:20:32.3339796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T12:20:32.3340583Z result = fn(*args, **kwargs) 2025-12-04T12:20:32.3341294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:20:32.3342175Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:20:32.3343069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:20:32.3343955Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:20:32.3344835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:20:32.3345638Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:20:32.3346451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", 
line 1757, in fx_codegen_and_compile 2025-12-04T12:20:32.3347455Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:20:32.3348455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T12:20:32.3349250Z _check_triton_bf16_support(graph) 2025-12-04T12:20:32.3350043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T12:20:32.3350872Z warn_and_skip(node.get_device()) 2025-12-04T12:20:32.3351613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T12:20:32.3352371Z raise SkipFrame("BF16 is not supported") 2025-12-04T12:20:32.3352901Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T12:20:32.3353301Z 2025-12-04T12:20:32.3354015Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:20:32.3354858Z 2025-12-04T12:20:32.3354863Z 2025-12-04T12:20:32.3355096Z To execute this test, run the following from the base repo dir: 2025-12-04T12:20:32.3355968Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling 2025-12-04T12:20:32.3356634Z 2025-12-04T12:20:32.3356901Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:20:32.3357539Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3358047Z frames [('total', 1)] 2025-12-04T12:20:32.3358333Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3358903Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3359624Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3360080Z graph_break [] 2025-12-04T12:20:32.3360459Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3361556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3362523Z warnings.warn( 2025-12-04T12:20:32.3362893Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3363360Z frames [('total', 1)] 2025-12-04T12:20:32.3363666Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3364101Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3364834Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3365432Z graph_break [] 2025-12-04T12:20:32.3365810Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3366882Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3367895Z warnings.warn( 2025-12-04T12:20:32.3368279Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3368733Z frames [('total', 1)] 2025-12-04T12:20:32.3369034Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3369481Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3370241Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3370834Z graph_break [] 2025-12-04T12:20:32.3371592Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3372690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3373651Z warnings.warn( 2025-12-04T12:20:32.3374587Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-f4f4ac9590e83730.xml - 2025-12-04T12:20:32.3375668Z =========================== short test summary info ============================ 2025-12-04T12:20:32.3376791Z FAILED [0.0955s] inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T12:20:32.3377652Z 2025-12-04T12:20:32.3378388Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:20:32.3379238Z 2025-12-04T12:20:32.3379243Z 2025-12-04T12:20:32.3379465Z To execute this test, run the following from the base repo dir: 2025-12-04T12:20:32.3380347Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling 2025-12-04T12:20:32.3381016Z 2025-12-04T12:20:32.3381288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:20:32.3381885Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:20:32.3382404Z ================== 1 failed, 52 deselected, 2 rerun in 4.86s =================== 2025-12-04T12:20:32.3382850Z Got exit code 1 2025-12-04T12:20:32.3383126Z Retrying single test... 2025-12-04T12:20:32.3383762Z W1204 12:19:45.879000 138693 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:20:32.3384999Z Test results will be stored in test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-193da166d9e268ac.xml 2025-12-04T12:20:32.3385888Z ============================= test session starts ============================== 2025-12-04T12:20:32.3386558Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:20:32.3387149Z cachedir: .pytest_cache 2025-12-04T12:20:32.3387863Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:20:32.3388655Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:20:32.3388998Z configfile: pytest.ini 2025-12-04T12:20:32.3389774Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:20:32.3390730Z collecting ... 
collected 53 items / 52 deselected / 1 selected 2025-12-04T12:20:32.3391690Z stepcurrent: skipping 23 already run items. Running only test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling 2025-12-04T12:20:32.3392531Z Running 1 items in this shard 2025-12-04T12:20:32.3392757Z 2025-12-04T12:20:32.3393574Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:50.070000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3395035Z W1204 12:19:50.070000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3396421Z W1204 12:19:50.070000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3397700Z W1204 12:19:50.070000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3399076Z W1204 12:19:50.070000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3400396Z W1204 12:19:50.307000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3401394Z W1204 12:19:50.307000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3402774Z W1204 12:19:50.307000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3404052Z W1204 12:19:50.307000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3405329Z W1204 12:19:50.307000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't 
pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3406379Z ('RERUN', {'yellow': True}) [4.3697s] [100%] 2025-12-04T12:20:32.3407460Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:50.660000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3408844Z W1204 12:19:50.660000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3410224Z W1204 12:19:50.660000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3426794Z W1204 12:19:50.660000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3428117Z W1204 12:19:50.660000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3429563Z W1204 12:19:50.686000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3430576Z W1204 12:19:50.686000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3431972Z W1204 12:19:50.686000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3433270Z W1204 12:19:50.686000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3434550Z W1204 12:19:50.686000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling.<locals>.fn' 2025-12-04T12:20:32.3435597Z ('RERUN', {'yellow': True}) [0.2848s] [100%] 
2025-12-04T12:20:32.3436703Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling W1204 12:19:50.757000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3438105Z W1204 12:19:50.757000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3439483Z W1204 12:19:50.757000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3440823Z W1204 12:19:50.757000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3442106Z W1204 12:19:50.757000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling..fn' 2025-12-04T12:20:32.3443423Z W1204 12:19:50.782000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3444527Z W1204 12:19:50.782000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3445907Z W1204 12:19:50.782000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3447193Z W1204 12:19:50.782000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3448483Z W1204 12:19:50.782000 138693 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_induced_fused_tiling..fn' 2025-12-04T12:20:32.3449503Z FAILED [0.0949s] [100%] 2025-12-04T12:20:32.3449689Z 2025-12-04T12:20:32.3449838Z ==================================== RERUNS ==================================== 2025-12-04T12:20:32.3450418Z ________________ 
MemoryCoalescingTest.test_induced_fused_tiling ________________ 2025-12-04T12:20:32.3450972Z Traceback (most recent call last): 2025-12-04T12:20:32.3451698Z File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling 2025-12-04T12:20:32.3452533Z out, code = run_and_get_code(torch.compile(forward), (permute)) 2025-12-04T12:20:32.3453361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T12:20:32.3454106Z result = fn(*args, **kwargs) 2025-12-04T12:20:32.3454798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:20:32.3455671Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:20:32.3456693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:20:32.3457547Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:20:32.3458419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:20:32.3459216Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:20:32.3460037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:20:32.3461021Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:20:32.3462013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T12:20:32.3462812Z _check_triton_bf16_support(graph) 2025-12-04T12:20:32.3463611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T12:20:32.3464414Z warn_and_skip(node.get_device()) 
2025-12-04T12:20:32.3465147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T12:20:32.3465923Z raise SkipFrame("BF16 is not supported") 2025-12-04T12:20:32.3466447Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T12:20:32.3466830Z 2025-12-04T12:20:32.3467541Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:20:32.3468394Z 2025-12-04T12:20:32.3468399Z 2025-12-04T12:20:32.3468657Z To execute this test, run the following from the base repo dir: 2025-12-04T12:20:32.3469535Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling 2025-12-04T12:20:32.3470197Z 2025-12-04T12:20:32.3470477Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:20:32.3473092Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3473888Z frames [('total', 1)] 2025-12-04T12:20:32.3474202Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3474830Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3475569Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3476044Z graph_break [] 2025-12-04T12:20:32.3476432Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3477532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3478511Z warnings.warn( 2025-12-04T12:20:32.3478952Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________ 2025-12-04T12:20:32.3479483Z Traceback (most recent 
call last): 2025-12-04T12:20:32.3480213Z File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling 2025-12-04T12:20:32.3481062Z out, code = run_and_get_code(torch.compile(forward), (permute)) 2025-12-04T12:20:32.3481893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T12:20:32.3482619Z result = fn(*args, **kwargs) 2025-12-04T12:20:32.3483327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:20:32.3484207Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:20:32.3485110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:20:32.3485939Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:20:32.3486778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:20:32.3487578Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:20:32.3488439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:20:32.3489438Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:20:32.3490422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T12:20:32.3491220Z _check_triton_bf16_support(graph) 2025-12-04T12:20:32.3492007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T12:20:32.3492823Z warn_and_skip(node.get_device()) 2025-12-04T12:20:32.3493549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in 
warn_and_skip 2025-12-04T12:20:32.3494319Z raise SkipFrame("BF16 is not supported") 2025-12-04T12:20:32.3494835Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T12:20:32.3495232Z 2025-12-04T12:20:32.3495960Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:20:32.3496902Z 2025-12-04T12:20:32.3496908Z 2025-12-04T12:20:32.3497130Z To execute this test, run the following from the base repo dir: 2025-12-04T12:20:32.3498010Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling 2025-12-04T12:20:32.3498734Z 2025-12-04T12:20:32.3499003Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:20:32.3499642Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3500113Z frames [('total', 1)] 2025-12-04T12:20:32.3500417Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3501013Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3501778Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3502258Z graph_break [] 2025-12-04T12:20:32.3502624Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3503725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3504695Z warnings.warn( 2025-12-04T12:20:32.3505076Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3505531Z frames [('total', 1)] 2025-12-04T12:20:32.3505831Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3506275Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3506980Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3507579Z graph_break [] 2025-12-04T12:20:32.3507967Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3509050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3510033Z warnings.warn( 2025-12-04T12:20:32.3510354Z =================================== FAILURES =================================== 2025-12-04T12:20:32.3510930Z ________________ MemoryCoalescingTest.test_induced_fused_tiling ________________ 2025-12-04T12:20:32.3511467Z Traceback (most recent call last): 2025-12-04T12:20:32.3512200Z File "/var/lib/jenkins/workspace/test/inductor/test_loop_ordering.py", line 1042, in test_induced_fused_tiling 2025-12-04T12:20:32.3513044Z out, code = run_and_get_code(torch.compile(forward), (permute)) 2025-12-04T12:20:32.3513877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T12:20:32.3514605Z result = fn(*args, **kwargs) 2025-12-04T12:20:32.3515358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T12:20:32.3516244Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T12:20:32.3517134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:20:32.3517984Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:20:32.3518826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:20:32.3519621Z mb_compiled_graph = fx_codegen_and_compile( 
2025-12-04T12:20:32.3520430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:20:32.3521437Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:20:32.3522432Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T12:20:32.3523230Z _check_triton_bf16_support(graph) 2025-12-04T12:20:32.3524012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T12:20:32.3524827Z warn_and_skip(node.get_device()) 2025-12-04T12:20:32.3525652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T12:20:32.3526406Z raise SkipFrame("BF16 is not supported") 2025-12-04T12:20:32.3526933Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T12:20:32.3527331Z 2025-12-04T12:20:32.3528041Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:20:32.3528943Z 2025-12-04T12:20:32.3528951Z 2025-12-04T12:20:32.3529214Z To execute this test, run the following from the base repo dir: 2025-12-04T12:20:32.3530094Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling 2025-12-04T12:20:32.3530752Z 2025-12-04T12:20:32.3531020Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:20:32.3531651Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3532121Z frames [('total', 1)] 2025-12-04T12:20:32.3532406Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3532974Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3533698Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3534173Z graph_break [] 2025-12-04T12:20:32.3534538Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3535637Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3536696Z warnings.warn( 2025-12-04T12:20:32.3537066Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3537536Z frames [('total', 1)] 2025-12-04T12:20:32.3537835Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3538271Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3538990Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3539587Z graph_break [] 2025-12-04T12:20:32.3539964Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3541090Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3542066Z warnings.warn( 2025-12-04T12:20:32.3542448Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:20:32.3542905Z frames [('total', 1)] 2025-12-04T12:20:32.3543205Z stats [('calls_captured', 3)] 2025-12-04T12:20:32.3543647Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('not_ok', 1)] 2025-12-04T12:20:32.3544368Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_bypass', 1)] 2025-12-04T12:20:32.3544957Z graph_break [] 2025-12-04T12:20:32.3545338Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:20:32.3546423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:20:32.3547383Z warnings.warn( 2025-12-04T12:20:32.3548312Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-193da166d9e268ac.xml - 2025-12-04T12:20:32.3549381Z =========================== short test summary info ============================ 2025-12-04T12:20:32.3550448Z FAILED [0.0949s] inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T12:20:32.3551307Z 2025-12-04T12:20:32.3552033Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T12:20:32.3552913Z 2025-12-04T12:20:32.3552918Z 2025-12-04T12:20:32.3553134Z To execute this test, run the following from the base repo dir: 2025-12-04T12:20:32.3554010Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_loop_ordering.py MemoryCoalescingTest.test_induced_fused_tiling 2025-12-04T12:20:32.3554714Z 2025-12-04T12:20:32.3554985Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:20:32.3555616Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:20:32.3556133Z ================== 1 failed, 52 deselected, 2 rerun in 4.82s =================== 2025-12-04T12:20:32.3556582Z Got exit code 1 2025-12-04T12:20:32.3557190Z FAILED CONSISTENTLY: test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling 2025-12-04T12:20:32.3558169Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:20:32.3559177Z W1204 12:20:05.719000 138862 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:20:32.3560343Z Test results will be stored in test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-d4eac70f931f6c8b.xml 2025-12-04T12:20:32.3561230Z ============================= test session starts ============================== 2025-12-04T12:20:32.3562119Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:20:32.3562726Z cachedir: .pytest_cache 2025-12-04T12:20:32.3563441Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:20:32.3564229Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:20:32.3564572Z configfile: pytest.ini 2025-12-04T12:20:32.3565354Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, 
rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:20:32.3566309Z collecting ... collected 53 items / 24 deselected / 29 selected 2025-12-04T12:20:32.3566803Z stepcurrent: skipping 24 already run items. 2025-12-04T12:20:32.3567198Z Running 29 items in this shard 2025-12-04T12:20:32.3567422Z 2025-12-04T12:20:32.3567861Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps0 PASSED [0.0675s] [ 3%] 2025-12-04T12:20:32.3568951Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps1 PASSED [0.0190s] [ 6%] 2025-12-04T12:20:32.3569935Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps2 PASSED [0.0233s] [ 10%] 2025-12-04T12:20:32.3570926Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps3 PASSED [0.0188s] [ 13%] 2025-12-04T12:20:32.3572544Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_no_pointwise W1204 12:20:10.051000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3573972Z W1204 12:20:10.051000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3575351Z W1204 12:20:10.051000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3576718Z W1204 12:20:10.051000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3578012Z W1204 12:20:10.051000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_reduction_no_pointwise..fn' 2025-12-04T12:20:32.3579344Z W1204 12:20:10.284000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3580436Z W1204 12:20:10.284000 
138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3581824Z W1204 12:20:10.284000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3583134Z W1204 12:20:10.284000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3584538Z W1204 12:20:10.284000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_reduction_no_pointwise..fn' 2025-12-04T12:20:32.3585919Z I1204 12:20:10.908000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3587494Z I1204 12:20:10.908000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 4100, 'nodes_num_elem': [(SchedulerNode(name='op0'), 1025)], 'node_runtimes': [(SchedulerNode(name='op0'), 1.280993801239752e-05)]} 2025-12-04T12:20:32.3588762Z PASSED [4.9165s] [ 17%] 2025-12-04T12:20:32.3589779Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_pointwise W1204 12:20:11.009000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3591188Z W1204 12:20:11.009000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3592569Z W1204 12:20:11.009000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3593866Z W1204 12:20:11.009000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3595151Z W1204 12:20:11.009000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle 
local object 'MemoryCoalescingTest.test_reduction_pointwise..fn' 2025-12-04T12:20:32.3596476Z W1204 12:20:11.037000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3597469Z W1204 12:20:11.037000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3598899Z W1204 12:20:11.037000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3600199Z W1204 12:20:11.037000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3601481Z W1204 12:20:11.037000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_reduction_pointwise..fn' 2025-12-04T12:20:32.3602841Z I1204 12:20:11.164000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3604535Z I1204 12:20:11.164000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 525312, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 131328)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 0.0016412717456508697)]} 2025-12-04T12:20:32.3605893Z PASSED [0.2533s] [ 20%] 2025-12-04T12:20:32.3606890Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads W1204 12:20:11.243000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3608268Z W1204 12:20:11.243000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3609640Z W1204 12:20:11.243000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 
2025-12-04T12:20:32.3610974Z W1204 12:20:11.243000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3612228Z W1204 12:20:11.243000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_remapped_reads..fn' 2025-12-04T12:20:32.3613520Z W1204 12:20:11.266000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3614552Z W1204 12:20:11.266000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3615955Z W1204 12:20:11.266000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3617335Z W1204 12:20:11.266000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3618586Z W1204 12:20:11.266000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_remapped_reads..fn' 2025-12-04T12:20:32.3619916Z I1204 12:20:11.358000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3621467Z I1204 12:20:11.358000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 192, 'nodes_num_elem': [(SchedulerNode(name='op0'), 48)], 'node_runtimes': [(SchedulerNode(name='op0'), 5.99880023995201e-07)]} 2025-12-04T12:20:32.3622684Z PASSED [0.1907s] [ 24%] 2025-12-04T12:20:32.3623695Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads_split W1204 12:20:11.479000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3625092Z W1204 12:20:11.479000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most 
recent call last): 2025-12-04T12:20:32.3626466Z W1204 12:20:11.479000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3627759Z W1204 12:20:11.479000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3629040Z W1204 12:20:11.479000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_remapped_reads_split..fn' 2025-12-04T12:20:32.3630417Z W1204 12:20:11.518000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3631419Z W1204 12:20:11.518000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3632781Z W1204 12:20:11.518000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3634076Z W1204 12:20:11.518000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3635356Z W1204 12:20:11.518000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_remapped_reads_split..fn' 2025-12-04T12:20:32.3636717Z I1204 12:20:11.702000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3638392Z I1204 12:20:11.702000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 432, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 108)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 1.3497300539892022e-06)]} 2025-12-04T12:20:32.3639720Z PASSED [0.3441s] [ 27%] 2025-12-04T12:20:32.3640315Z 
inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_tiling PASSED [0.1454s] [ 31%] 2025-12-04T12:20:32.3641291Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_zero PASSED [0.1614s] [ 34%] 2025-12-04T12:20:32.3642787Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_False W1204 12:20:12.098000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3644341Z W1204 12:20:12.098000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3645793Z W1204 12:20:12.098000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3647085Z W1204 12:20:12.098000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3648371Z W1204 12:20:12.098000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_tiled_coalesce_analysis..fn' 2025-12-04T12:20:32.3649719Z W1204 12:20:12.122000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3650721Z W1204 12:20:12.122000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3652104Z W1204 12:20:12.122000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3653398Z W1204 12:20:12.122000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3654676Z W1204 12:20:12.122000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 
'MemoryCoalescingTest.test_tiled_coalesce_analysis..fn' 2025-12-04T12:20:32.3656049Z I1204 12:20:12.736000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3657730Z I1204 12:20:12.736000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 786432, 'nodes_num_elem': [(SchedulerNode(name='op0'), 196608)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.002457108578284343)]} 2025-12-04T12:20:32.3658993Z PASSED [0.7862s] [ 37%] 2025-12-04T12:20:32.3660182Z inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_True W1204 12:20:12.887000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3661751Z W1204 12:20:12.887000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3663238Z W1204 12:20:12.887000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3664530Z W1204 12:20:12.887000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3665823Z W1204 12:20:12.887000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_tiled_coalesce_analysis..fn' 2025-12-04T12:20:32.3667141Z W1204 12:20:12.912000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3668146Z W1204 12:20:12.912000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3669520Z W1204 12:20:12.912000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 
625, in dumps 2025-12-04T12:20:32.3670807Z W1204 12:20:12.912000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3672278Z W1204 12:20:12.912000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'MemoryCoalescingTest.test_tiled_coalesce_analysis..fn' 2025-12-04T12:20:32.3673755Z I1204 12:20:13.536000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3675377Z I1204 12:20:13.536000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 1048576, 'nodes_num_elem': [(SchedulerNode(name='op0'), 262144)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.0032761447710457905)]} 2025-12-04T12:20:32.3676756Z PASSED [0.7959s] [ 41%] 2025-12-04T12:20:32.3677685Z inductor/test_loop_ordering.py::TestTiling::test_3d_pointwise I1204 12:20:14.911000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3679596Z I1204 12:20:14.911000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 268435456, 'nodes_num_elem': [(SchedulerNode(name='op0'), 67108864)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.8386930613877224)]} 2025-12-04T12:20:32.3680863Z PASSED [1.7333s] [ 44%] 2025-12-04T12:20:32.3681756Z inductor/test_loop_ordering.py::TestTiling::test_cat I1204 12:20:16.617000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3683629Z I1204 12:20:16.617000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 268435456, 'nodes_num_elem': [(SchedulerNode(name='op0'), 67108864)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.8386930613877224)]} 2025-12-04T12:20:32.3684889Z PASSED [1.5437s] [ 48%] 2025-12-04T12:20:32.3685453Z 
inductor/test_loop_ordering.py::TestTiling::test_find_broadcast_var PASSED [0.0053s] [ 51%] 2025-12-04T12:20:32.3686665Z inductor/test_loop_ordering.py::TestTiling::test_mutation_deps W1204 12:20:16.964000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3687969Z W1204 12:20:16.964000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3689344Z W1204 12:20:16.964000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3690719Z W1204 12:20:16.964000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3691975Z W1204 12:20:16.964000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'TestTiling.test_mutation_deps..fn' 2025-12-04T12:20:32.3693201Z W1204 12:20:16.990000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Failed to pickle cache key 2025-12-04T12:20:32.3694187Z W1204 12:20:16.990000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] Traceback (most recent call last): 2025-12-04T12:20:32.3695570Z W1204 12:20:16.990000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 625, in dumps 2025-12-04T12:20:32.3696923Z W1204 12:20:16.990000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] self.dump(obj) 2025-12-04T12:20:32.3698105Z W1204 12:20:16.990000 138862 site-packages/torch/_inductor/codecache.py:629] [0/0] AttributeError: Can't pickle local object 'TestTiling.test_mutation_deps..fn' 2025-12-04T12:20:32.3699379Z I1204 12:20:17.425000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3701081Z I1204 
12:20:17.425000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 134217728, 'nodes_num_elem': [(FusedSchedulerNode(nodes=op0_op1), 33554432)], 'node_runtimes': [(FusedSchedulerNode(nodes=op0_op1), 0.4193465306938612)]} 2025-12-04T12:20:32.3704845Z PASSED [0.6148s] [ 55%] 2025-12-04T12:20:32.3705839Z inductor/test_loop_ordering.py::TestTiling::test_penalized_small_dim I1204 12:20:18.061000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3707769Z I1204 12:20:18.061000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 40016, 'nodes_num_elem': [(SchedulerNode(name='op0'), 10004)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.0001250249950009998)]} 2025-12-04T12:20:32.3709102Z PASSED [0.5818s] [ 58%] 2025-12-04T12:20:32.3710092Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_NHWC I1204 12:20:18.631000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3712053Z I1204 12:20:18.631000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]} 2025-12-04T12:20:32.3713331Z PASSED [0.6256s] [ 62%] 2025-12-04T12:20:32.3714302Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_T I1204 12:20:19.778000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3716242Z I1204 12:20:19.778000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]} 
2025-12-04T12:20:32.3717523Z PASSED [1.3383s] [ 65%] 2025-12-04T12:20:32.3718520Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_cont I1204 12:20:21.111000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3720477Z I1204 12:20:21.111000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]} 2025-12-04T12:20:32.3721737Z PASSED [1.2863s] [ 68%] 2025-12-04T12:20:32.3722705Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_NHWC I1204 12:20:22.409000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3724718Z I1204 12:20:22.409000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]} 2025-12-04T12:20:32.3725986Z PASSED [1.3411s] [ 72%] 2025-12-04T12:20:32.3726937Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_T I1204 12:20:23.226000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3728851Z I1204 12:20:23.226000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]} 2025-12-04T12:20:32.3730130Z PASSED [0.6231s] [ 75%] 2025-12-04T12:20:32.3731093Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_cont I1204 12:20:24.410000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 
2025-12-04T12:20:32.3733039Z I1204 12:20:24.410000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]} 2025-12-04T12:20:32.3734305Z PASSED [1.3315s] [ 79%] 2025-12-04T12:20:32.3735277Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_NHWC I1204 12:20:25.670000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3737363Z I1204 12:20:25.670000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]} 2025-12-04T12:20:32.3738682Z PASSED [1.2890s] [ 82%] 2025-12-04T12:20:32.3739682Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_T I1204 12:20:26.977000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3741605Z I1204 12:20:26.977000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 0.6290197960407918)]} 2025-12-04T12:20:32.3742870Z PASSED [1.2719s] [ 86%] 2025-12-04T12:20:32.3743853Z inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_cont I1204 12:20:27.746000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3745815Z I1204 12:20:27.746000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 201326592, 'nodes_num_elem': [(SchedulerNode(name='op0'), 50331648)], 'node_runtimes': [(SchedulerNode(name='op0'), 
0.6290197960407918)]} 2025-12-04T12:20:32.3747075Z PASSED [0.6211s] [ 89%] 2025-12-04T12:20:32.3748013Z inductor/test_loop_ordering.py::TestTiling::test_tiled_reduction I1204 12:20:28.802000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] Graph Metrics: 2025-12-04T12:20:32.3749950Z I1204 12:20:28.802000 138862 site-packages/torch/_inductor/compile_fx.py:1584] [0/0] [__inductor_metrics] {'num_bytes_accessed': 1074790400, 'nodes_num_elem': [(SchedulerNode(name='op0'), 268697600)], 'node_runtimes': [(SchedulerNode(name='op0'), 3.3580483903219354)]} 2025-12-04T12:20:32.3751226Z PASSED [1.2841s] [ 93%] 2025-12-04T12:20:32.3751801Z inductor/test_loop_ordering.py::TestIndexInversion::test_inversion_cases PASSED [0.0535s] [ 96%] 2025-12-04T12:20:32.3752767Z inductor/test_loop_ordering.py::TestIndexInversion::test_original_complex_expression PASSED [0.7354s] [100%] 2025-12-04T12:20:32.3753368Z 2025-12-04T12:20:32.3754182Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-d4eac70f931f6c8b.xml - 2025-12-04T12:20:32.3755275Z ====================== 29 passed, 24 deselected in 24.12s ====================== 2025-12-04T12:20:32.3756161Z The following tests failed consistently: ['test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling'] 2025-12-04T12:20:32.3756843Z 2025-12-04T12:20:32.3757411Z FINISHED PRINTING LOG FILE of inductor/test_loop_ordering 1/1 (test/test-reports/inductor.test_loop_ordering_1.1_ca0aee6babe9c71a_.log) 2025-12-04T12:20:32.3758124Z 2025-12-04T12:20:32.3758485Z Finished inductor/test_loop_ordering 1/1 ... 
[2025-12-04 12:20:32.292243][11260.675134844], took 1.78min 2025-12-04T12:20:32.3759833Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-264346bf50f4314b.xml 2025-12-04T12:20:32.3986285Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-f4f4ac9590e83730.xml 2025-12-04T12:20:32.4300570Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-193da166d9e268ac.xml 2025-12-04T12:20:32.4630007Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-d4eac70f931f6c8b.xml 2025-12-04T12:20:33.0174111Z Uploading logs for 57119749248 to S3 2025-12-04T12:20:33.1920718Z Uploading artifacts took 0.69 seconds 2025-12-04T12:20:33.1921133Z inductor/test_loop_ordering 1/1 failed! 2025-12-04T12:20:33.1925865Z Running export/test_serdes 1/1 ... [2025-12-04 12:20:33.192392][11261.575285415] 2025-12-04T12:20:33.1926415Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:20:33.1931199Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_serdes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:20:33.192866] 2025-12-04T12:24:41.5856113Z 2025-12-04T12:24:41.5859178Z export/test_serdes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_serdes_1.1_d6753111c4d56d4f_.log 2025-12-04T12:24:41.6337746Z Running 880 items in this shard: test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_assume_static_by_default_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_constraints_error_not_in_range_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_constraints_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_inline_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_slice_maxsize_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_slice_unbacked_dim1_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_strict_narrow_unbacked_expr_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_no_grad_param_inplace_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_reshape_view_backed_size_oblivious_serdes_strict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_assume_static_by_default_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_constraints_error_not_in_range_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_constraints_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_inline_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_slice_maxsize_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_slice_unbacked_dim1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_strict_narrow_unbacked_expr_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_no_grad_param_inplace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_reshape_view_backed_size_oblivious_serdes_nonstrict, test/export/test_serdes.py::SerDesExportTestExport::test__scaled_dot_product_flash_attention_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_additional_inputs_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_allow_explicit_guards_as_runtime_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_annotate_on_assert_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_args_type_checked_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_aten_lift_fresh_copy_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_attention_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_attr_assignment_extra_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_constrain_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_constant_relation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_linear_relation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_simple_equality_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_baddbmm_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_non_strict_fake_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_non_strict_real_tensor_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_bincount_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_buffer_util_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_constructor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_constructor_torch_ir_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_wrong_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_ccode_python_mod_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cdist_forward_compute_mode_zero_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_check_specialized_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_checks_to_constrain_range_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cleanup_dynamic_markers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_colin_unbacked_backed_vr_sub_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_colon_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_compiling_state_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_access_identical_symint_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_branches_return_constant_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_branches_return_same_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_contains_unbacked_no_escape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_int_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_unflatten_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_cond_with_module_stack_export_with_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_with_module_stack_export_with_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_aliasing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_input_naming_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_no_user_inp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_output_dup_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_requires_grad_const_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_return_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_with_non_functional_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_with_non_functional_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_in_eager_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_with_constrain_value_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_with_various_cases_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_conv_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_crop_like_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cse_for_symint_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_functionalize_pre_dispatch_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_functionalize_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_warn_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_preserve_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_tag_metadata_re_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_batch_norm_functional_predispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_item_in_prim_after_decomposition_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_item_in_prim_before_decomposition_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_default_decomposition_core_cia_ops_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_1_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_integer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_repeat_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_simplified_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_repeat_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_nonstrict_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_nonstrict_with_stacktrace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_strict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_gpu_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_mutation_float_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_static_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_1_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_auto_and_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_divisibility_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_hint_range_violations_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_hint_ranges_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_disable_forced_specializations_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_disable_forced_specializations_ok_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_gather_into_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_gather_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_reduce_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_to_all_single_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_distributed_reduce_scatter_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dont_duck_size_for_auto_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_double_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_aliasing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_list_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_with_nan_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_fake_kernel_inference_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_infers_fake_kernel_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_duplicate_modules_with_non_persistent_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_lr_shift_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_bounds_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_dataclass_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_inferred_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_generic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_user_errors_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_various_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_spec_with_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_wrapped_with_shape_guards_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_sym_round_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_ends_of_bounds_oblivious_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_enum_str_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_error_does_not_reference_eager_fallback_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_error_when_passing_mutating_primitive_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_exception_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_expand_copy_export_handles_implicit_true_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_api_with_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_as_backend_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_lifted_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_symbol_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_symbol_scandim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_subclass_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_symbool_pred_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_warns_constant_pred_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_decomp_table_basic_pop_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_decomp_table_container_methods_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_op_lib_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_triton_kernel_mutable_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_triton_kernel_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cyclic_reference_leak_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomp_torture_case_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomp_torture_case_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomps_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomps_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_dynamo_config_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_run_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_container_type_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_state_dict_hooks_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_default_kwargs_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_keyword_only_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_pytree_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_keyword_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_keyword_pytree_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_postional_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_function_schema_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_graph_with_no_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_bug_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_dynamic_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_static_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_leak_compile_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_linear_preserve_dynamic_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_max_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_max_onnx_reported_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_mod_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_preserve_linear_at_aot_level_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_export_preserve_linear_but_not_custom_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_rnn_variants_with_warning_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_scan_pytree_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_script_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_statically_known_true_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_then_compile_tensor_ctor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_autocast_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_fake_tensor_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_inline_constraints_complex_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_inline_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_set_grad_enabled_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_wrong_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_external_call_non_strict_real_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fake_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fake_weights_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_filter_traceback_frames_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_flex_attention_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_float_conversion_from_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_float_conversion_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_fqn_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_from_node_metadata_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_full_on_scalar_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_function_holding_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_hints_wrapper_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_hoo_inline_users_issue_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_if_functional_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_if_post_autograd_op_preserved_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inductor_backend_inside_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_class_method_recursive_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_class_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_int_shape_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_intermediate_shape_comp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_invalid_pytree_dynamo_graph_capture_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_is_exporting_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_is_nonzero_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_isnonzero_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_113041_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_157289_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_issue_161902_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_istft_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_invalid_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_linear_convd_for_training_ir_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_linear_convd_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_kwarg_dynamic_shapes_diff_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_kwargs_reorder_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_layer_norm_unbacked_normalized_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_layer_sharing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_lazy_module_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_linear_conv_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_malformed_fqn_from_source_name_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_map_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_map_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mask_nonzero_static_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_masked_select_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_math_pow_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mismatched_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mixed_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_dict_key_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_module_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_input_subclasses_parameterization_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_list_slice_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_with_dict_container_inp_out_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_modules_access_for_deleted_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_more_multidimensional_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multidimensional_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multinomial_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multiple_definitions_same_name_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_namedtuple_input_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_native_multi_attention_head_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_dynamic_shapes_spec_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_fake_tensor_leak_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_constant_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_init_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nn_module_stack_shared_submodule_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_no_check_is_size_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_suggested_fixes_for_data_dependent_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_3_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_persistent_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_strict_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_strict_dynamic_shapes_suggested_fixes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_none_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonstrict_retrace_preserves_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonzero_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonzero_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_not_registered_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_operator_aten_tensor_mode_variant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_output_node_name_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pad_sequence_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_param_util_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_partial_patched_forward_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_collisions_hoo_subgraphs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_collisions_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_order_variadic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_update_preserving_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_predispatch_cond_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_predispatch_grad_wrappers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_annotation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_module_call_signature_unflatten_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_requires_grad_placeholders_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_shape_dynamism_for_unused_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_profiling_code_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_python_asserts_with_sym_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pytree_register_data_class_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pytree_register_nested_data_class_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_range_constraints_with_replacement_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_alias_dtype_mismatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_bool_cast_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_errors_on_aliasing_custom_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_for_max_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_size_mismatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_redundant_assert_max_upper_bound_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_redundant_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_refine_dynamic_shapes_from_suggested_fixes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_register_constant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_repeat_interleave_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_replace_unbacked_with_very_large_upperbound_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_replaced_unbacked_bindings_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_reshape_view_helper_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_retracable_ep_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_retrace_pre_autograd_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decomposition_supports_user_input_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decompositions_keep_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decompositions_keep_tensor_constant_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_for_prim_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_for_prm_str_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_with_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sdpa_gqa_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sequential_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_example_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_as_side_effect_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_empty_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_setgrad_lifted_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_shared_submodule_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_simple_export_for_training_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_simple_unbacked_view_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_size_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_slice_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_solver_unsupported_sympy_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_specialize_derived_dim_roots_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_split_const_gm_with_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_stack_trace_make_fx_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_stack_trace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_primitives_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_shape_attribute_assignment_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_state_tensors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_static_dim_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_context_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_complicated_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_const_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclasses_parameterization_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclasses_parameterization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggest_torch_checks_with_non_negative_check_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggest_torch_checks_with_regular_check_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_for_data_dependent_errors_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_new_roots_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_float_operators_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_or_sym_and_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_sqrt_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symbool_item_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_symfloat_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_additional_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_ranges_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_shapes_collection_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_tensor_return_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tag_ac_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_attribute_zero_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_constant_aten_to_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_constant_with_wrapped_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_multiple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tolist_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_torch_check_eq_commutativity_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_torch_fn_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_trace_under_fake_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_train_eval_on_exported_preautograd_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tril_dynamic_diagonal_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_triu_dynamic_diagonal_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_3d_matmul_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_bincount_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_bindings_for_divisible_u_symint_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_deferred_runtime_retrace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_expand_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_infer_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_kth_value_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_linear_layer_norm_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_noncontig_lin_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_pad_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_scalar_constructor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_slice_forward_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_slice_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_to_cond_passthrough_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_to_cond_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_unsqueeze_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_buffer_update_child2parent_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_isinstance_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_shared_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_state_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_no_unroll_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_placeholder_update_child2parent_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_5_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_6_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_buf_8_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_const_preserving_3_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_const_preserving_3_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_6_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_9_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_preserving_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unused_aliases_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unused_constant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_uplift_common_custom_meta_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_uplift_common_custom_meta_with_multiple_calls_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_use_embedding_twice_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_user_input_and_buffer_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_custom_autograd_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_to_assert_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_where_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_assert_separation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_index_assertions_serdes_strict, 
test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_tensor_constant_idx_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_wrapper_module_serdes_strict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test__scaled_dot_product_flash_attention_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_additional_inputs_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_allow_explicit_guards_as_runtime_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_annotate_on_assert_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_args_type_checked_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_aten_lift_fresh_copy_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_attention_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_attr_assignment_extra_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_constrain_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_constant_relation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_linear_relation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_simple_equality_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_baddbmm_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_non_strict_fake_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_non_strict_real_tensor_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_bincount_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_buffer_util_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_constructor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_constructor_torch_ir_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_wrong_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_ccode_python_mod_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cdist_forward_compute_mode_zero_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_check_specialized_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_checks_to_constrain_range_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cleanup_dynamic_markers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_colin_unbacked_backed_vr_sub_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_colon_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_compiling_state_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_access_identical_symint_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_branches_return_constant_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_branches_return_same_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_buffers_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_contains_unbacked_no_escape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_int_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_with_module_stack_export_with_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_with_module_stack_export_with_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_aliasing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_input_naming_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_no_user_inp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_output_dup_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_requires_grad_const_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_return_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_with_non_functional_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_with_non_functional_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_in_eager_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_with_constrain_value_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_with_various_cases_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_conv_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_crop_like_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cse_for_symint_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_functionalize_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_functionalize_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_warn_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_preserve_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_tag_metadata_re_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_batch_norm_functional_predispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_item_in_prim_after_decomposition_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_item_in_prim_before_decomposition_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_default_decomposition_core_cia_ops_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_1_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_integer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_nested_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_repeat_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_repeat_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_nonstrict_with_stacktrace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_strict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_gpu_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_mutation_float_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_static_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_1_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_auto_and_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_divisibility_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_specialization_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_hint_range_violations_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_hint_ranges_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_disable_forced_specializations_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_disable_forced_specializations_ok_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_gather_into_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_gather_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_reduce_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_to_all_single_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_reduce_scatter_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dont_duck_size_for_auto_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_double_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_aliasing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_list_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_with_nan_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_fake_kernel_inference_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_infers_fake_kernel_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_duplicate_modules_with_non_persistent_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_lr_shift_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_bounds_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_dataclass_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_inferred_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_generic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_user_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_various_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_spec_with_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_wrapped_with_shape_guards_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_sym_round_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_ends_of_bounds_oblivious_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_enum_str_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_error_does_not_reference_eager_fallback_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_error_when_passing_mutating_primitive_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_exception_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_expand_copy_export_handles_implicit_true_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_api_with_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_as_backend_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_lifted_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_symbol_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_symbol_scandim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_symbool_pred_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_warns_constant_pred_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_decomp_table_basic_pop_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_decomp_table_container_methods_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_op_lib_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_triton_kernel_mutable_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_triton_kernel_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cyclic_reference_leak_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomp_torture_case_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomp_torture_case_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomps_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomps_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_dynamo_config_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_run_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_container_type_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_state_dict_hooks_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_default_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_keyword_only_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_pytree_kwargs_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_keyword_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_keyword_pytree_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_postional_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_function_schema_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_graph_with_no_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_bug_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_dynamic_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_static_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_leak_compile_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_linear_preserve_dynamic_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_max_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_max_onnx_reported_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_mod_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_preserve_linear_at_aot_level_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_preserve_linear_but_not_custom_op_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_rnn_variants_with_warning_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_scan_pytree_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_script_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_statically_known_true_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_then_compile_tensor_ctor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_autocast_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_fake_tensor_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_inline_constraints_complex_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_inline_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_set_grad_enabled_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_wrong_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_external_call_non_strict_real_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fake_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fake_weights_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_filter_traceback_frames_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_flex_attention_export_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_float_conversion_from_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_float_conversion_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fqn_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_from_node_metadata_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_full_on_scalar_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_function_holding_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_hints_wrapper_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_hoo_inline_users_issue_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_if_functional_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_if_post_autograd_op_preserved_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inductor_backend_inside_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_class_method_recursive_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_class_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_int_shape_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_intermediate_shape_comp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_invalid_pytree_dynamo_graph_capture_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_is_exporting_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_is_nonzero_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_isnonzero_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_113041_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_157289_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_161902_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_istft_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_invalid_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_linear_convd_for_training_ir_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_linear_convd_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_kwarg_dynamic_shapes_diff_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_kwargs_reorder_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_layer_norm_unbacked_normalized_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_layer_sharing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_lazy_module_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_linear_conv_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_malformed_fqn_from_source_name_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_map_buffers_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_map_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mask_nonzero_static_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_masked_select_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_math_pow_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mismatched_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mixed_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_dict_key_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_input_subclasses_parameterization_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_list_slice_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_with_dict_container_inp_out_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_modules_access_for_deleted_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_more_multidimensional_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multidimensional_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multinomial_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multiple_definitions_same_name_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_namedtuple_input_export_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_native_multi_attention_head_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_dynamic_shapes_spec_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_fake_tensor_leak_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_constant_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_init_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nn_module_stack_shared_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_check_is_size_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_suggested_fixes_for_data_dependent_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_3_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_persistent_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_strict_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_strict_dynamic_shapes_suggested_fixes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_none_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonstrict_retrace_preserves_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonzero_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonzero_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_not_registered_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_operator_aten_tensor_mode_variant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_output_node_name_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pad_sequence_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_param_util_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_partial_patched_forward_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_collisions_hoo_subgraphs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_collisions_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_order_variadic_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_update_preserving_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_predispatch_cond_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_predispatch_grad_wrappers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_annotation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_module_call_signature_unflatten_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_requires_grad_placeholders_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_shape_dynamism_for_unused_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_profiling_code_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_python_asserts_with_sym_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pytree_register_data_class_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pytree_register_nested_data_class_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_range_constraints_with_replacement_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_alias_dtype_mismatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_bool_cast_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_errors_on_aliasing_custom_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_for_max_op_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_size_mismatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_redundant_assert_max_upper_bound_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_redundant_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_refine_dynamic_shapes_from_suggested_fixes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_register_constant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_repeat_interleave_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_replace_unbacked_with_very_large_upperbound_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_replaced_unbacked_bindings_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_reshape_view_helper_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_retracable_ep_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_retrace_pre_autograd_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decomposition_supports_user_input_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decompositions_keep_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decompositions_keep_tensor_constant_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_for_prim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_for_prm_str_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_with_size_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sdpa_gqa_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sequential_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_example_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_as_side_effect_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_empty_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_setgrad_lifted_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_shared_submodule_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_simple_export_for_training_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_simple_unbacked_view_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_size_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_slice_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_solver_unsupported_sympy_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_specialize_derived_dim_roots_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_split_const_gm_with_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_stack_trace_make_fx_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_stack_trace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_primitives_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_shape_attribute_assignment_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_tensors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_static_dim_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_context_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_complicated_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclasses_parameterization_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclasses_parameterization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggest_torch_checks_with_non_negative_check_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggest_torch_checks_with_regular_check_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_new_roots_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_float_operators_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_or_sym_and_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_sqrt_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symbool_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symfloat_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_additional_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_ranges_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_shapes_collection_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_tensor_return_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tag_ac_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_attribute_zero_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_constant_aten_to_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_constant_with_wrapped_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tolist_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_torch_check_eq_commutativity_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_torch_fn_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_trace_under_fake_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_train_eval_on_exported_preautograd_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tril_dynamic_diagonal_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_triu_dynamic_diagonal_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_3d_matmul_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_bincount_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_bindings_for_divisible_u_symint_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_deferred_runtime_retrace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_expand_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_infer_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_kth_value_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_linear_layer_norm_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_noncontig_lin_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_pad_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_scalar_constructor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_slice_forward_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_slice_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_to_cond_passthrough_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_to_cond_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_unsqueeze_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_buffer_update_child2parent_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_isinstance_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_shared_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_state_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_no_unroll_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_placeholder_update_child2parent_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_5_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_6_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_buf_8_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_6_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_9_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_preserving_4_serdes_nonstrict, 
test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unused_aliases_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unused_constant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_uplift_common_custom_meta_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_uplift_common_custom_meta_with_multiple_calls_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_use_embedding_twice_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_user_input_and_buffer_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_custom_autograd_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_to_assert_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_where_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_assert_separation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_index_assertions_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_tensor_constant_idx_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_wrapper_module_serdes_nonstrict 2025-12-04T12:24:41.6805875Z 2025-12-04T12:24:41.6806251Z Finished export/test_serdes 1/1 ... [2025-12-04 12:24:41.587580][11509.970472875], took 4.14min 2025-12-04T12:24:41.6807427Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_serdes/export.test_serdes-191fd84c43c29743.xml 2025-12-04T12:24:41.7399320Z Running dynamo/test_backends 1/1 ... 
[2025-12-04 12:24:41.739605][11510.122497627] 2025-12-04T12:24:41.7400103Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:24:41.7403031Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_backends.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:24:41.740065] 2025-12-04T12:25:50.2175311Z 2025-12-04T12:25:50.2176451Z PRINTING LOG FILE of dynamo/test_backends 1/1 (test/test-reports/dynamo.test_backends_1.1_0248c6271c37d6dd_.log) 2025-12-04T12:25:50.2178039Z Test results will be stored in test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7c3220f8bc842d2f.xml 2025-12-04T12:25:50.2178867Z ============================= test session starts ============================== 2025-12-04T12:25:50.2179538Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:25:50.2180194Z cachedir: .pytest_cache 2025-12-04T12:25:50.2180952Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:25:50.2181750Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:25:50.2182112Z configfile: pytest.ini 2025-12-04T12:25:50.2182895Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:25:50.2183746Z collecting ... 
collected 21 items 2025-12-04T12:25:50.2184170Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T12:25:50.2192715Z Running 21 items in this shard: test/dynamo/test_backends.py::NormalizeIRTests::test_inplace_normalize, test/dynamo/test_backends.py::MPSSupportedTest::test_mps_supported, test/dynamo/test_backends.py::TestExplainWithBackend::test_explain_with_backend, test/dynamo/test_backends.py::TestCustomBackendAPI::test_aot_autograd_api, test/dynamo/test_backends.py::TestCustomBackendAPI::test_backend_graph_freeze, test/dynamo/test_backends.py::TestCustomBackendAPI::test_backend_recompilation, test/dynamo/test_backends.py::TestCustomBackendAPI::test_lookup_backend, test/dynamo/test_backends.py::TestCustomBackendAPI::test_lookup_custom_backend, test/dynamo/test_backends.py::TestCustomBackendAPI::test_register_backend_api, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_eager_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_eager_decomp_partition_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_ts_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_eager_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_eager_noexcept_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_example_inputs_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_example_inputs_runtime_use_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_intel_gaudi_backend_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_list_backends_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_torchscript_cuda, test/dynamo/test_backends.py::TestOptimizationsCUDA::test_tvm_cuda 2025-12-04T12:25:50.2201357Z 2025-12-04T12:25:50.2201718Z dynamo/test_backends.py::NormalizeIRTests::test_inplace_normalize PASSED [0.2319s] [ 4%] 2025-12-04T12:25:50.2202625Z 
dynamo/test_backends.py::MPSSupportedTest::test_mps_supported SKIPPED [0.0003s] (requires mps) [ 9%] 2025-12-04T12:25:50.2203579Z dynamo/test_backends.py::TestExplainWithBackend::test_explain_with_backend PASSED [7.1635s] [ 14%] 2025-12-04T12:25:50.2204500Z dynamo/test_backends.py::TestCustomBackendAPI::test_aot_autograd_api PASSED [0.0603s] [ 19%] 2025-12-04T12:25:50.2205410Z dynamo/test_backends.py::TestCustomBackendAPI::test_backend_graph_freeze PASSED [0.1014s] [ 23%] 2025-12-04T12:25:50.2206339Z dynamo/test_backends.py::TestCustomBackendAPI::test_backend_recompilation PASSED [0.7360s] [ 28%] 2025-12-04T12:25:50.2207220Z dynamo/test_backends.py::TestCustomBackendAPI::test_lookup_backend PASSED [0.7438s] [ 33%] 2025-12-04T12:25:50.2208195Z dynamo/test_backends.py::TestCustomBackendAPI::test_lookup_custom_backend PASSED [0.0034s] [ 38%] 2025-12-04T12:25:50.2209123Z dynamo/test_backends.py::TestCustomBackendAPI::test_register_backend_api PASSED [0.0433s] [ 42%] 2025-12-04T12:25:50.2210138Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda ('RERUN', {'yellow': True}) [1.2270s] [ 47%] 2025-12-04T12:25:50.2211277Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda ('RERUN', {'yellow': True}) [0.5088s] [ 47%] 2025-12-04T12:25:50.2212323Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda FAILED [0.5170s] [ 47%] 2025-12-04T12:25:50.2212853Z 2025-12-04T12:25:50.2213016Z ==================================== RERUNS ==================================== 2025-12-04T12:25:50.2213594Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________ 2025-12-04T12:25:50.2214135Z Traceback (most recent call last): 2025-12-04T12:25:50.2214923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2215705Z method(*args, **kwargs) 2025-12-04T12:25:50.2216504Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2217283Z method(*args, **kwargs) 2025-12-04T12:25:50.2218018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:25:50.2218791Z with policy(): 2025-12-04T12:25:50.2219471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:25:50.2220249Z raise RuntimeError(msg) 2025-12-04T12:25:50.2221540Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 680460288 and is now 867106816. 2025-12-04T12:25:50.2222756Z 2025-12-04T12:25:50.2222989Z To execute this test, run the following from the base repo dir: 2025-12-04T12:25:50.2223837Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda 2025-12-04T12:25:50.2224480Z 2025-12-04T12:25:50.2224756Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:25:50.2225455Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2225942Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2226314Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2226737Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2227263Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________ 2025-12-04T12:25:50.2227794Z Traceback (most recent call last): 2025-12-04T12:25:50.2228568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2229341Z method(*args, **kwargs) 2025-12-04T12:25:50.2230068Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2230823Z method(*args, **kwargs) 2025-12-04T12:25:50.2231549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:25:50.2232314Z with policy(): 2025-12-04T12:25:50.2232995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:25:50.2233773Z raise RuntimeError(msg) 2025-12-04T12:25:50.2235070Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 839843840 and is now 867106816. 2025-12-04T12:25:50.2236334Z 2025-12-04T12:25:50.2236565Z To execute this test, run the following from the base repo dir: 2025-12-04T12:25:50.2237406Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda 2025-12-04T12:25:50.2238051Z 2025-12-04T12:25:50.2238323Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:25:50.2239002Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2239525Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2239897Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2240326Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2240795Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2241272Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2241642Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2242061Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2242454Z =================================== FAILURES 
=================================== 2025-12-04T12:25:50.2243016Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________ 2025-12-04T12:25:50.2243561Z Traceback (most recent call last): 2025-12-04T12:25:50.2244330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2245101Z method(*args, **kwargs) 2025-12-04T12:25:50.2245825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2246596Z method(*args, **kwargs) 2025-12-04T12:25:50.2247314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:25:50.2248068Z with policy(): 2025-12-04T12:25:50.2248762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:25:50.2249541Z raise RuntimeError(msg) 2025-12-04T12:25:50.2250841Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 839843840 and is now 867106816. 
2025-12-04T12:25:50.2252056Z 2025-12-04T12:25:50.2252314Z To execute this test, run the following from the base repo dir: 2025-12-04T12:25:50.2253169Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda 2025-12-04T12:25:50.2253795Z 2025-12-04T12:25:50.2254077Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:25:50.2254709Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2255186Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2255573Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2255998Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2256534Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2257015Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2257400Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2257815Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2258286Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2258760Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2259140Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2259543Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2260475Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7c3220f8bc842d2f.xml - 2025-12-04T12:25:50.2261541Z =========================== short test summary info ============================ 2025-12-04T12:25:50.2263401Z FAILED [0.5170s] dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. 
CUDA driver allocated memory was 839843840 and is now 867106816. 2025-12-04T12:25:50.2265093Z 2025-12-04T12:25:50.2265331Z To execute this test, run the following from the base repo dir: 2025-12-04T12:25:50.2266214Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda 2025-12-04T12:25:50.2266865Z 2025-12-04T12:25:50.2267138Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:25:50.2267745Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:25:50.2268291Z =============== 1 failed, 8 passed, 1 skipped, 2 rerun in 11.39s =============== 2025-12-04T12:25:50.2268738Z Got exit code 1 2025-12-04T12:25:50.2269020Z Retrying single test... 2025-12-04T12:25:50.2269715Z Test results will be stored in test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7281b232f81c7a26.xml 2025-12-04T12:25:50.2270518Z ============================= test session starts ============================== 2025-12-04T12:25:50.2271428Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:25:50.2272046Z cachedir: .pytest_cache 2025-12-04T12:25:50.2272765Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:25:50.2273547Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:25:50.2273917Z configfile: pytest.ini 2025-12-04T12:25:50.2274699Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:25:50.2275640Z collecting ... collected 21 items / 20 deselected / 1 selected 2025-12-04T12:25:50.2276570Z stepcurrent: skipping 9 already run items. 
Running only test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda 2025-12-04T12:25:50.2277398Z Running 1 items in this shard 2025-12-04T12:25:50.2277613Z 2025-12-04T12:25:50.2278109Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda ('RERUN', {'yellow': True}) [1.3827s] [100%] 2025-12-04T12:25:50.2279303Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda ('RERUN', {'yellow': True}) [0.4729s] [100%] 2025-12-04T12:25:50.2280318Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda FAILED [0.4714s] [100%] 2025-12-04T12:25:50.2280858Z 2025-12-04T12:25:50.2281007Z ==================================== RERUNS ==================================== 2025-12-04T12:25:50.2281583Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________ 2025-12-04T12:25:50.2282114Z Traceback (most recent call last): 2025-12-04T12:25:50.2282891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2283665Z method(*args, **kwargs) 2025-12-04T12:25:50.2284386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2285143Z method(*args, **kwargs) 2025-12-04T12:25:50.2285869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:25:50.2286629Z with policy(): 2025-12-04T12:25:50.2287309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:25:50.2288092Z raise RuntimeError(msg) 2025-12-04T12:25:50.2289389Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. 
CUDA driver allocated memory was 680460288 and is now 867106816. 2025-12-04T12:25:50.2290738Z 2025-12-04T12:25:50.2290962Z To execute this test, run the following from the base repo dir: 2025-12-04T12:25:50.2291814Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda 2025-12-04T12:25:50.2292493Z 2025-12-04T12:25:50.2292821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:25:50.2293449Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2293931Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2294312Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2294726Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2295238Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________ 2025-12-04T12:25:50.2295786Z Traceback (most recent call last): 2025-12-04T12:25:50.2296634Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2297402Z method(*args, **kwargs) 2025-12-04T12:25:50.2298130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2298909Z method(*args, **kwargs) 2025-12-04T12:25:50.2299643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:25:50.2300397Z with policy(): 2025-12-04T12:25:50.2301092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:25:50.2301873Z raise RuntimeError(msg) 2025-12-04T12:25:50.2303153Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. 
CUDA driver allocated memory was 839843840 and is now 867106816. 2025-12-04T12:25:50.2304373Z 2025-12-04T12:25:50.2304589Z To execute this test, run the following from the base repo dir: 2025-12-04T12:25:50.2305443Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda 2025-12-04T12:25:50.2306081Z 2025-12-04T12:25:50.2306414Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:25:50.2307049Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2307521Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2307899Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2308319Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2308768Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2309251Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2309626Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2310031Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2310427Z =================================== FAILURES =================================== 2025-12-04T12:25:50.2310996Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________ 2025-12-04T12:25:50.2311549Z Traceback (most recent call last): 2025-12-04T12:25:50.2312311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2313084Z method(*args, **kwargs) 2025-12-04T12:25:50.2313804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2314558Z method(*args, **kwargs) 2025-12-04T12:25:50.2315274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:25:50.2316074Z with policy(): 
2025-12-04T12:25:50.2316766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:25:50.2317530Z raise RuntimeError(msg) 2025-12-04T12:25:50.2318826Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 839843840 and is now 867106816. 2025-12-04T12:25:50.2320124Z 2025-12-04T12:25:50.2320345Z To execute this test, run the following from the base repo dir: 2025-12-04T12:25:50.2321198Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda 2025-12-04T12:25:50.2321827Z 2025-12-04T12:25:50.2322101Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:25:50.2322748Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2323231Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2323617Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2324026Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2324493Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2324972Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2325342Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2325763Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2326227Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2326687Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2327063Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2327481Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2328404Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7281b232f81c7a26.xml - 2025-12-04T12:25:50.2329389Z =========================== short test summary info ============================ 2025-12-04T12:25:50.2331245Z FAILED [0.4714s] dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 839843840 and is now 867106816. 2025-12-04T12:25:50.2332969Z 2025-12-04T12:25:50.2333194Z To execute this test, run the following from the base repo dir: 2025-12-04T12:25:50.2334045Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda 2025-12-04T12:25:50.2334673Z 2025-12-04T12:25:50.2334944Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:25:50.2335545Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:25:50.2336083Z ================== 1 failed, 20 deselected, 2 rerun in 2.36s =================== 2025-12-04T12:25:50.2336648Z Got exit code 1 2025-12-04T12:25:50.2336914Z Retrying single test... 
2025-12-04T12:25:50.2337615Z Test results will be stored in test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-0c21d337a20b3a01.xml 2025-12-04T12:25:50.2338432Z ============================= test session starts ============================== 2025-12-04T12:25:50.2339094Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:25:50.2339706Z cachedir: .pytest_cache 2025-12-04T12:25:50.2340433Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:25:50.2341223Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:25:50.2341571Z configfile: pytest.ini 2025-12-04T12:25:50.2342401Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:25:50.2343350Z collecting ... collected 21 items / 20 deselected / 1 selected 2025-12-04T12:25:50.2344274Z stepcurrent: skipping 9 already run items. 
Running only test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda 2025-12-04T12:25:50.2345102Z Running 1 items in this shard 2025-12-04T12:25:50.2345364Z 2025-12-04T12:25:50.2345887Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda ('RERUN', {'yellow': True}) [1.3703s] [100%] 2025-12-04T12:25:50.2347000Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda ('RERUN', {'yellow': True}) [0.4663s] [100%] 2025-12-04T12:25:50.2348001Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda FAILED [0.4503s] [100%] 2025-12-04T12:25:50.2348542Z 2025-12-04T12:25:50.2348689Z ==================================== RERUNS ==================================== 2025-12-04T12:25:50.2349267Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________ 2025-12-04T12:25:50.2349811Z Traceback (most recent call last): 2025-12-04T12:25:50.2350568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2351342Z method(*args, **kwargs) 2025-12-04T12:25:50.2352067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2352842Z method(*args, **kwargs) 2025-12-04T12:25:50.2353554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:25:50.2354316Z with policy(): 2025-12-04T12:25:50.2355008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:25:50.2355766Z raise RuntimeError(msg) 2025-12-04T12:25:50.2357064Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. 
CUDA driver allocated memory was 680460288 and is now 867106816. 2025-12-04T12:25:50.2358289Z 2025-12-04T12:25:50.2358508Z To execute this test, run the following from the base repo dir: 2025-12-04T12:25:50.2359365Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda 2025-12-04T12:25:50.2360034Z 2025-12-04T12:25:50.2360318Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:25:50.2360950Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2361432Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2361813Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2362222Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2362749Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________ 2025-12-04T12:25:50.2363298Z Traceback (most recent call last): 2025-12-04T12:25:50.2364048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2364823Z method(*args, **kwargs) 2025-12-04T12:25:50.2365545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2366309Z method(*args, **kwargs) 2025-12-04T12:25:50.2367024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:25:50.2367782Z with policy(): 2025-12-04T12:25:50.2368474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:25:50.2369240Z raise RuntimeError(msg) 2025-12-04T12:25:50.2370571Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. 
CUDA driver allocated memory was 839843840 and is now 867106816. 2025-12-04T12:25:50.2371991Z 2025-12-04T12:25:50.2372213Z To execute this test, run the following from the base repo dir: 2025-12-04T12:25:50.2373081Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda 2025-12-04T12:25:50.2373798Z 2025-12-04T12:25:50.2374159Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:25:50.2374781Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2375270Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2375656Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2376067Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2376610Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2377093Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2377461Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2377884Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2378279Z =================================== FAILURES =================================== 2025-12-04T12:25:50.2378851Z ________________ TestOptimizationsCUDA.test_aot_cudagraphs_cuda ________________ 2025-12-04T12:25:50.2379385Z Traceback (most recent call last): 2025-12-04T12:25:50.2380158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2380931Z method(*args, **kwargs) 2025-12-04T12:25:50.2381644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T12:25:50.2382412Z method(*args, **kwargs) 2025-12-04T12:25:50.2383130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T12:25:50.2383887Z with policy(): 
2025-12-04T12:25:50.2384565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T12:25:50.2385341Z raise RuntimeError(msg) 2025-12-04T12:25:50.2386701Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 839843840 and is now 867106816. 2025-12-04T12:25:50.2387916Z 2025-12-04T12:25:50.2388146Z To execute this test, run the following from the base repo dir: 2025-12-04T12:25:50.2388985Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda 2025-12-04T12:25:50.2389628Z 2025-12-04T12:25:50.2389897Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:25:50.2390540Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2391024Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2391398Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2391823Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2392293Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2392763Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2393151Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2393576Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2394023Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:25:50.2394493Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2394875Z stats [('calls_captured', 4), ('unique_graphs', 1)] 2025-12-04T12:25:50.2395294Z aot_autograd [('total', 1), ('ok', 1)] 2025-12-04T12:25:50.2396274Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-0c21d337a20b3a01.xml - 2025-12-04T12:25:50.2397268Z =========================== short test summary info ============================ 2025-12-04T12:25:50.2400416Z FAILED [0.4503s] dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestOptimizationsCUDA.test_aot_cudagraphs_cuda! Caching allocator allocated memory was 0 and is now reported as 3584 on device 0. CUDA driver allocated memory was 839843840 and is now 867106816. 2025-12-04T12:25:50.2402149Z 2025-12-04T12:25:50.2402387Z To execute this test, run the following from the base repo dir: 2025-12-04T12:25:50.2403249Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/dynamo/test_backends.py TestOptimizationsCUDA.test_aot_cudagraphs_cuda 2025-12-04T12:25:50.2403970Z 2025-12-04T12:25:50.2404350Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:25:50.2405091Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:25:50.2405629Z ================== 1 failed, 20 deselected, 2 rerun in 2.32s =================== 2025-12-04T12:25:50.2406071Z Got exit code 1 2025-12-04T12:25:50.2406657Z FAILED CONSISTENTLY: test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda 2025-12-04T12:25:50.2407634Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:25:50.2408700Z Test results will be stored in test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-62a9a8755ec319d6.xml 2025-12-04T12:25:50.2409504Z ============================= test session starts ============================== 2025-12-04T12:25:50.2410179Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:25:50.2410792Z cachedir: .pytest_cache 2025-12-04T12:25:50.2411508Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:25:50.2412302Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:25:50.2412669Z configfile: pytest.ini 2025-12-04T12:25:50.2413450Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:25:50.2414395Z collecting ... collected 21 items / 10 deselected / 11 selected 2025-12-04T12:25:50.2414980Z stepcurrent: skipping 10 already run items. 
2025-12-04T12:25:50.2415387Z Running 11 items in this shard 2025-12-04T12:25:50.2415599Z 2025-12-04T12:25:50.2415973Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_eager_cuda PASSED [1.1914s] [ 9%] 2025-12-04T12:25:50.2417029Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_eager_decomp_partition_cuda PASSED [0.1564s] [ 18%] 2025-12-04T12:25:50.2417981Z dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_ts_cuda PASSED [0.2164s] [ 27%] 2025-12-04T12:25:50.2418821Z dynamo/test_backends.py::TestOptimizationsCUDA::test_eager_cuda PASSED [0.0739s] [ 36%] 2025-12-04T12:25:50.2419686Z dynamo/test_backends.py::TestOptimizationsCUDA::test_eager_noexcept_cuda PASSED [0.0738s] [ 45%] 2025-12-04T12:25:50.2420614Z dynamo/test_backends.py::TestOptimizationsCUDA::test_example_inputs_cuda PASSED [0.0506s] [ 54%] 2025-12-04T12:25:50.2421596Z dynamo/test_backends.py::TestOptimizationsCUDA::test_example_inputs_runtime_use_cuda PASSED [0.0470s] [ 63%] 2025-12-04T12:25:50.2422692Z dynamo/test_backends.py::TestOptimizationsCUDA::test_intel_gaudi_backend_cuda SKIPPED [0.0019s] (Only runs on hpu) [ 72%] 2025-12-04T12:25:50.2423709Z dynamo/test_backends.py::TestOptimizationsCUDA::test_list_backends_cuda PASSED [0.0124s] [ 81%] 2025-12-04T12:25:50.2424612Z dynamo/test_backends.py::TestOptimizationsCUDA::test_torchscript_cuda PASSED [0.1006s] [ 90%] 2025-12-04T12:25:50.2425538Z dynamo/test_backends.py::TestOptimizationsCUDA::test_tvm_cuda SKIPPED [0.0003s] (requires tvm) [100%] 2025-12-04T12:25:50.2426137Z 2025-12-04T12:25:50.2426827Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-62a9a8755ec319d6.xml - 2025-12-04T12:25:50.2427850Z ================= 9 passed, 2 skipped, 10 deselected in 1.97s ================== 2025-12-04T12:25:50.2428726Z The following tests failed consistently: ['test/dynamo/test_backends.py::TestOptimizationsCUDA::test_aot_cudagraphs_cuda'] 2025-12-04T12:25:50.2429424Z 
2025-12-04T12:25:50.2429944Z FINISHED PRINTING LOG FILE of dynamo/test_backends 1/1 (test/test-reports/dynamo.test_backends_1.1_0248c6271c37d6dd_.log) 2025-12-04T12:25:50.2430563Z 2025-12-04T12:25:50.2431055Z Finished dynamo/test_backends 1/1 ... [2025-12-04 12:25:50.217435][11578.600331701], took 1.14min 2025-12-04T12:25:50.2432272Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7c3220f8bc842d2f.xml 2025-12-04T12:25:50.3324527Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7281b232f81c7a26.xml 2025-12-04T12:25:50.3636853Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-0c21d337a20b3a01.xml 2025-12-04T12:25:50.3921843Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-62a9a8755ec319d6.xml 2025-12-04T12:25:50.9799529Z Uploading logs for 57119749248 to S3 2025-12-04T12:25:51.1640450Z Uploading artifacts took 0.74 seconds 2025-12-04T12:25:51.1640858Z dynamo/test_backends 1/1 failed! 2025-12-04T12:25:51.1645719Z Running inductor/test_aot_inductor_package 1/1 ... [2025-12-04 12:25:51.164389][11579.547283555] 2025-12-04T12:25:51.1646342Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:25:51.1651216Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_package.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:25:51.164855] 2025-12-04T12:35:04.2505411Z 2025-12-04T12:35:04.2506517Z PRINTING LOG FILE of inductor/test_aot_inductor_package 1/1 (test/test-reports/inductor.test_aot_inductor_package_1.1_5509f9f54e762912_.log) 2025-12-04T12:35:04.2519097Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-b1ca468dab29d0d8.xml 2025-12-04T12:35:04.2520747Z ============================= test session starts ============================== 2025-12-04T12:35:04.2521892Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:35:04.2522672Z cachedir: .pytest_cache 2025-12-04T12:35:04.2523678Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:35:04.2524951Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:35:04.2525452Z configfile: pytest.ini 2025-12-04T12:35:04.2526754Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:35:04.2527833Z collecting ... 
collected 88 items 2025-12-04T12:35:04.2528252Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T12:35:04.2597615Z Running 88 items in this shard: test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_add, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_bool_input, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_after_package, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_after_package_multi_arch, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_after_package_static, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_standalone_cos, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_with_exporter, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_with_exporter_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_deepcopy_compiled_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_duplicate_calls, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_linear, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_loading_wrong_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_metadata, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_multiple_methods, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_shared_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_user_managed_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_weights_on_disk_nested_module, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_without_weight, 
test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_remove_intermediate_files, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_save_buffer, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_specified_output_dir, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_update_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_add, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_bool_input, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_after_package, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_after_package_multi_arch, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_after_package_static, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_standalone_cos, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_with_exporter, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_with_exporter_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_deepcopy_compiled_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_duplicate_calls, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_linear, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_loading_wrong_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_metadata, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_multiple_methods, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_shared_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_user_managed_weight, 
test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_weights_on_disk_nested_module, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_without_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_remove_intermediate_files, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_save_buffer, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_specified_output_dir, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_update_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_add, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_bool_input, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_after_package, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_after_package_multi_arch, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_after_package_static, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_standalone_cos, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_with_exporter, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_with_exporter_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_deepcopy_compiled_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_duplicate_calls, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_linear, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_loading_wrong_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_metadata, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_multiple_methods, 
test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_shared_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_user_managed_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_weights_on_disk_nested_module, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_without_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_remove_intermediate_files, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_save_buffer, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_specified_output_dir, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_update_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_add, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_bool_input, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_after_package, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_after_package_multi_arch, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_after_package_static, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_with_exporter, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_with_exporter_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_deepcopy_compiled_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_duplicate_calls, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_linear, 
test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_loading_wrong_model, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_metadata, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_multiple_methods, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_shared_weights, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_user_managed_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_weights_on_disk_nested_module, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_without_weight, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_remove_intermediate_files, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_save_buffer, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_specified_output_dir, test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_update_weights 2025-12-04T12:35:04.2669111Z 2025-12-04T12:35:04.2669919Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_add PASSED [7.6318s] [ 1%] 2025-12-04T12:35:04.2672074Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_bool_input PASSED [5.0570s] [ 2%] 2025-12-04T12:35:04.2674766Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_after_package SKIPPED [0.0004s] (Test is only supported on CUDA 12.6+) [ 3%] 2025-12-04T12:35:04.2677519Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_after_package_multi_arch SKIPPED [0.0002s] (Test is only supported on CUDA 12.8+) [ 4%] 2025-12-04T12:35:04.2680538Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_after_package_static SKIPPED [0.0002s] (Test is only supported on CUDA 12.6+) [ 
5%] 2025-12-04T12:35:04.2683266Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_standalone_cos SKIPPED [0.0031s] (Only meant to test cpp package) [ 6%] 2025-12-04T12:35:04.2685616Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_with_exporter SKIPPED [0.0002s] (Test is only supported on CUDA 12.6+) [ 7%] 2025-12-04T12:35:04.2688427Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_compile_with_exporter_weights SKIPPED [0.0006s] (Test is only supported on CUDA 12.6+) [ 9%] 2025-12-04T12:35:04.2692139Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_deepcopy_compiled_model W1204 12:26:18.236000 140836 site-packages/torch/export/pt2_archive/_package.py:763] AOTICompiledModel deepcopy warning: AOTICompiledModel.loader is not deepcopied. 2025-12-04T12:35:04.2694840Z PASSED [5.1506s] [ 10%] 2025-12-04T12:35:04.2695995Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_duplicate_calls PASSED [20.3835s] [ 11%] 2025-12-04T12:35:04.2697577Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_linear PASSED [5.1766s] [ 12%] 2025-12-04T12:35:04.2700017Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_loading_wrong_model W1204 12:26:49.002000 140836 site-packages/torch/_inductor/package/package.py:120] Loading outdated pt2 file. Please regenerate your package. 
2025-12-04T12:35:04.2702036Z PASSED [5.2006s] [ 13%] 2025-12-04T12:35:04.2703050Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_metadata PASSED [5.1577s] [ 14%] 2025-12-04T12:35:04.2704774Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_multiple_methods PASSED [10.4660s] [ 15%] 2025-12-04T12:35:04.2706305Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_shared_weights PASSED [2.1586s] [ 17%] 2025-12-04T12:35:04.2707486Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_user_managed_weight PASSED [6.3455s] [ 18%] 2025-12-04T12:35:04.2708833Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_weights_on_disk_nested_module PASSED [5.2032s] [ 19%] 2025-12-04T12:35:04.2710042Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_package_without_weight PASSED [5.3058s] [ 20%] 2025-12-04T12:35:04.2711186Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_remove_intermediate_files PASSED [5.1781s] [ 21%] 2025-12-04T12:35:04.2712291Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_save_buffer PASSED [5.2636s] [ 22%] 2025-12-04T12:35:04.2713374Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_specified_output_dir PASSED [5.2034s] [ 23%] 2025-12-04T12:35:04.2714468Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cpu::test_update_weights PASSED [5.6854s] [ 25%] 2025-12-04T12:35:04.2716021Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_add In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12, 2025-12-04T12:35:04.2717649Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11, 2025-12-04T12:35:04.2718603Z from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.2719668Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.2720660Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.2721663Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.2722931Z from /tmp/ROfn0q/tmpejwtemx7/data/aotinductor/model/cn3k2mlnpdktb5d42n3gbws3qpzrim5w2lb6w5t7cv3mzl7dq3b5.wrapper.cpp:723: 2025-12-04T12:35:04.2724345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic] 2025-12-04T12:35:04.2727525Z 192 | struct { 2025-12-04T12:35:04.2727780Z | ^ 2025-12-04T12:35:04.2728446Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.2729472Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.2730417Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.2731414Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.2732433Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.2733702Z from /tmp/ROfn0q/tmpejwtemx7/data/aotinductor/model/cn3k2mlnpdktb5d42n3gbws3qpzrim5w2lb6w5t7cv3mzl7dq3b5.wrapper.cpp:723: 2025-12-04T12:35:04.2737421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const 
at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.2740607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.2741873Z 544 | auto msb_one = _mm512_set1_epi16(0xFFFF); 2025-12-04T12:35:04.2742294Z | ^~~~~~ 2025-12-04T12:35:04.2743056Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.2744053Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.2745011Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.2746017Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.2747032Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.2748275Z from /tmp/ROfn0q/tmpejwtemx7/data/aotinductor/model/cn3k2mlnpdktb5d42n3gbws3qpzrim5w2lb6w5t7cv3mzl7dq3b5.wrapper.cpp:723: 2025-12-04T12:35:04.2750720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.2753396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.2754754Z 697 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.2755276Z | ^~~~~~ 
2025-12-04T12:35:04.2757171Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.2759769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.2761073Z 701 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.2761546Z | ^~~~~~ 2025-12-04T12:35:04.2763434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.2766028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.2767494Z 705 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.2767973Z | ^~~~~~ 2025-12-04T12:35:04.2769864Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.2772789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.2774116Z 709 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.2774595Z | ^~~~~~ 
2025-12-04T12:35:04.2776552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.2779174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.2780472Z 713 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.2780959Z | ^~~~~~ 2025-12-04T12:35:04.2782849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.2785428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.2786792Z 717 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.2787262Z | ^~~~~~ 2025-12-04T12:35:04.2789871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.2793104Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.2794407Z 1153 | auto msb_one = 
_mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.2794816Z | ^~~~ 2025-12-04T12:35:04.2796739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.2799406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.2800740Z 1166 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.2801216Z | ^~~~ 2025-12-04T12:35:04.2803153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.2805822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.2807201Z 1170 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.2807681Z | ^~~~ 2025-12-04T12:35:04.2809622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.2812299Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.2813614Z 1174 | return _mm512_mask_set1_epi8(zero_vector, 
mask, 0xFF); 2025-12-04T12:35:04.2814089Z | ^~~~ 2025-12-04T12:35:04.2816040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.2818781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.2820164Z 1178 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.2820622Z | ^~~~ 2025-12-04T12:35:04.2823274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.2826531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.2827810Z 1207 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.2828213Z | ^~~~ 2025-12-04T12:35:04.2830155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.2832851Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 
2025-12-04T12:35:04.2834180Z 1220 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.2834643Z | ^~~~ 2025-12-04T12:35:04.2836584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.2839279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.2840597Z 1224 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.2841071Z | ^~~~ 2025-12-04T12:35:04.2843072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.2845749Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.2847073Z 1228 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.2847541Z | ^~~~ 2025-12-04T12:35:04.2849506Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.2852201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 
2025-12-04T12:35:04.2853520Z 1232 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.2853987Z | ^~~~ 2025-12-04T12:35:04.2856767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.2859477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27: required from here 2025-12-04T12:35:04.2861404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2862609Z 1866 | 0x80, 2025-12-04T12:35:04.2862873Z | ^~~~ 2025-12-04T12:35:04.2864230Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2865428Z 1868 | 0x80, 2025-12-04T12:35:04.2865689Z | ^~~~ 2025-12-04T12:35:04.2867035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2868258Z 1870 | 0x80, 2025-12-04T12:35:04.2868503Z | ^~~~ 2025-12-04T12:35:04.2869849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2871306Z 1872 | 0x80, 
2025-12-04T12:35:04.2871581Z | ^~~~ 2025-12-04T12:35:04.2872928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2874140Z 1874 | 0x80, 2025-12-04T12:35:04.2874397Z | ^~~~ 2025-12-04T12:35:04.2875812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2877046Z 1876 | 0x80, 2025-12-04T12:35:04.2877303Z | ^~~~ 2025-12-04T12:35:04.2878646Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2879839Z 1878 | 0x80, 2025-12-04T12:35:04.2880105Z | ^~~~ 2025-12-04T12:35:04.2881439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2882654Z 1880 | 0x80, 2025-12-04T12:35:04.2882896Z | ^~~~ 2025-12-04T12:35:04.2884244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2885456Z 1882 | 0x80, 2025-12-04T12:35:04.2885697Z | ^~~~ 2025-12-04T12:35:04.2887035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2888246Z 1884 | 0x80, 2025-12-04T12:35:04.2888565Z | ^~~~ 
2025-12-04T12:35:04.2889896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2891103Z 1886 | 0x80, 2025-12-04T12:35:04.2891358Z | ^~~~ 2025-12-04T12:35:04.2892744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2894002Z 1888 | 0x80, 2025-12-04T12:35:04.2894265Z | ^~~~ 2025-12-04T12:35:04.2895604Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2896868Z 1890 | 0x80, 2025-12-04T12:35:04.2897132Z | ^~~~ 2025-12-04T12:35:04.2898486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2899702Z 1892 | 0x80, 2025-12-04T12:35:04.2899944Z | ^~~~ 2025-12-04T12:35:04.2901302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2902526Z 1894 | 0x80, 2025-12-04T12:35:04.2902766Z | ^~~~ 2025-12-04T12:35:04.2904106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2905319Z 1896 | 0x80, 2025-12-04T12:35:04.2905571Z | ^~~~ 2025-12-04T12:35:04.2906897Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2908105Z 1898 | 0x80, 2025-12-04T12:35:04.2908358Z | ^~~~ 2025-12-04T12:35:04.2909752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2910958Z 1900 | 0x80, 2025-12-04T12:35:04.2911217Z | ^~~~ 2025-12-04T12:35:04.2912565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2913776Z 1902 | 0x80, 2025-12-04T12:35:04.2914037Z | ^~~~ 2025-12-04T12:35:04.2915390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2916602Z 1904 | 0x80, 2025-12-04T12:35:04.2916849Z | ^~~~ 2025-12-04T12:35:04.2918211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2919435Z 1906 | 0x80, 2025-12-04T12:35:04.2919696Z | ^~~~ 2025-12-04T12:35:04.2921028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2922291Z 1908 | 0x80, 2025-12-04T12:35:04.2922553Z | ^~~~ 2025-12-04T12:35:04.2923890Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2925111Z 1910 | 0x80, 2025-12-04T12:35:04.2925372Z | ^~~~ 2025-12-04T12:35:04.2926796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2928000Z 1912 | 0x80, 2025-12-04T12:35:04.2928263Z | ^~~~ 2025-12-04T12:35:04.2929610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2930831Z 1914 | 0x80, 2025-12-04T12:35:04.2931075Z | ^~~~ 2025-12-04T12:35:04.2932416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2933629Z 1916 | 0x80, 2025-12-04T12:35:04.2933866Z | ^~~~ 2025-12-04T12:35:04.2935219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2936497Z 1918 | 0x80, 2025-12-04T12:35:04.2936752Z | ^~~~ 2025-12-04T12:35:04.2938079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2939293Z 1920 | 0x80, 2025-12-04T12:35:04.2939548Z | ^~~~ 2025-12-04T12:35:04.2941107Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2942336Z 1922 | 0x80, 2025-12-04T12:35:04.2942593Z | ^~~~ 2025-12-04T12:35:04.2944040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2945238Z 1924 | 0x80, 2025-12-04T12:35:04.2945496Z | ^~~~ 2025-12-04T12:35:04.2946842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2948056Z 1926 | 0x80, 2025-12-04T12:35:04.2948297Z | ^~~~ 2025-12-04T12:35:04.2949635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2950850Z 1928 | 0x80); 2025-12-04T12:35:04.2951112Z | ^~~~ 2025-12-04T12:35:04.2952460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2953682Z 1930 | 0x80, 2025-12-04T12:35:04.2953937Z | ^~~~ 2025-12-04T12:35:04.2955268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2956530Z 1932 | 0x80, 2025-12-04T12:35:04.2956788Z | ^~~~ 2025-12-04T12:35:04.2958132Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2959327Z 1934 | 0x80, 2025-12-04T12:35:04.2959622Z | ^~~~ 2025-12-04T12:35:04.2961013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2962225Z 1936 | 0x80, 2025-12-04T12:35:04.2962467Z | ^~~~ 2025-12-04T12:35:04.2963815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2965034Z 1938 | 0x80, 2025-12-04T12:35:04.2965278Z | ^~~~ 2025-12-04T12:35:04.2966623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2967833Z 1940 | 0x80, 2025-12-04T12:35:04.2968096Z | ^~~~ 2025-12-04T12:35:04.2969434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2970649Z 1942 | 0x80, 2025-12-04T12:35:04.2970903Z | ^~~~ 2025-12-04T12:35:04.2972476Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2973701Z 1944 | 0x80, 2025-12-04T12:35:04.2973957Z | ^~~~ 2025-12-04T12:35:04.2975299Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2976560Z 1946 | 0x80, 2025-12-04T12:35:04.2976822Z | ^~~~ 2025-12-04T12:35:04.2978310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2979524Z 1948 | 0x80, 2025-12-04T12:35:04.2979765Z | ^~~~ 2025-12-04T12:35:04.2981109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2982324Z 1950 | 0x80, 2025-12-04T12:35:04.2982568Z | ^~~~ 2025-12-04T12:35:04.2983915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2985192Z 1952 | 0x80, 2025-12-04T12:35:04.2985452Z | ^~~~ 2025-12-04T12:35:04.2986850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2988073Z 1954 | 0x80, 2025-12-04T12:35:04.2988335Z | ^~~~ 2025-12-04T12:35:04.2989689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2990945Z 1956 | 0x80, 2025-12-04T12:35:04.2991201Z | ^~~~ 2025-12-04T12:35:04.2992551Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2993750Z 1958 | 0x80, 2025-12-04T12:35:04.2994008Z | ^~~~ 2025-12-04T12:35:04.2995353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2996567Z 1960 | 0x80, 2025-12-04T12:35:04.2996808Z | ^~~~ 2025-12-04T12:35:04.2998157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.2999370Z 1962 | 0x80, 2025-12-04T12:35:04.2999629Z | ^~~~ 2025-12-04T12:35:04.3000977Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3002203Z 1964 | 0x80, 2025-12-04T12:35:04.3002461Z | ^~~~ 2025-12-04T12:35:04.3003842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3005062Z 1966 | 0x80, 2025-12-04T12:35:04.3005321Z | ^~~~ 2025-12-04T12:35:04.3006667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3007869Z 1968 | 0x80, 2025-12-04T12:35:04.3008131Z | ^~~~ 2025-12-04T12:35:04.3009478Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3010705Z 1970 | 0x80, 2025-12-04T12:35:04.3010950Z | ^~~~ 2025-12-04T12:35:04.3012309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3013523Z 1972 | 0x80, 2025-12-04T12:35:04.3013768Z | ^~~~ 2025-12-04T12:35:04.3015123Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3016446Z 1974 | 0x80, 2025-12-04T12:35:04.3016702Z | ^~~~ 2025-12-04T12:35:04.3018045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3020138Z 1976 | 0x80, 2025-12-04T12:35:04.3020409Z | ^~~~ 2025-12-04T12:35:04.3021849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3023067Z 1978 | 0x80, 2025-12-04T12:35:04.3023321Z | ^~~~ 2025-12-04T12:35:04.3024665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3025877Z 1980 | 0x80, 2025-12-04T12:35:04.3026129Z | ^~~~ 2025-12-04T12:35:04.3027473Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3028691Z 1982 | 0x80, 2025-12-04T12:35:04.3028933Z | ^~~~ 2025-12-04T12:35:04.3030295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3031496Z 1984 | 0x80, 2025-12-04T12:35:04.3031738Z | ^~~~ 2025-12-04T12:35:04.3033074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3034290Z 1986 | 0x80, 2025-12-04T12:35:04.3034548Z | ^~~~ 2025-12-04T12:35:04.3035870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3037086Z 1988 | 0x80, 2025-12-04T12:35:04.3037342Z | ^~~~ 2025-12-04T12:35:04.3038729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3039927Z 1990 | 0x80, 2025-12-04T12:35:04.3040180Z | ^~~~ 2025-12-04T12:35:04.3041522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3042719Z 1992 | 0x80, 2025-12-04T12:35:04.3042975Z | ^~~~ 2025-12-04T12:35:04.3044318Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.3045596Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.3046002Z | ^~~~~~ 2025-12-04T12:35:04.3048713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.3051403Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27: required from here 2025-12-04T12:35:04.3053318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3054575Z 1866 | 0x80, 2025-12-04T12:35:04.3054831Z | ^~~~ 2025-12-04T12:35:04.3056210Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3057511Z 1868 | 0x80, 2025-12-04T12:35:04.3057756Z | ^~~~ 2025-12-04T12:35:04.3059106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3060328Z 1870 | 0x80, 2025-12-04T12:35:04.3060584Z | ^~~~ 2025-12-04T12:35:04.3061909Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3063122Z 1872 | 0x80, 2025-12-04T12:35:04.3063391Z | ^~~~ 2025-12-04T12:35:04.3064725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3065922Z 1874 | 0x80, 2025-12-04T12:35:04.3066174Z | ^~~~ 2025-12-04T12:35:04.3067508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3068721Z 1876 | 0x80, 2025-12-04T12:35:04.3068963Z | ^~~~ 2025-12-04T12:35:04.3070304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3071822Z 1878 | 0x80, 2025-12-04T12:35:04.3072075Z | ^~~~ 2025-12-04T12:35:04.3073444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3074659Z 1880 | 0x80, 2025-12-04T12:35:04.3074915Z | ^~~~ 2025-12-04T12:35:04.3076247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3077464Z 1882 | 0x80, 2025-12-04T12:35:04.3077724Z | ^~~~ 2025-12-04T12:35:04.3079058Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3080281Z 1884 | 0x80, 2025-12-04T12:35:04.3080549Z | ^~~~ 2025-12-04T12:35:04.3081889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3083089Z 1886 | 0x80, 2025-12-04T12:35:04.3083346Z | ^~~~ 2025-12-04T12:35:04.3084756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3085966Z 1888 | 0x80, 2025-12-04T12:35:04.3086206Z | ^~~~ 2025-12-04T12:35:04.3087547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3088817Z 1890 | 0x80, 2025-12-04T12:35:04.3089130Z | ^~~~ 2025-12-04T12:35:04.3090480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3091689Z 1892 | 0x80, 2025-12-04T12:35:04.3091942Z | ^~~~ 2025-12-04T12:35:04.3093279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3094495Z 1894 | 0x80, 2025-12-04T12:35:04.3094754Z | ^~~~ 2025-12-04T12:35:04.3096097Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3097411Z 1896 | 0x80, 2025-12-04T12:35:04.3097676Z | ^~~~ 2025-12-04T12:35:04.3099028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3100221Z 1898 | 0x80, 2025-12-04T12:35:04.3100482Z | ^~~~ 2025-12-04T12:35:04.3101840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3103052Z 1900 | 0x80, 2025-12-04T12:35:04.3103298Z | ^~~~ 2025-12-04T12:35:04.3104696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3105928Z 1902 | 0x80, 2025-12-04T12:35:04.3106190Z | ^~~~ 2025-12-04T12:35:04.3107529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3108745Z 1904 | 0x80, 2025-12-04T12:35:04.3109010Z | ^~~~ 2025-12-04T12:35:04.3110351Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3111569Z 1906 | 0x80, 2025-12-04T12:35:04.3111827Z | ^~~~ 2025-12-04T12:35:04.3113177Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3114390Z 1908 | 0x80, 2025-12-04T12:35:04.3114653Z | ^~~~ 2025-12-04T12:35:04.3115993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3117216Z 1910 | 0x80, 2025-12-04T12:35:04.3117455Z | ^~~~ 2025-12-04T12:35:04.3118845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3120057Z 1912 | 0x80, 2025-12-04T12:35:04.3120299Z | ^~~~ 2025-12-04T12:35:04.3121642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3123479Z 1914 | 0x80, 2025-12-04T12:35:04.3123740Z | ^~~~ 2025-12-04T12:35:04.3125877Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3127103Z 1916 | 0x80, 2025-12-04T12:35:04.3127372Z | ^~~~ 2025-12-04T12:35:04.3128718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3129932Z 1918 | 0x80, 2025-12-04T12:35:04.3130195Z | ^~~~ 2025-12-04T12:35:04.3131550Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3132757Z 1920 | 0x80, 2025-12-04T12:35:04.3133010Z | ^~~~ 2025-12-04T12:35:04.3134356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3135565Z 1922 | 0x80, 2025-12-04T12:35:04.3135813Z | ^~~~ 2025-12-04T12:35:04.3137226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3138444Z 1924 | 0x80, 2025-12-04T12:35:04.3138691Z | ^~~~ 2025-12-04T12:35:04.3140138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3141365Z 1926 | 0x80, 2025-12-04T12:35:04.3141628Z | ^~~~ 2025-12-04T12:35:04.3142959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3144625Z 1928 | 0x80); 2025-12-04T12:35:04.3144947Z | ^~~~ 2025-12-04T12:35:04.3146310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3147506Z 1930 | 0x80, 2025-12-04T12:35:04.3147763Z | ^~~~ 2025-12-04T12:35:04.3149118Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3150453Z 1932 | 0x80, 2025-12-04T12:35:04.3150715Z | ^~~~ 2025-12-04T12:35:04.3152062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3153277Z 1934 | 0x80, 2025-12-04T12:35:04.3153595Z | ^~~~ 2025-12-04T12:35:04.3155158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3156370Z 1936 | 0x80, 2025-12-04T12:35:04.3156634Z | ^~~~ 2025-12-04T12:35:04.3158036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3159283Z 1938 | 0x80, 2025-12-04T12:35:04.3159542Z | ^~~~ 2025-12-04T12:35:04.3160872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3162089Z 1940 | 0x80, 2025-12-04T12:35:04.3162349Z | ^~~~ 2025-12-04T12:35:04.3163692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3164903Z 1942 | 0x80, 2025-12-04T12:35:04.3165161Z | ^~~~ 2025-12-04T12:35:04.3166521Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3167739Z 1944 | 0x80, 2025-12-04T12:35:04.3167985Z | ^~~~ 2025-12-04T12:35:04.3169322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3170541Z 1946 | 0x80, 2025-12-04T12:35:04.3170787Z | ^~~~ 2025-12-04T12:35:04.3172416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3173636Z 1948 | 0x80, 2025-12-04T12:35:04.3173896Z | ^~~~ 2025-12-04T12:35:04.3175345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3176629Z 1950 | 0x80, 2025-12-04T12:35:04.3176886Z | ^~~~ 2025-12-04T12:35:04.3178221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3179491Z 1952 | 0x80, 2025-12-04T12:35:04.3179749Z | ^~~~ 2025-12-04T12:35:04.3181091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3182292Z 1954 | 0x80, 2025-12-04T12:35:04.3182548Z | ^~~~ 2025-12-04T12:35:04.3183999Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3185236Z 1956 | 0x80, 2025-12-04T12:35:04.3185482Z | ^~~~ 2025-12-04T12:35:04.3186825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3188052Z 1958 | 0x80, 2025-12-04T12:35:04.3188299Z | ^~~~ 2025-12-04T12:35:04.3189648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3190864Z 1960 | 0x80, 2025-12-04T12:35:04.3191124Z | ^~~~ 2025-12-04T12:35:04.3192474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3193693Z 1962 | 0x80, 2025-12-04T12:35:04.3193957Z | ^~~~ 2025-12-04T12:35:04.3195305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3196515Z 1964 | 0x80, 2025-12-04T12:35:04.3196777Z | ^~~~ 2025-12-04T12:35:04.3198124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3199337Z 1966 | 0x80, 2025-12-04T12:35:04.3199585Z | ^~~~ 2025-12-04T12:35:04.3200989Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3202206Z 1968 | 0x80, 2025-12-04T12:35:04.3202456Z | ^~~~ 2025-12-04T12:35:04.3203802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3205019Z 1970 | 0x80, 2025-12-04T12:35:04.3205276Z | ^~~~ 2025-12-04T12:35:04.3206600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3207806Z 1972 | 0x80, 2025-12-04T12:35:04.3208068Z | ^~~~ 2025-12-04T12:35:04.3209408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3210615Z 1974 | 0x80, 2025-12-04T12:35:04.3210867Z | ^~~~ 2025-12-04T12:35:04.3212210Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3213472Z 1976 | 0x80, 2025-12-04T12:35:04.3213733Z | ^~~~ 2025-12-04T12:35:04.3215078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3216347Z 1978 | 0x80, 2025-12-04T12:35:04.3216670Z | ^~~~ 2025-12-04T12:35:04.3218097Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3219311Z 1980 | 0x80, 2025-12-04T12:35:04.3219560Z | ^~~~ 2025-12-04T12:35:04.3220908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3222130Z 1982 | 0x80, 2025-12-04T12:35:04.3222392Z | ^~~~ 2025-12-04T12:35:04.3223715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3224927Z 1984 | 0x80, 2025-12-04T12:35:04.3225191Z | ^~~~ 2025-12-04T12:35:04.3226544Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3227740Z 1986 | 0x80, 2025-12-04T12:35:04.3227994Z | ^~~~ 2025-12-04T12:35:04.3229338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3230536Z 1988 | 0x80, 2025-12-04T12:35:04.3230795Z | ^~~~ 2025-12-04T12:35:04.3232138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3233347Z 1990 | 0x80, 2025-12-04T12:35:04.3233595Z | ^~~~ 2025-12-04T12:35:04.3234993Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3236201Z 1992 | 0x80, 2025-12-04T12:35:04.3236456Z | ^~~~ 2025-12-04T12:35:04.3237785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.3239068Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.3239481Z | ^~~~~~ 2025-12-04T12:35:04.3242188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = signed char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.3244816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28: required from here 2025-12-04T12:35:04.3246729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3247989Z 1866 | 0x80, 2025-12-04T12:35:04.3248248Z | ^~~~ 2025-12-04T12:35:04.3249581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3250840Z 1868 | 0x80, 2025-12-04T12:35:04.3251094Z | ^~~~ 2025-12-04T12:35:04.3252477Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3253674Z 1870 | 0x80, 2025-12-04T12:35:04.3253928Z | ^~~~ 2025-12-04T12:35:04.3255270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3256555Z 1872 | 0x80, 2025-12-04T12:35:04.3256795Z | ^~~~ 2025-12-04T12:35:04.3258151Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3259375Z 1874 | 0x80, 2025-12-04T12:35:04.3259616Z | ^~~~ 2025-12-04T12:35:04.3260972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3262180Z 1876 | 0x80, 2025-12-04T12:35:04.3262435Z | ^~~~ 2025-12-04T12:35:04.3263767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3264982Z 1878 | 0x80, 2025-12-04T12:35:04.3265245Z | ^~~~ 2025-12-04T12:35:04.3266566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3267786Z 1880 | 0x80, 2025-12-04T12:35:04.3268041Z | ^~~~ 2025-12-04T12:35:04.3269439Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3270637Z 1882 | 0x80, 2025-12-04T12:35:04.3304336Z | ^~~~ 2025-12-04T12:35:04.3306131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3307391Z 1884 | 0x80, 2025-12-04T12:35:04.3307662Z | ^~~~ 2025-12-04T12:35:04.3309038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3310272Z 1886 | 0x80, 2025-12-04T12:35:04.3310520Z | ^~~~ 2025-12-04T12:35:04.3311898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3313117Z 1888 | 0x80, 2025-12-04T12:35:04.3313379Z | ^~~~ 2025-12-04T12:35:04.3314711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3316122Z 1890 | 0x80, 2025-12-04T12:35:04.3316387Z | ^~~~ 2025-12-04T12:35:04.3317742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3319021Z 1892 | 0x80, 2025-12-04T12:35:04.3319299Z | ^~~~ 2025-12-04T12:35:04.3320720Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3321917Z 1894 | 0x80, 2025-12-04T12:35:04.3322178Z | ^~~~ 2025-12-04T12:35:04.3323525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3324744Z 1896 | 0x80, 2025-12-04T12:35:04.3324988Z | ^~~~ 2025-12-04T12:35:04.3326336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3327554Z 1898 | 0x80, 2025-12-04T12:35:04.3327802Z | ^~~~ 2025-12-04T12:35:04.3329148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3330361Z 1900 | 0x80, 2025-12-04T12:35:04.3330618Z | ^~~~ 2025-12-04T12:35:04.3331946Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3333163Z 1902 | 0x80, 2025-12-04T12:35:04.3333414Z | ^~~~ 2025-12-04T12:35:04.3334742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3335958Z 1904 | 0x80, 2025-12-04T12:35:04.3336398Z | ^~~~ 2025-12-04T12:35:04.3337796Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3338999Z 1906 | 0x80, 2025-12-04T12:35:04.3339257Z | ^~~~ 2025-12-04T12:35:04.3340599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3341812Z 1908 | 0x80, 2025-12-04T12:35:04.3342055Z | ^~~~ 2025-12-04T12:35:04.3343396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3344613Z 1910 | 0x80, 2025-12-04T12:35:04.3344864Z | ^~~~ 2025-12-04T12:35:04.3346209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3347417Z 1912 | 0x80, 2025-12-04T12:35:04.3347671Z | ^~~~ 2025-12-04T12:35:04.3348993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3350252Z 1914 | 0x80, 2025-12-04T12:35:04.3350507Z | ^~~~ 2025-12-04T12:35:04.3351848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3353093Z 1916 | 0x80, 2025-12-04T12:35:04.3353401Z | ^~~~ 2025-12-04T12:35:04.3354755Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3355970Z 1918 | 0x80, 2025-12-04T12:35:04.3356215Z | ^~~~ 2025-12-04T12:35:04.3357547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3358764Z 1920 | 0x80, 2025-12-04T12:35:04.3359010Z | ^~~~ 2025-12-04T12:35:04.3360366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3361596Z 1922 | 0x80, 2025-12-04T12:35:04.3361859Z | ^~~~ 2025-12-04T12:35:04.3363183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3364399Z 1924 | 0x80, 2025-12-04T12:35:04.3364651Z | ^~~~ 2025-12-04T12:35:04.3365975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3367190Z 1926 | 0x80, 2025-12-04T12:35:04.3367447Z | ^~~~ 2025-12-04T12:35:04.3368786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3370023Z 1928 | 0x80); 2025-12-04T12:35:04.3370298Z | ^~~~ 2025-12-04T12:35:04.3371853Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3373068Z 1930 | 0x80, 2025-12-04T12:35:04.3373309Z | ^~~~ 2025-12-04T12:35:04.3374667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3375877Z 1932 | 0x80, 2025-12-04T12:35:04.3376117Z | ^~~~ 2025-12-04T12:35:04.3377539Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3378762Z 1934 | 0x80, 2025-12-04T12:35:04.3379025Z | ^~~~ 2025-12-04T12:35:04.3380356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3381561Z 1936 | 0x80, 2025-12-04T12:35:04.3381818Z | ^~~~ 2025-12-04T12:35:04.3383243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3384439Z 1938 | 0x80, 2025-12-04T12:35:04.3384695Z | ^~~~ 2025-12-04T12:35:04.3386036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3387336Z 1940 | 0x80, 2025-12-04T12:35:04.3387593Z | ^~~~ 2025-12-04T12:35:04.3388935Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3390139Z 1942 | 0x80, 2025-12-04T12:35:04.3390378Z | ^~~~ 2025-12-04T12:35:04.3391726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3392944Z 1944 | 0x80, 2025-12-04T12:35:04.3393189Z | ^~~~ 2025-12-04T12:35:04.3394594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3395811Z 1946 | 0x80, 2025-12-04T12:35:04.3396072Z | ^~~~ 2025-12-04T12:35:04.3397405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3398616Z 1948 | 0x80, 2025-12-04T12:35:04.3398874Z | ^~~~ 2025-12-04T12:35:04.3400228Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3401426Z 1950 | 0x80, 2025-12-04T12:35:04.3401686Z | ^~~~ 2025-12-04T12:35:04.3403039Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3404262Z 1952 | 0x80, 2025-12-04T12:35:04.3404510Z | ^~~~ 2025-12-04T12:35:04.3405864Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3407091Z 1954 | 0x80, 2025-12-04T12:35:04.3407402Z | ^~~~ 2025-12-04T12:35:04.3408748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3409959Z 1956 | 0x80, 2025-12-04T12:35:04.3410217Z | ^~~~ 2025-12-04T12:35:04.3411552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3412852Z 1958 | 0x80, 2025-12-04T12:35:04.3413112Z | ^~~~ 2025-12-04T12:35:04.3414447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3415660Z 1960 | 0x80, 2025-12-04T12:35:04.3415929Z | ^~~~ 2025-12-04T12:35:04.3417338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3418536Z 1962 | 0x80, 2025-12-04T12:35:04.3418791Z | ^~~~ 2025-12-04T12:35:04.3420146Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3421362Z 1964 | 0x80, 2025-12-04T12:35:04.3421603Z | ^~~~ 2025-12-04T12:35:04.3422946Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3424150Z 1966 | 0x80, 2025-12-04T12:35:04.3424399Z | ^~~~ 2025-12-04T12:35:04.3425744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3426951Z 1968 | 0x80, 2025-12-04T12:35:04.3427210Z | ^~~~ 2025-12-04T12:35:04.3428593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3429811Z 1970 | 0x80, 2025-12-04T12:35:04.3430068Z | ^~~~ 2025-12-04T12:35:04.3431408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3432601Z 1972 | 0x80, 2025-12-04T12:35:04.3432864Z | ^~~~ 2025-12-04T12:35:04.3434206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3435402Z 1974 | 0x80, 2025-12-04T12:35:04.3435660Z | ^~~~ 2025-12-04T12:35:04.3437014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3438231Z 1976 | 0x80, 2025-12-04T12:35:04.3438476Z | ^~~~ 2025-12-04T12:35:04.3439817Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3441025Z 1978 | 0x80, 2025-12-04T12:35:04.3441322Z | ^~~~ 2025-12-04T12:35:04.3442652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3443860Z 1980 | 0x80, 2025-12-04T12:35:04.3444114Z | ^~~~ 2025-12-04T12:35:04.3445479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3446741Z 1982 | 0x80, 2025-12-04T12:35:04.3446994Z | ^~~~ 2025-12-04T12:35:04.3448330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3449532Z 1984 | 0x80, 2025-12-04T12:35:04.3449794Z | ^~~~ 2025-12-04T12:35:04.3451135Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3452340Z 1986 | 0x80, 2025-12-04T12:35:04.3452584Z | ^~~~ 2025-12-04T12:35:04.3453936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3455140Z 1988 | 0x80, 2025-12-04T12:35:04.3455379Z | ^~~~ 2025-12-04T12:35:04.3456786Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3458004Z 1990 | 0x80, 2025-12-04T12:35:04.3458261Z | ^~~~ 2025-12-04T12:35:04.3459594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3460802Z 1992 | 0x80, 2025-12-04T12:35:04.3461058Z | ^~~~ 2025-12-04T12:35:04.3462451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.3463724Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.3464131Z | ^~~~~~ 2025-12-04T12:35:04.3466850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = unsigned char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.3470195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28: required from here 2025-12-04T12:35:04.3472336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3473553Z 1866 | 0x80, 2025-12-04T12:35:04.3473821Z | ^~~~ 2025-12-04T12:35:04.3475347Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3476665Z 1868 | 0x80, 2025-12-04T12:35:04.3476929Z | ^~~~ 2025-12-04T12:35:04.3478292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3479502Z 1870 | 0x80, 2025-12-04T12:35:04.3479803Z | ^~~~ 2025-12-04T12:35:04.3481211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3482428Z 1872 | 0x80, 2025-12-04T12:35:04.3482670Z | ^~~~ 2025-12-04T12:35:04.3484007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3485223Z 1874 | 0x80, 2025-12-04T12:35:04.3485482Z | ^~~~ 2025-12-04T12:35:04.3486807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3488026Z 1876 | 0x80, 2025-12-04T12:35:04.3488291Z | ^~~~ 2025-12-04T12:35:04.3489653Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3490850Z 1878 | 0x80, 2025-12-04T12:35:04.3491108Z | ^~~~ 2025-12-04T12:35:04.3492450Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3493656Z 1880 | 0x80, 2025-12-04T12:35:04.3493918Z | ^~~~ 2025-12-04T12:35:04.3495267Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3496551Z 1882 | 0x80, 2025-12-04T12:35:04.3496804Z | ^~~~ 2025-12-04T12:35:04.3498240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3499453Z 1884 | 0x80, 2025-12-04T12:35:04.3499712Z | ^~~~ 2025-12-04T12:35:04.3501039Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3502259Z 1886 | 0x80, 2025-12-04T12:35:04.3502516Z | ^~~~ 2025-12-04T12:35:04.3503843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3505059Z 1888 | 0x80, 2025-12-04T12:35:04.3505327Z | ^~~~ 2025-12-04T12:35:04.3506687Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3507883Z 1890 | 0x80, 2025-12-04T12:35:04.3508146Z | ^~~~ 2025-12-04T12:35:04.3509488Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3510745Z 1892 | 0x80, 2025-12-04T12:35:04.3510985Z | ^~~~ 2025-12-04T12:35:04.3512330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3513575Z 1894 | 0x80, 2025-12-04T12:35:04.3513814Z | ^~~~ 2025-12-04T12:35:04.3515195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3516400Z 1896 | 0x80, 2025-12-04T12:35:04.3516653Z | ^~~~ 2025-12-04T12:35:04.3517976Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3519189Z 1898 | 0x80, 2025-12-04T12:35:04.3519447Z | ^~~~ 2025-12-04T12:35:04.3520772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3521990Z 1900 | 0x80, 2025-12-04T12:35:04.3522245Z | ^~~~ 2025-12-04T12:35:04.3523598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3524791Z 1902 | 0x80, 2025-12-04T12:35:04.3525047Z | ^~~~ 2025-12-04T12:35:04.3526385Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3527598Z 1904 | 0x80, 2025-12-04T12:35:04.3527839Z | ^~~~ 2025-12-04T12:35:04.3529179Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3530397Z 1906 | 0x80, 2025-12-04T12:35:04.3530638Z | ^~~~ 2025-12-04T12:35:04.3532027Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3533240Z 1908 | 0x80, 2025-12-04T12:35:04.3533495Z | ^~~~ 2025-12-04T12:35:04.3534819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3536041Z 1910 | 0x80, 2025-12-04T12:35:04.3536361Z | ^~~~ 2025-12-04T12:35:04.3537722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3538925Z 1912 | 0x80, 2025-12-04T12:35:04.3539190Z | ^~~~ 2025-12-04T12:35:04.3540542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3541745Z 1914 | 0x80, 2025-12-04T12:35:04.3541998Z | ^~~~ 2025-12-04T12:35:04.3543339Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3544594Z 1916 | 0x80, 2025-12-04T12:35:04.3544836Z | ^~~~ 2025-12-04T12:35:04.3546181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3547428Z 1918 | 0x80, 2025-12-04T12:35:04.3547687Z | ^~~~ 2025-12-04T12:35:04.3549057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3550263Z 1920 | 0x80, 2025-12-04T12:35:04.3550516Z | ^~~~ 2025-12-04T12:35:04.3551847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3553056Z 1922 | 0x80, 2025-12-04T12:35:04.3553310Z | ^~~~ 2025-12-04T12:35:04.3554646Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3555844Z 1924 | 0x80, 2025-12-04T12:35:04.3556101Z | ^~~~ 2025-12-04T12:35:04.3557448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3558652Z 1926 | 0x80, 2025-12-04T12:35:04.3558896Z | ^~~~ 2025-12-04T12:35:04.3560230Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3561449Z 1928 | 0x80); 2025-12-04T12:35:04.3561703Z | ^~~~ 2025-12-04T12:35:04.3563048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3564265Z 1930 | 0x80, 2025-12-04T12:35:04.3564526Z | ^~~~ 2025-12-04T12:35:04.3565900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3567112Z 1932 | 0x80, 2025-12-04T12:35:04.3567368Z | ^~~~ 2025-12-04T12:35:04.3568698Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3569908Z 1934 | 0x80, 2025-12-04T12:35:04.3570163Z | ^~~~ 2025-12-04T12:35:04.3571689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3572893Z 1936 | 0x80, 2025-12-04T12:35:04.3573155Z | ^~~~ 2025-12-04T12:35:04.3574505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3575722Z 1938 | 0x80, 2025-12-04T12:35:04.3575966Z | ^~~~ 2025-12-04T12:35:04.3577380Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3578684Z 1940 | 0x80, 2025-12-04T12:35:04.3578929Z | ^~~~ 2025-12-04T12:35:04.3580283Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3581568Z 1942 | 0x80, 2025-12-04T12:35:04.3581835Z | ^~~~ 2025-12-04T12:35:04.3583229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3584457Z 1944 | 0x80, 2025-12-04T12:35:04.3584721Z | ^~~~ 2025-12-04T12:35:04.3586085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3587290Z 1946 | 0x80, 2025-12-04T12:35:04.3587551Z | ^~~~ 2025-12-04T12:35:04.3588893Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3590110Z 1948 | 0x80, 2025-12-04T12:35:04.3590411Z | ^~~~ 2025-12-04T12:35:04.3591765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3592977Z 1950 | 0x80, 2025-12-04T12:35:04.3593224Z | ^~~~ 2025-12-04T12:35:04.3594576Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3595801Z 1952 | 0x80, 2025-12-04T12:35:04.3596064Z | ^~~~ 2025-12-04T12:35:04.3597393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3598615Z 1954 | 0x80, 2025-12-04T12:35:04.3598879Z | ^~~~ 2025-12-04T12:35:04.3600209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3601411Z 1956 | 0x80, 2025-12-04T12:35:04.3601665Z | ^~~~ 2025-12-04T12:35:04.3603001Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3604239Z 1958 | 0x80, 2025-12-04T12:35:04.3604493Z | ^~~~ 2025-12-04T12:35:04.3605833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3607085Z 1960 | 0x80, 2025-12-04T12:35:04.3607365Z | ^~~~ 2025-12-04T12:35:04.3608709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3609919Z 1962 | 0x80, 2025-12-04T12:35:04.3610160Z | ^~~~ 2025-12-04T12:35:04.3611495Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3612710Z 1964 | 0x80, 2025-12-04T12:35:04.3612967Z | ^~~~ 2025-12-04T12:35:04.3614299Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3615519Z 1966 | 0x80, 2025-12-04T12:35:04.3615781Z | ^~~~ 2025-12-04T12:35:04.3617181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3618383Z 1968 | 0x80, 2025-12-04T12:35:04.3618641Z | ^~~~ 2025-12-04T12:35:04.3620000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3621201Z 1970 | 0x80, 2025-12-04T12:35:04.3621462Z | ^~~~ 2025-12-04T12:35:04.3622799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3624075Z 1972 | 0x80, 2025-12-04T12:35:04.3624328Z | ^~~~ 2025-12-04T12:35:04.3625680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3626889Z 1974 | 0x80, 2025-12-04T12:35:04.3627133Z | ^~~~ 2025-12-04T12:35:04.3628481Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3629700Z 1976 | 0x80, 2025-12-04T12:35:04.3629960Z | ^~~~ 2025-12-04T12:35:04.3631282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3632513Z 1978 | 0x80, 2025-12-04T12:35:04.3632765Z | ^~~~ 2025-12-04T12:35:04.3634102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3635315Z 1980 | 0x80, 2025-12-04T12:35:04.3635567Z | ^~~~ 2025-12-04T12:35:04.3636947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3638148Z 1982 | 0x80, 2025-12-04T12:35:04.3638391Z | ^~~~ 2025-12-04T12:35:04.3639720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3641002Z 1984 | 0x80, 2025-12-04T12:35:04.3641248Z | ^~~~ 2025-12-04T12:35:04.3642584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3643789Z 1986 | 0x80, 2025-12-04T12:35:04.3644045Z | ^~~~ 2025-12-04T12:35:04.3645375Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3646587Z 1988 | 0x80, 2025-12-04T12:35:04.3646840Z | ^~~~ 2025-12-04T12:35:04.3648175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3649395Z 1990 | 0x80, 2025-12-04T12:35:04.3649651Z | ^~~~ 2025-12-04T12:35:04.3650994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3652187Z 1992 | 0x80, 2025-12-04T12:35:04.3652447Z | ^~~~ 2025-12-04T12:35:04.3653798Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.3655076Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.3655469Z | ^~~~~~ 2025-12-04T12:35:04.3656243Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16, 2025-12-04T12:35:04.3657390Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.3658353Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.3659333Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.3660358Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 
2025-12-04T12:35:04.3661625Z from /tmp/ROfn0q/tmpejwtemx7/data/aotinductor/model/cn3k2mlnpdktb5d42n3gbws3qpzrim5w2lb6w5t7cv3mzl7dq3b5.wrapper.cpp:723: 2025-12-04T12:35:04.3663957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’: 2025-12-04T12:35:04.3665830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31: required from here 2025-12-04T12:35:04.3667780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3669072Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3669417Z | ^~~~ 2025-12-04T12:35:04.3670770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3672203Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3672547Z | ^~~~ 2025-12-04T12:35:04.3674084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3675315Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3675665Z | ^~~~ 2025-12-04T12:35:04.3677069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3678315Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3678649Z | ^~~~ 2025-12-04T12:35:04.3680073Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3681319Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3681640Z | ^~~~ 2025-12-04T12:35:04.3683009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3684246Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3684588Z | ^~~~ 2025-12-04T12:35:04.3685951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3687194Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3687544Z | ^~~~ 2025-12-04T12:35:04.3688937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3690231Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3690578Z | ^~~~ 2025-12-04T12:35:04.3691990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3693215Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3693529Z | ^~~~ 2025-12-04T12:35:04.3694893Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3696123Z 203 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.3696533Z | ^~~~ 2025-12-04T12:35:04.3697929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3699178Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3699519Z | ^~~~ 2025-12-04T12:35:04.3700902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3702227Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3702572Z | ^~~~ 2025-12-04T12:35:04.3703987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3705201Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3705574Z | ^~~~ 2025-12-04T12:35:04.3706976Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3708217Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3708540Z | ^~~~ 2025-12-04T12:35:04.3709921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3711159Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3711484Z | ^~~~ 2025-12-04T12:35:04.3712882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.3714117Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3714458Z | ^~~~ 2025-12-04T12:35:04.3715866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3717104Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3717434Z | ^~~~ 2025-12-04T12:35:04.3718783Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3720007Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3720343Z | ^~~~ 2025-12-04T12:35:04.3721717Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3722990Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3723336Z | ^~~~ 2025-12-04T12:35:04.3724732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3725961Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3726284Z | ^~~~ 2025-12-04T12:35:04.3727700Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3728945Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3729276Z | ^~~~ 2025-12-04T12:35:04.3731342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3732587Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3732930Z | ^~~~ 2025-12-04T12:35:04.3734300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3735595Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3735932Z | ^~~~ 2025-12-04T12:35:04.3737393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3738619Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3739006Z | ^~~~ 2025-12-04T12:35:04.3740476Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3741710Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3742028Z | ^~~~ 2025-12-04T12:35:04.3743381Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3744794Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3745131Z | ^~~~ 2025-12-04T12:35:04.3746509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3747752Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3748097Z | ^~~~ 2025-12-04T12:35:04.3749496Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3750729Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3751067Z | ^~~~ 2025-12-04T12:35:04.3752483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3753705Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3754033Z | ^~~~ 2025-12-04T12:35:04.3755385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3756688Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3757019Z | ^~~~ 2025-12-04T12:35:04.3758402Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3759637Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3759975Z | ^~~~ 2025-12-04T12:35:04.3761366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3762606Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3762952Z | ^~~~ 2025-12-04T12:35:04.3764363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3765608Z 211 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.3765946Z | ^~~~ 2025-12-04T12:35:04.3767304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3768598Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3768938Z | ^~~~ 2025-12-04T12:35:04.3770329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3770448Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3770610Z | ^~~~ 2025-12-04T12:35:04.3772069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3772186Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3772305Z | ^~~~ 2025-12-04T12:35:04.3773495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3773633Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3773728Z | ^~~~ 2025-12-04T12:35:04.3774917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3775053Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3775152Z | ^~~~ 2025-12-04T12:35:04.3776863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.3776980Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3777087Z | ^~~~ 2025-12-04T12:35:04.3778304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3778424Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3778527Z | ^~~~ 2025-12-04T12:35:04.3779732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3779861Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3779975Z | ^~~~ 2025-12-04T12:35:04.3781156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3781270Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3781379Z | ^~~~ 2025-12-04T12:35:04.3782637Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3782764Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3782864Z | ^~~~ 2025-12-04T12:35:04.3784091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3784266Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3784366Z | ^~~~ 2025-12-04T12:35:04.3785565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3785685Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3785777Z | ^~~~ 2025-12-04T12:35:04.3786971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3787083Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3787187Z | ^~~~ 2025-12-04T12:35:04.3788401Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3788515Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3788638Z | ^~~~ 2025-12-04T12:35:04.3789820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3789940Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3790057Z | ^~~~ 2025-12-04T12:35:04.3791563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’: 2025-12-04T12:35:04.3792249Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31: required from here 2025-12-04T12:35:04.3793445Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3793560Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3793669Z | ^~~~ 
2025-12-04T12:35:04.3794859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3794985Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3795087Z | ^~~~ 2025-12-04T12:35:04.3796281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3796423Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3796525Z | ^~~~ 2025-12-04T12:35:04.3797726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3797840Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3797988Z | ^~~~ 2025-12-04T12:35:04.3799185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3799298Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3799392Z | ^~~~ 2025-12-04T12:35:04.3800666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3800782Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3800893Z | ^~~~ 2025-12-04T12:35:04.3802078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3802197Z 
202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3802310Z | ^~~~ 2025-12-04T12:35:04.3803492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3803625Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3803725Z | ^~~~ 2025-12-04T12:35:04.3804914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3805043Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3805135Z | ^~~~ 2025-12-04T12:35:04.3806335Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3806455Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3806554Z | ^~~~ 2025-12-04T12:35:04.3807752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3807912Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3808023Z | ^~~~ 2025-12-04T12:35:04.3809217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3809332Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3809448Z | ^~~~ 2025-12-04T12:35:04.3810633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ 
to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3810745Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3810853Z | ^~~~ 2025-12-04T12:35:04.3812043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3812183Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3812280Z | ^~~~ 2025-12-04T12:35:04.3813457Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3813584Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3813723Z | ^~~~ 2025-12-04T12:35:04.3814920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3815034Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3815139Z | ^~~~ 2025-12-04T12:35:04.3816498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3816615Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3816710Z | ^~~~ 2025-12-04T12:35:04.3817920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3818040Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3818151Z | ^~~~ 2025-12-04T12:35:04.3819333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: 
overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3819452Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3819569Z | ^~~~ 2025-12-04T12:35:04.3820769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3820897Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3821003Z | ^~~~ 2025-12-04T12:35:04.3822178Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3822310Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3822404Z | ^~~~ 2025-12-04T12:35:04.3823601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3823762Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3823867Z | ^~~~ 2025-12-04T12:35:04.3825063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3825176Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3825277Z | ^~~~ 2025-12-04T12:35:04.3826479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3826590Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3826703Z | ^~~~ 2025-12-04T12:35:04.3827892Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3828011Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3828117Z | ^~~~ 2025-12-04T12:35:04.3829298Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3829466Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3829566Z | ^~~~ 2025-12-04T12:35:04.3830757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3830882Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3830982Z | ^~~~ 2025-12-04T12:35:04.3832253Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3832368Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3832472Z | ^~~~ 2025-12-04T12:35:04.3833666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3833783Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3833876Z | ^~~~ 2025-12-04T12:35:04.3835077Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3835198Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.3835308Z | ^~~~ 2025-12-04T12:35:04.3836499Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3836611Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3836729Z | ^~~~ 2025-12-04T12:35:04.3837912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3838047Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3838150Z | ^~~~ 2025-12-04T12:35:04.3839327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3839507Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3839610Z | ^~~~ 2025-12-04T12:35:04.3840804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3840931Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3841028Z | ^~~~ 2025-12-04T12:35:04.3842232Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3842345Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3842446Z | ^~~~ 2025-12-04T12:35:04.3843649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.3843769Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3843884Z | ^~~~ 2025-12-04T12:35:04.3845061Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3845217Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3845324Z | ^~~~ 2025-12-04T12:35:04.3846504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3846632Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3846763Z | ^~~~ 2025-12-04T12:35:04.3848007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3848135Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3848237Z | ^~~~ 2025-12-04T12:35:04.3849422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3849555Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3849658Z | ^~~~ 2025-12-04T12:35:04.3850848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3850986Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3851080Z | ^~~~ 2025-12-04T12:35:04.3852286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3852401Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3852515Z | ^~~~ 2025-12-04T12:35:04.3853704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3853825Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3853942Z | ^~~~ 2025-12-04T12:35:04.3855129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3855314Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3855423Z | ^~~~ 2025-12-04T12:35:04.3856684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3856815Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3856909Z | ^~~~ 2025-12-04T12:35:04.3858108Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3858238Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3858337Z | ^~~~ 2025-12-04T12:35:04.3859554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3859677Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3859780Z | ^~~~ 2025-12-04T12:35:04.3860983Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3861147Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.3861267Z | ^~~~ 2025-12-04T12:35:04.3861374Z PASSED [9.4856s] [ 26%] 2025-12-04T12:35:04.3862365Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_bool_input In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12, 2025-12-04T12:35:04.3862825Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11, 2025-12-04T12:35:04.3863280Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.3863739Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.3864148Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.3864618Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.3865260Z from /tmp/Qt7dz2/tmpq_68ffd4/data/aotinductor/model/cad56iehvyrgd23725fkkazyktxy2vkmfdx2f6hgjyqe5hsp2q7e.wrapper.cpp:656: 2025-12-04T12:35:04.3865868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic] 2025-12-04T12:35:04.3865989Z 192 | struct { 2025-12-04T12:35:04.3866085Z | ^ 2025-12-04T12:35:04.3866593Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.3866973Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.3867421Z from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.3867827Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.3868307Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.3868927Z from /tmp/Qt7dz2/tmpq_68ffd4/data/aotinductor/model/cad56iehvyrgd23725fkkazyktxy2vkmfdx2f6hgjyqe5hsp2q7e.wrapper.cpp:656: 2025-12-04T12:35:04.3871468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.3872660Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.3872833Z 544 | auto msb_one = _mm512_set1_epi16(0xFFFF); 2025-12-04T12:35:04.3872951Z | ^~~~~~ 2025-12-04T12:35:04.3873454Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.3873842Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.3874296Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.3874715Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.3875178Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.3875848Z from 
/tmp/Qt7dz2/tmpq_68ffd4/data/aotinductor/model/cad56iehvyrgd23725fkkazyktxy2vkmfdx2f6hgjyqe5hsp2q7e.wrapper.cpp:656: 2025-12-04T12:35:04.3877492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.3878761Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.3878988Z 697 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.3879114Z | ^~~~~~ 2025-12-04T12:35:04.3880748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.3881918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.3882131Z 701 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.3882283Z | ^~~~~~ 2025-12-04T12:35:04.3883894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.3885072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.3885283Z 705 | 
return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.3885408Z | ^~~~~~ 2025-12-04T12:35:04.3887084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.3888256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.3888471Z 709 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.3888635Z | ^~~~~~ 2025-12-04T12:35:04.3890731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.3891997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.3892205Z 713 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.3892346Z | ^~~~~~ 2025-12-04T12:35:04.3893961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.3895144Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.3895355Z 717 | return 
_mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.3895498Z | ^~~~~~ 2025-12-04T12:35:04.3898009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.3899233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3899397Z 1153 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.3899513Z | ^~~~ 2025-12-04T12:35:04.3901255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.3902452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3902679Z 1166 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.3902803Z | ^~~~ 2025-12-04T12:35:04.3904455Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.3905676Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3905882Z 1170 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.3906021Z | ^~~~ 2025-12-04T12:35:04.3907675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.3908916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3909174Z 1174 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.3909340Z | ^~~~ 2025-12-04T12:35:04.3911011Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.3912203Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3912420Z 1178 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.3912545Z | ^~~~ 2025-12-04T12:35:04.3914887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.3916093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: 
warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3916247Z 1207 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.3916378Z | ^~~~ 2025-12-04T12:35:04.3918080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.3919345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3919555Z 1220 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.3919679Z | ^~~~ 2025-12-04T12:35:04.3921389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.3922573Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3922805Z 1224 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.3922929Z | ^~~~ 2025-12-04T12:35:04.3924632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.3925858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in 
conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3926058Z 1228 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.3926230Z | ^~~~ 2025-12-04T12:35:04.3927957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.3929161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.3929367Z 1232 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.3929489Z | ^~~~ 2025-12-04T12:35:04.3931883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.3932471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27: required from here 2025-12-04T12:35:04.3933672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3933775Z 1866 | 0x80, 2025-12-04T12:35:04.3933885Z | ^~~~ 2025-12-04T12:35:04.3935065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from 
‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3935167Z 1868 | 0x80, 2025-12-04T12:35:04.3935273Z | ^~~~ 2025-12-04T12:35:04.3936591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3936702Z 1870 | 0x80, 2025-12-04T12:35:04.3936796Z | ^~~~ 2025-12-04T12:35:04.3937990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3938108Z 1872 | 0x80, 2025-12-04T12:35:04.3938201Z | ^~~~ 2025-12-04T12:35:04.3939375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3939492Z 1874 | 0x80, 2025-12-04T12:35:04.3939584Z | ^~~~ 2025-12-04T12:35:04.3940784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3940879Z 1876 | 0x80, 2025-12-04T12:35:04.3940969Z | ^~~~ 2025-12-04T12:35:04.3942155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3942288Z 1878 | 0x80, 2025-12-04T12:35:04.3942381Z | ^~~~ 2025-12-04T12:35:04.3943579Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.3943713Z 1880 | 0x80, 2025-12-04T12:35:04.3943823Z | ^~~~ 2025-12-04T12:35:04.3945040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3945134Z 1882 | 0x80, 2025-12-04T12:35:04.3945239Z | ^~~~ 2025-12-04T12:35:04.3946421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3946535Z 1884 | 0x80, 2025-12-04T12:35:04.3946628Z | ^~~~ 2025-12-04T12:35:04.3947804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3947917Z 1886 | 0x80, 2025-12-04T12:35:04.3948018Z | ^~~~ 2025-12-04T12:35:04.3949197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3949303Z 1888 | 0x80, 2025-12-04T12:35:04.3949394Z | ^~~~ 2025-12-04T12:35:04.3950577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3950677Z 1890 | 0x80, 2025-12-04T12:35:04.3950769Z | ^~~~ 2025-12-04T12:35:04.3951962Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3952065Z 1892 | 0x80, 
2025-12-04T12:35:04.3952212Z | ^~~~ 2025-12-04T12:35:04.3953396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3953492Z 1894 | 0x80, 2025-12-04T12:35:04.3953604Z | ^~~~ 2025-12-04T12:35:04.3954776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3954876Z 1896 | 0x80, 2025-12-04T12:35:04.3954985Z | ^~~~ 2025-12-04T12:35:04.3956160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3956276Z 1898 | 0x80, 2025-12-04T12:35:04.3956380Z | ^~~~ 2025-12-04T12:35:04.3957553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3957660Z 1900 | 0x80, 2025-12-04T12:35:04.3957751Z | ^~~~ 2025-12-04T12:35:04.3958941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3959073Z 1902 | 0x80, 2025-12-04T12:35:04.3959165Z | ^~~~ 2025-12-04T12:35:04.3960357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3960493Z 1904 | 0x80, 2025-12-04T12:35:04.3960621Z | ^~~~ 
2025-12-04T12:35:04.3961813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3961907Z 1906 | 0x80, 2025-12-04T12:35:04.3962012Z | ^~~~ 2025-12-04T12:35:04.3963191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3963290Z 1908 | 0x80, 2025-12-04T12:35:04.3963396Z | ^~~~ 2025-12-04T12:35:04.3964571Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3964689Z 1910 | 0x80, 2025-12-04T12:35:04.3964788Z | ^~~~ 2025-12-04T12:35:04.3965967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3966077Z 1912 | 0x80, 2025-12-04T12:35:04.3966171Z | ^~~~ 2025-12-04T12:35:04.3967352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3967467Z 1914 | 0x80, 2025-12-04T12:35:04.3967562Z | ^~~~ 2025-12-04T12:35:04.3968755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3968898Z 1916 | 0x80, 2025-12-04T12:35:04.3968999Z | ^~~~ 2025-12-04T12:35:04.3970190Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3970287Z 1918 | 0x80, 2025-12-04T12:35:04.3970395Z | ^~~~ 2025-12-04T12:35:04.3971771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3971868Z 1920 | 0x80, 2025-12-04T12:35:04.3971976Z | ^~~~ 2025-12-04T12:35:04.3973155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3973265Z 1922 | 0x80, 2025-12-04T12:35:04.3973381Z | ^~~~ 2025-12-04T12:35:04.3974558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3974669Z 1924 | 0x80, 2025-12-04T12:35:04.3974763Z | ^~~~ 2025-12-04T12:35:04.3976059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3976171Z 1926 | 0x80, 2025-12-04T12:35:04.3976265Z | ^~~~ 2025-12-04T12:35:04.3977528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3977742Z 1928 | 0x80); 2025-12-04T12:35:04.3977841Z | ^~~~ 2025-12-04T12:35:04.3979037Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3979133Z 1930 | 0x80, 2025-12-04T12:35:04.3979226Z | ^~~~ 2025-12-04T12:35:04.3980427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3980522Z 1932 | 0x80, 2025-12-04T12:35:04.3980630Z | ^~~~ 2025-12-04T12:35:04.3981813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3981920Z 1934 | 0x80, 2025-12-04T12:35:04.3982030Z | ^~~~ 2025-12-04T12:35:04.3983206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3983315Z 1936 | 0x80, 2025-12-04T12:35:04.3983411Z | ^~~~ 2025-12-04T12:35:04.3984588Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3984694Z 1938 | 0x80, 2025-12-04T12:35:04.3984788Z | ^~~~ 2025-12-04T12:35:04.3986012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3986133Z 1940 | 0x80, 2025-12-04T12:35:04.3986224Z | ^~~~ 2025-12-04T12:35:04.3987412Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3987507Z 1942 | 0x80, 2025-12-04T12:35:04.3987604Z | ^~~~ 2025-12-04T12:35:04.3988794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3988889Z 1944 | 0x80, 2025-12-04T12:35:04.3988983Z | ^~~~ 2025-12-04T12:35:04.3990171Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3990284Z 1946 | 0x80, 2025-12-04T12:35:04.3990391Z | ^~~~ 2025-12-04T12:35:04.3991569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3991662Z 1948 | 0x80, 2025-12-04T12:35:04.3991808Z | ^~~~ 2025-12-04T12:35:04.3992983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3993095Z 1950 | 0x80, 2025-12-04T12:35:04.3993188Z | ^~~~ 2025-12-04T12:35:04.3994398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3994541Z 1952 | 0x80, 2025-12-04T12:35:04.3994635Z | ^~~~ 2025-12-04T12:35:04.3995811Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3995921Z 1954 | 0x80, 2025-12-04T12:35:04.3996021Z | ^~~~ 2025-12-04T12:35:04.3997206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3997301Z 1956 | 0x80, 2025-12-04T12:35:04.3997393Z | ^~~~ 2025-12-04T12:35:04.3998595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.3998695Z 1958 | 0x80, 2025-12-04T12:35:04.3998803Z | ^~~~ 2025-12-04T12:35:04.3999974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4000068Z 1960 | 0x80, 2025-12-04T12:35:04.4000180Z | ^~~~ 2025-12-04T12:35:04.4001350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4001443Z 1962 | 0x80, 2025-12-04T12:35:04.4001554Z | ^~~~ 2025-12-04T12:35:04.4002767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4002882Z 1964 | 0x80, 2025-12-04T12:35:04.4002975Z | ^~~~ 2025-12-04T12:35:04.4004156Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4004300Z 1966 | 0x80, 2025-12-04T12:35:04.4004391Z | ^~~~ 2025-12-04T12:35:04.4005580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4005674Z 1968 | 0x80, 2025-12-04T12:35:04.4005766Z | ^~~~ 2025-12-04T12:35:04.4006992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4007120Z 1970 | 0x80, 2025-12-04T12:35:04.4007211Z | ^~~~ 2025-12-04T12:35:04.4008400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4008503Z 1972 | 0x80, 2025-12-04T12:35:04.4008609Z | ^~~~ 2025-12-04T12:35:04.4009784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4009880Z 1974 | 0x80, 2025-12-04T12:35:04.4009987Z | ^~~~ 2025-12-04T12:35:04.4011177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4011286Z 1976 | 0x80, 2025-12-04T12:35:04.4011380Z | ^~~~ 2025-12-04T12:35:04.4012556Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4012670Z 1978 | 0x80, 2025-12-04T12:35:04.4012763Z | ^~~~ 2025-12-04T12:35:04.4013937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4014046Z 1980 | 0x80, 2025-12-04T12:35:04.4014140Z | ^~~~ 2025-12-04T12:35:04.4015373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4015468Z 1982 | 0x80, 2025-12-04T12:35:04.4015562Z | ^~~~ 2025-12-04T12:35:04.4016822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4016926Z 1984 | 0x80, 2025-12-04T12:35:04.4017022Z | ^~~~ 2025-12-04T12:35:04.4018218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4018315Z 1986 | 0x80, 2025-12-04T12:35:04.4018424Z | ^~~~ 2025-12-04T12:35:04.4019618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4019713Z 1988 | 0x80, 2025-12-04T12:35:04.4019822Z | ^~~~ 2025-12-04T12:35:04.4020998Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4021152Z 1990 | 0x80, 2025-12-04T12:35:04.4021244Z | ^~~~ 2025-12-04T12:35:04.4022426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4022533Z 1992 | 0x80, 2025-12-04T12:35:04.4022626Z | ^~~~ 2025-12-04T12:35:04.4023896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.4024056Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.4024174Z | ^~~~~~ 2025-12-04T12:35:04.4026595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.4027185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27: required from here 2025-12-04T12:35:04.4028399Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4028494Z 1866 | 0x80, 2025-12-04T12:35:04.4028586Z | ^~~~ 2025-12-04T12:35:04.4029783Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4029883Z 1868 | 0x80, 2025-12-04T12:35:04.4029991Z | ^~~~ 2025-12-04T12:35:04.4031166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4031259Z 1870 | 0x80, 2025-12-04T12:35:04.4031371Z | ^~~~ 2025-12-04T12:35:04.4032611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4032719Z 1872 | 0x80, 2025-12-04T12:35:04.4032811Z | ^~~~ 2025-12-04T12:35:04.4033986Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4034099Z 1874 | 0x80, 2025-12-04T12:35:04.4034192Z | ^~~~ 2025-12-04T12:35:04.4035364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4035470Z 1876 | 0x80, 2025-12-04T12:35:04.4035570Z | ^~~~ 2025-12-04T12:35:04.4036770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4036866Z 1878 | 0x80, 2025-12-04T12:35:04.4036958Z | ^~~~ 2025-12-04T12:35:04.4038154Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4038283Z 1880 | 0x80, 2025-12-04T12:35:04.4038377Z | ^~~~ 2025-12-04T12:35:04.4039577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4039708Z 1882 | 0x80, 2025-12-04T12:35:04.4039813Z | ^~~~ 2025-12-04T12:35:04.4041033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4041133Z 1884 | 0x80, 2025-12-04T12:35:04.4041247Z | ^~~~ 2025-12-04T12:35:04.4042423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4042542Z 1886 | 0x80, 2025-12-04T12:35:04.4042636Z | ^~~~ 2025-12-04T12:35:04.4043810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4043925Z 1888 | 0x80, 2025-12-04T12:35:04.4044020Z | ^~~~ 2025-12-04T12:35:04.4045199Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4045308Z 1890 | 0x80, 2025-12-04T12:35:04.4045400Z | ^~~~ 2025-12-04T12:35:04.4046591Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4046690Z 1892 | 0x80, 2025-12-04T12:35:04.4046782Z | ^~~~ 2025-12-04T12:35:04.4047972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4048074Z 1894 | 0x80, 2025-12-04T12:35:04.4048181Z | ^~~~ 2025-12-04T12:35:04.4049401Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4049495Z 1896 | 0x80, 2025-12-04T12:35:04.4049600Z | ^~~~ 2025-12-04T12:35:04.4050774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4050874Z 1898 | 0x80, 2025-12-04T12:35:04.4050981Z | ^~~~ 2025-12-04T12:35:04.4052150Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4052265Z 1900 | 0x80, 2025-12-04T12:35:04.4052376Z | ^~~~ 2025-12-04T12:35:04.4053575Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4053685Z 1902 | 0x80, 2025-12-04T12:35:04.4053779Z | ^~~~ 2025-12-04T12:35:04.4054966Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4055101Z 1904 | 0x80, 2025-12-04T12:35:04.4055195Z | ^~~~ 2025-12-04T12:35:04.4056447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4056589Z 1906 | 0x80, 2025-12-04T12:35:04.4056685Z | ^~~~ 2025-12-04T12:35:04.4057922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4058021Z 1908 | 0x80, 2025-12-04T12:35:04.4058130Z | ^~~~ 2025-12-04T12:35:04.4059308Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4059410Z 1910 | 0x80, 2025-12-04T12:35:04.4059522Z | ^~~~ 2025-12-04T12:35:04.4060695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4060812Z 1912 | 0x80, 2025-12-04T12:35:04.4060908Z | ^~~~ 2025-12-04T12:35:04.4062107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4062220Z 1914 | 0x80, 2025-12-04T12:35:04.4062314Z | ^~~~ 2025-12-04T12:35:04.4063502Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4063619Z 1916 | 0x80, 2025-12-04T12:35:04.4063714Z | ^~~~ 2025-12-04T12:35:04.4064903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4065006Z 1918 | 0x80, 2025-12-04T12:35:04.4065102Z | ^~~~ 2025-12-04T12:35:04.4066340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4066438Z 1920 | 0x80, 2025-12-04T12:35:04.4066534Z | ^~~~ 2025-12-04T12:35:04.4067722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4067822Z 1922 | 0x80, 2025-12-04T12:35:04.4067929Z | ^~~~ 2025-12-04T12:35:04.4069101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4069202Z 1924 | 0x80, 2025-12-04T12:35:04.4069315Z | ^~~~ 2025-12-04T12:35:04.4070495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4070609Z 1926 | 0x80, 2025-12-04T12:35:04.4070703Z | ^~~~ 2025-12-04T12:35:04.4072100Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4072303Z 1928 | 0x80); 2025-12-04T12:35:04.4072395Z | ^~~~ 2025-12-04T12:35:04.4073589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4073735Z 1930 | 0x80, 2025-12-04T12:35:04.4073835Z | ^~~~ 2025-12-04T12:35:04.4075079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4075175Z 1932 | 0x80, 2025-12-04T12:35:04.4075266Z | ^~~~ 2025-12-04T12:35:04.4076463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4076561Z 1934 | 0x80, 2025-12-04T12:35:04.4076666Z | ^~~~ 2025-12-04T12:35:04.4077837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4077939Z 1936 | 0x80, 2025-12-04T12:35:04.4078052Z | ^~~~ 2025-12-04T12:35:04.4079240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4079350Z 1938 | 0x80, 2025-12-04T12:35:04.4079444Z | ^~~~ 2025-12-04T12:35:04.4080620Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4080738Z 1940 | 0x80, 2025-12-04T12:35:04.4080829Z | ^~~~ 2025-12-04T12:35:04.4082004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4082121Z 1942 | 0x80, 2025-12-04T12:35:04.4082272Z | ^~~~ 2025-12-04T12:35:04.4083462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4083557Z 1944 | 0x80, 2025-12-04T12:35:04.4083649Z | ^~~~ 2025-12-04T12:35:04.4084835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4084936Z 1946 | 0x80, 2025-12-04T12:35:04.4085028Z | ^~~~ 2025-12-04T12:35:04.4086211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4086317Z 1948 | 0x80, 2025-12-04T12:35:04.4086431Z | ^~~~ 2025-12-04T12:35:04.4087608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4087702Z 1950 | 0x80, 2025-12-04T12:35:04.4087811Z | ^~~~ 2025-12-04T12:35:04.4088983Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4089128Z 1952 | 0x80, 2025-12-04T12:35:04.4089222Z | ^~~~ 2025-12-04T12:35:04.4090393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4090545Z 1954 | 0x80, 2025-12-04T12:35:04.4090688Z | ^~~~ 2025-12-04T12:35:04.4091862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4091969Z 1956 | 0x80, 2025-12-04T12:35:04.4092062Z | ^~~~ 2025-12-04T12:35:04.4093250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4093345Z 1958 | 0x80, 2025-12-04T12:35:04.4093439Z | ^~~~ 2025-12-04T12:35:04.4094630Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4094762Z 1960 | 0x80, 2025-12-04T12:35:04.4094873Z | ^~~~ 2025-12-04T12:35:04.4096048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4096144Z 1962 | 0x80, 2025-12-04T12:35:04.4096251Z | ^~~~ 2025-12-04T12:35:04.4097518Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4097616Z 1964 | 0x80, 2025-12-04T12:35:04.4097731Z | ^~~~ 2025-12-04T12:35:04.4098907Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4099036Z 1966 | 0x80, 2025-12-04T12:35:04.4099131Z | ^~~~ 2025-12-04T12:35:04.4100306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4100419Z 1968 | 0x80, 2025-12-04T12:35:04.4100513Z | ^~~~ 2025-12-04T12:35:04.4101763Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4101860Z 1970 | 0x80, 2025-12-04T12:35:04.4101955Z | ^~~~ 2025-12-04T12:35:04.4103152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4103321Z 1972 | 0x80, 2025-12-04T12:35:04.4103413Z | ^~~~ 2025-12-04T12:35:04.4104599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4104695Z 1974 | 0x80, 2025-12-04T12:35:04.4104801Z | ^~~~ 2025-12-04T12:35:04.4105978Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4106072Z 1976 | 0x80, 2025-12-04T12:35:04.4106177Z | ^~~~ 2025-12-04T12:35:04.4107360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4107479Z 1978 | 0x80, 2025-12-04T12:35:04.4107572Z | ^~~~ 2025-12-04T12:35:04.4108746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4108856Z 1980 | 0x80, 2025-12-04T12:35:04.4108948Z | ^~~~ 2025-12-04T12:35:04.4110743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4110855Z 1982 | 0x80, 2025-12-04T12:35:04.4110952Z | ^~~~ 2025-12-04T12:35:04.4112208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4112315Z 1984 | 0x80, 2025-12-04T12:35:04.4112408Z | ^~~~ 2025-12-04T12:35:04.4113598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4113693Z 1986 | 0x80, 2025-12-04T12:35:04.4113792Z | ^~~~ 2025-12-04T12:35:04.4114979Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4115076Z 1988 | 0x80, 2025-12-04T12:35:04.4115185Z | ^~~~ 2025-12-04T12:35:04.4116363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4116469Z 1990 | 0x80, 2025-12-04T12:35:04.4116577Z | ^~~~ 2025-12-04T12:35:04.4117756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4117862Z 1992 | 0x80, 2025-12-04T12:35:04.4117993Z | ^~~~ 2025-12-04T12:35:04.4119172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.4119341Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.4119459Z | ^~~~~~ 2025-12-04T12:35:04.4121969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = signed char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.4122553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28: required from here 2025-12-04T12:35:04.4123912Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4124022Z 1866 | 0x80, 2025-12-04T12:35:04.4124116Z | ^~~~ 2025-12-04T12:35:04.4125326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4125425Z 1868 | 0x80, 2025-12-04T12:35:04.4125518Z | ^~~~ 2025-12-04T12:35:04.4126716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4126816Z 1870 | 0x80, 2025-12-04T12:35:04.4126921Z | ^~~~ 2025-12-04T12:35:04.4128093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4128187Z 1872 | 0x80, 2025-12-04T12:35:04.4128297Z | ^~~~ 2025-12-04T12:35:04.4129538Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4129633Z 1874 | 0x80, 2025-12-04T12:35:04.4129742Z | ^~~~ 2025-12-04T12:35:04.4130921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4131034Z 1876 | 0x80, 2025-12-04T12:35:04.4131128Z | ^~~~ 2025-12-04T12:35:04.4132300Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4132409Z 1878 | 0x80, 2025-12-04T12:35:04.4132503Z | ^~~~ 2025-12-04T12:35:04.4133704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4133799Z 1880 | 0x80, 2025-12-04T12:35:04.4133893Z | ^~~~ 2025-12-04T12:35:04.4135092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4135233Z 1882 | 0x80, 2025-12-04T12:35:04.4135327Z | ^~~~ 2025-12-04T12:35:04.4136596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4136690Z 1884 | 0x80, 2025-12-04T12:35:04.4145356Z | ^~~~ 2025-12-04T12:35:04.4146998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4147101Z 1886 | 0x80, 2025-12-04T12:35:04.4147211Z | ^~~~ 2025-12-04T12:35:04.4148407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4148511Z 1888 | 0x80, 2025-12-04T12:35:04.4148624Z | ^~~~ 2025-12-04T12:35:04.4149817Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4149928Z 1890 | 0x80, 2025-12-04T12:35:04.4150021Z | ^~~~ 2025-12-04T12:35:04.4151214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4151325Z 1892 | 0x80, 2025-12-04T12:35:04.4151419Z | ^~~~ 2025-12-04T12:35:04.4152614Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4152715Z 1894 | 0x80, 2025-12-04T12:35:04.4152811Z | ^~~~ 2025-12-04T12:35:04.4154460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4154559Z 1896 | 0x80, 2025-12-04T12:35:04.4154662Z | ^~~~ 2025-12-04T12:35:04.4155937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4156041Z 1898 | 0x80, 2025-12-04T12:35:04.4156149Z | ^~~~ 2025-12-04T12:35:04.4157333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4157434Z 1900 | 0x80, 2025-12-04T12:35:04.4157545Z | ^~~~ 2025-12-04T12:35:04.4158722Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4158830Z 1902 | 0x80, 2025-12-04T12:35:04.4158931Z | ^~~~ 2025-12-04T12:35:04.4160124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4160229Z 1904 | 0x80, 2025-12-04T12:35:04.4160323Z | ^~~~ 2025-12-04T12:35:04.4161495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4161641Z 1906 | 0x80, 2025-12-04T12:35:04.4161732Z | ^~~~ 2025-12-04T12:35:04.4162931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4163028Z 1908 | 0x80, 2025-12-04T12:35:04.4163184Z | ^~~~ 2025-12-04T12:35:04.4164419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4164517Z 1910 | 0x80, 2025-12-04T12:35:04.4164611Z | ^~~~ 2025-12-04T12:35:04.4165796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4165898Z 1912 | 0x80, 2025-12-04T12:35:04.4166005Z | ^~~~ 2025-12-04T12:35:04.4167176Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4167273Z 1914 | 0x80, 2025-12-04T12:35:04.4167393Z | ^~~~ 2025-12-04T12:35:04.4168587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4168700Z 1916 | 0x80, 2025-12-04T12:35:04.4168798Z | ^~~~ 2025-12-04T12:35:04.4169968Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4170085Z 1918 | 0x80, 2025-12-04T12:35:04.4170177Z | ^~~~ 2025-12-04T12:35:04.4171565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4171674Z 1920 | 0x80, 2025-12-04T12:35:04.4171774Z | ^~~~ 2025-12-04T12:35:04.4173061Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4173161Z 1922 | 0x80, 2025-12-04T12:35:04.4173252Z | ^~~~ 2025-12-04T12:35:04.4174447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4174547Z 1924 | 0x80, 2025-12-04T12:35:04.4174653Z | ^~~~ 2025-12-04T12:35:04.4175830Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4175931Z 1926 | 0x80, 2025-12-04T12:35:04.4176038Z | ^~~~ 2025-12-04T12:35:04.4177307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4177410Z 1928 | 0x80); 2025-12-04T12:35:04.4177522Z | ^~~~ 2025-12-04T12:35:04.4178702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4178881Z 1930 | 0x80, 2025-12-04T12:35:04.4178975Z | ^~~~ 2025-12-04T12:35:04.4180154Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4180307Z 1932 | 0x80, 2025-12-04T12:35:04.4180400Z | ^~~~ 2025-12-04T12:35:04.4181657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4181755Z 1934 | 0x80, 2025-12-04T12:35:04.4181853Z | ^~~~ 2025-12-04T12:35:04.4183051Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4183153Z 1936 | 0x80, 2025-12-04T12:35:04.4183248Z | ^~~~ 2025-12-04T12:35:04.4184443Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4184543Z 1938 | 0x80, 2025-12-04T12:35:04.4184651Z | ^~~~ 2025-12-04T12:35:04.4185847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4185942Z 1940 | 0x80, 2025-12-04T12:35:04.4186048Z | ^~~~ 2025-12-04T12:35:04.4187226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4187338Z 1942 | 0x80, 2025-12-04T12:35:04.4187430Z | ^~~~ 2025-12-04T12:35:04.4188602Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4188717Z 1944 | 0x80, 2025-12-04T12:35:04.4188810Z | ^~~~ 2025-12-04T12:35:04.4190028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4190138Z 1946 | 0x80, 2025-12-04T12:35:04.4190232Z | ^~~~ 2025-12-04T12:35:04.4191418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4191520Z 1948 | 0x80, 2025-12-04T12:35:04.4191614Z | ^~~~ 2025-12-04T12:35:04.4192807Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4192908Z 1950 | 0x80, 2025-12-04T12:35:04.4193001Z | ^~~~ 2025-12-04T12:35:04.4194197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4194292Z 1952 | 0x80, 2025-12-04T12:35:04.4194401Z | ^~~~ 2025-12-04T12:35:04.4195576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4195711Z 1954 | 0x80, 2025-12-04T12:35:04.4195817Z | ^~~~ 2025-12-04T12:35:04.4196997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4197139Z 1956 | 0x80, 2025-12-04T12:35:04.4197234Z | ^~~~ 2025-12-04T12:35:04.4198452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4198561Z 1958 | 0x80, 2025-12-04T12:35:04.4198654Z | ^~~~ 2025-12-04T12:35:04.4199840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4199940Z 1960 | 0x80, 2025-12-04T12:35:04.4200032Z | ^~~~ 2025-12-04T12:35:04.4201213Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4201312Z 1962 | 0x80, 2025-12-04T12:35:04.4202639Z | ^~~~ 2025-12-04T12:35:04.4203881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4203978Z 1964 | 0x80, 2025-12-04T12:35:04.4204087Z | ^~~~ 2025-12-04T12:35:04.4205268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4205369Z 1966 | 0x80, 2025-12-04T12:35:04.4205476Z | ^~~~ 2025-12-04T12:35:04.4206651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4206768Z 1968 | 0x80, 2025-12-04T12:35:04.4206867Z | ^~~~ 2025-12-04T12:35:04.4208047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4208151Z 1970 | 0x80, 2025-12-04T12:35:04.4208246Z | ^~~~ 2025-12-04T12:35:04.4209416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4209573Z 1972 | 0x80, 2025-12-04T12:35:04.4209666Z | ^~~~ 2025-12-04T12:35:04.4210857Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4210989Z 1974 | 0x80, 2025-12-04T12:35:04.4211089Z | ^~~~ 2025-12-04T12:35:04.4212314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4212409Z 1976 | 0x80, 2025-12-04T12:35:04.4212502Z | ^~~~ 2025-12-04T12:35:04.4213690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4213790Z 1978 | 0x80, 2025-12-04T12:35:04.4213895Z | ^~~~ 2025-12-04T12:35:04.4215074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4215175Z 1980 | 0x80, 2025-12-04T12:35:04.4215286Z | ^~~~ 2025-12-04T12:35:04.4216558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4216668Z 1982 | 0x80, 2025-12-04T12:35:04.4216760Z | ^~~~ 2025-12-04T12:35:04.4217939Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4218054Z 1984 | 0x80, 2025-12-04T12:35:04.4218147Z | ^~~~ 2025-12-04T12:35:04.4219319Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4219434Z 1986 | 0x80, 2025-12-04T12:35:04.4219595Z | ^~~~ 2025-12-04T12:35:04.4220795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4220890Z 1988 | 0x80, 2025-12-04T12:35:04.4220982Z | ^~~~ 2025-12-04T12:35:04.4222173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4222275Z 1990 | 0x80, 2025-12-04T12:35:04.4222381Z | ^~~~ 2025-12-04T12:35:04.4223559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4223667Z 1992 | 0x80, 2025-12-04T12:35:04.4223780Z | ^~~~ 2025-12-04T12:35:04.4224960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.4225123Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.4225259Z | ^~~~~~ 2025-12-04T12:35:04.4228280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = unsigned 
char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.4228984Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28: required from here 2025-12-04T12:35:04.4230184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4230301Z 1866 | 0x80, 2025-12-04T12:35:04.4230398Z | ^~~~ 2025-12-04T12:35:04.4231592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4231704Z 1868 | 0x80, 2025-12-04T12:35:04.4231800Z | ^~~~ 2025-12-04T12:35:04.4232994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4233102Z 1870 | 0x80, 2025-12-04T12:35:04.4233194Z | ^~~~ 2025-12-04T12:35:04.4234378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4234473Z 1872 | 0x80, 2025-12-04T12:35:04.4234565Z | ^~~~ 2025-12-04T12:35:04.4235940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4236037Z 1874 | 0x80, 2025-12-04T12:35:04.4236145Z | ^~~~ 2025-12-04T12:35:04.4237376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: 
warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4237485Z 1876 | 0x80, 2025-12-04T12:35:04.4237592Z | ^~~~ 2025-12-04T12:35:04.4238768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4238875Z 1878 | 0x80, 2025-12-04T12:35:04.4238970Z | ^~~~ 2025-12-04T12:35:04.4240156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4240263Z 1880 | 0x80, 2025-12-04T12:35:04.4240356Z | ^~~~ 2025-12-04T12:35:04.4241536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4241666Z 1882 | 0x80, 2025-12-04T12:35:04.4241762Z | ^~~~ 2025-12-04T12:35:04.4242956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4243052Z 1884 | 0x80, 2025-12-04T12:35:04.4243191Z | ^~~~ 2025-12-04T12:35:04.4244387Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4244481Z 1886 | 0x80, 2025-12-04T12:35:04.4244575Z | ^~~~ 2025-12-04T12:35:04.4245776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ 
to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4245950Z 1888 | 0x80, 2025-12-04T12:35:04.4246057Z | ^~~~ 2025-12-04T12:35:04.4247230Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4247323Z 1890 | 0x80, 2025-12-04T12:35:04.4247438Z | ^~~~ 2025-12-04T12:35:04.4248609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4248717Z 1892 | 0x80, 2025-12-04T12:35:04.4248811Z | ^~~~ 2025-12-04T12:35:04.4249998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4250112Z 1894 | 0x80, 2025-12-04T12:35:04.4250204Z | ^~~~ 2025-12-04T12:35:04.4251377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4251486Z 1896 | 0x80, 2025-12-04T12:35:04.4251586Z | ^~~~ 2025-12-04T12:35:04.4252775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4252870Z 1898 | 0x80, 2025-12-04T12:35:04.4252962Z | ^~~~ 2025-12-04T12:35:04.4254201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4254303Z 1900 | 0x80, 2025-12-04T12:35:04.4254410Z | ^~~~ 2025-12-04T12:35:04.4255587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4255684Z 1902 | 0x80, 2025-12-04T12:35:04.4255796Z | ^~~~ 2025-12-04T12:35:04.4257056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4257154Z 1904 | 0x80, 2025-12-04T12:35:04.4257265Z | ^~~~ 2025-12-04T12:35:04.4258460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4258575Z 1906 | 0x80, 2025-12-04T12:35:04.4258668Z | ^~~~ 2025-12-04T12:35:04.4259839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4260002Z 1908 | 0x80, 2025-12-04T12:35:04.4260095Z | ^~~~ 2025-12-04T12:35:04.4261279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4261372Z 1910 | 0x80, 2025-12-04T12:35:04.4261462Z | ^~~~ 2025-12-04T12:35:04.4262722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.4262817Z 1912 | 0x80, 2025-12-04T12:35:04.4262909Z | ^~~~ 2025-12-04T12:35:04.4264097Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4264199Z 1914 | 0x80, 2025-12-04T12:35:04.4264304Z | ^~~~ 2025-12-04T12:35:04.4265473Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4265566Z 1916 | 0x80, 2025-12-04T12:35:04.4265670Z | ^~~~ 2025-12-04T12:35:04.4266862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4266970Z 1918 | 0x80, 2025-12-04T12:35:04.4267064Z | ^~~~ 2025-12-04T12:35:04.4268235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4268352Z 1920 | 0x80, 2025-12-04T12:35:04.4268446Z | ^~~~ 2025-12-04T12:35:04.4269621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4269733Z 1922 | 0x80, 2025-12-04T12:35:04.4269828Z | ^~~~ 2025-12-04T12:35:04.4271313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4271412Z 1924 | 0x80, 
2025-12-04T12:35:04.4271508Z | ^~~~ 2025-12-04T12:35:04.4272713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4272816Z 1926 | 0x80, 2025-12-04T12:35:04.4272923Z | ^~~~ 2025-12-04T12:35:04.4274098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4274196Z 1928 | 0x80); 2025-12-04T12:35:04.4274305Z | ^~~~ 2025-12-04T12:35:04.4275493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4275590Z 1930 | 0x80, 2025-12-04T12:35:04.4275696Z | ^~~~ 2025-12-04T12:35:04.4276865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4277030Z 1932 | 0x80, 2025-12-04T12:35:04.4277121Z | ^~~~ 2025-12-04T12:35:04.4278300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4278407Z 1934 | 0x80, 2025-12-04T12:35:04.4278499Z | ^~~~ 2025-12-04T12:35:04.4279805Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4279902Z 1936 | 0x80, 2025-12-04T12:35:04.4279995Z | ^~~~ 
2025-12-04T12:35:04.4281181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4281283Z 1938 | 0x80, 2025-12-04T12:35:04.4281377Z | ^~~~ 2025-12-04T12:35:04.4282562Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4282660Z 1940 | 0x80, 2025-12-04T12:35:04.4282781Z | ^~~~ 2025-12-04T12:35:04.4283971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4284068Z 1942 | 0x80, 2025-12-04T12:35:04.4284178Z | ^~~~ 2025-12-04T12:35:04.4285351Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4285467Z 1944 | 0x80, 2025-12-04T12:35:04.4285562Z | ^~~~ 2025-12-04T12:35:04.4286747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4286858Z 1946 | 0x80, 2025-12-04T12:35:04.4286957Z | ^~~~ 2025-12-04T12:35:04.4288207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4288315Z 1948 | 0x80, 2025-12-04T12:35:04.4288408Z | ^~~~ 2025-12-04T12:35:04.4289599Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4289701Z 1950 | 0x80, 2025-12-04T12:35:04.4289793Z | ^~~~ 2025-12-04T12:35:04.4290981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4291075Z 1952 | 0x80, 2025-12-04T12:35:04.4291206Z | ^~~~ 2025-12-04T12:35:04.4292437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4292532Z 1954 | 0x80, 2025-12-04T12:35:04.4292639Z | ^~~~ 2025-12-04T12:35:04.4293814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4293944Z 1956 | 0x80, 2025-12-04T12:35:04.4294051Z | ^~~~ 2025-12-04T12:35:04.4295226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4295334Z 1958 | 0x80, 2025-12-04T12:35:04.4295432Z | ^~~~ 2025-12-04T12:35:04.4296712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4296823Z 1960 | 0x80, 2025-12-04T12:35:04.4296917Z | ^~~~ 2025-12-04T12:35:04.4298103Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4298218Z 1962 | 0x80, 2025-12-04T12:35:04.4298312Z | ^~~~ 2025-12-04T12:35:04.4299505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4299608Z 1964 | 0x80, 2025-12-04T12:35:04.4299701Z | ^~~~ 2025-12-04T12:35:04.4300941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4301037Z 1966 | 0x80, 2025-12-04T12:35:04.4301143Z | ^~~~ 2025-12-04T12:35:04.4302314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4302416Z 1968 | 0x80, 2025-12-04T12:35:04.4302520Z | ^~~~ 2025-12-04T12:35:04.4303701Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4303801Z 1970 | 0x80, 2025-12-04T12:35:04.4303906Z | ^~~~ 2025-12-04T12:35:04.4305089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4305196Z 1972 | 0x80, 2025-12-04T12:35:04.4305291Z | ^~~~ 2025-12-04T12:35:04.4306462Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4306611Z 1974 | 0x80, 2025-12-04T12:35:04.4306703Z | ^~~~ 2025-12-04T12:35:04.4307897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4308026Z 1976 | 0x80, 2025-12-04T12:35:04.4308120Z | ^~~~ 2025-12-04T12:35:04.4309355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4309451Z 1978 | 0x80, 2025-12-04T12:35:04.4309543Z | ^~~~ 2025-12-04T12:35:04.4310732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4310833Z 1980 | 0x80, 2025-12-04T12:35:04.4310939Z | ^~~~ 2025-12-04T12:35:04.4312110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4312210Z 1982 | 0x80, 2025-12-04T12:35:04.4312316Z | ^~~~ 2025-12-04T12:35:04.4313504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4313613Z 1984 | 0x80, 2025-12-04T12:35:04.4313707Z | ^~~~ 2025-12-04T12:35:04.4314878Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4314996Z 1986 | 0x80, 2025-12-04T12:35:04.4315090Z | ^~~~ 2025-12-04T12:35:04.4316266Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4316385Z 1988 | 0x80, 2025-12-04T12:35:04.4316478Z | ^~~~ 2025-12-04T12:35:04.4317708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4317806Z 1990 | 0x80, 2025-12-04T12:35:04.4317898Z | ^~~~ 2025-12-04T12:35:04.4319090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4319190Z 1992 | 0x80, 2025-12-04T12:35:04.4319296Z | ^~~~ 2025-12-04T12:35:04.4320478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.4320645Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.4320789Z | ^~~~~~ 2025-12-04T12:35:04.4321299Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16, 2025-12-04T12:35:04.4321667Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.4322120Z from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.4322595Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.4323060Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.4323702Z from /tmp/Qt7dz2/tmpq_68ffd4/data/aotinductor/model/cad56iehvyrgd23725fkkazyktxy2vkmfdx2f6hgjyqe5hsp2q7e.wrapper.cpp:656: 2025-12-04T12:35:04.4325255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’: 2025-12-04T12:35:04.4325836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31: required from here 2025-12-04T12:35:04.4327042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4327171Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4327282Z | ^~~~ 2025-12-04T12:35:04.4328470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4328600Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4328725Z | ^~~~ 2025-12-04T12:35:04.4329916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4330047Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4330153Z | ^~~~ 
2025-12-04T12:35:04.4331344Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4331477Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4331583Z | ^~~~ 2025-12-04T12:35:04.4332818Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4332958Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4333054Z | ^~~~ 2025-12-04T12:35:04.4334256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4334378Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4334478Z | ^~~~ 2025-12-04T12:35:04.4335680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4335795Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4335919Z | ^~~~ 2025-12-04T12:35:04.4337181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4337297Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4337416Z | ^~~~ 2025-12-04T12:35:04.4338605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4338778Z 203 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4338874Z | ^~~~ 2025-12-04T12:35:04.4340055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4340221Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4340319Z | ^~~~ 2025-12-04T12:35:04.4341567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4341693Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4341792Z | ^~~~ 2025-12-04T12:35:04.4342994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4343116Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4343219Z | ^~~~ 2025-12-04T12:35:04.4344416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4344555Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4344660Z | ^~~~ 2025-12-04T12:35:04.4345839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4345953Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4346072Z | ^~~~ 2025-12-04T12:35:04.4347251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4347381Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4347481Z | ^~~~ 2025-12-04T12:35:04.4348712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4348846Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4348947Z | ^~~~ 2025-12-04T12:35:04.4350124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4350258Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4350350Z | ^~~~ 2025-12-04T12:35:04.4351542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4351654Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4351757Z | ^~~~ 2025-12-04T12:35:04.4352967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4353080Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4353191Z | ^~~~ 2025-12-04T12:35:04.4354373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4354529Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4354642Z | ^~~~ 2025-12-04T12:35:04.4355820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4355981Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4356082Z | ^~~~ 2025-12-04T12:35:04.4357304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4357432Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4357529Z | ^~~~ 2025-12-04T12:35:04.4358718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4358851Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4358951Z | ^~~~ 2025-12-04T12:35:04.4360149Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4360287Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4360390Z | ^~~~ 2025-12-04T12:35:04.4361585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4361696Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4361807Z | ^~~~ 2025-12-04T12:35:04.4362985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4363097Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4363204Z | ^~~~ 2025-12-04T12:35:04.4364425Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4364560Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4364658Z | ^~~~ 2025-12-04T12:35:04.4365840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4365971Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4366073Z | ^~~~ 2025-12-04T12:35:04.4367249Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4367376Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4367478Z | ^~~~ 2025-12-04T12:35:04.4368684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4368799Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4368895Z | ^~~~ 2025-12-04T12:35:04.4370094Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4370245Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4370359Z | ^~~~ 2025-12-04T12:35:04.4371742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4371947Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.4372073Z | ^~~~ 2025-12-04T12:35:04.4373322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4373453Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4373547Z | ^~~~ 2025-12-04T12:35:04.4374732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4374865Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4374962Z | ^~~~ 2025-12-04T12:35:04.4376138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4376340Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4376450Z | ^~~~ 2025-12-04T12:35:04.4377653Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4377771Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4377880Z | ^~~~ 2025-12-04T12:35:04.4379070Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4379182Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4379291Z | ^~~~ 2025-12-04T12:35:04.4380533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.4380654Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4380762Z | ^~~~ 2025-12-04T12:35:04.4381943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4382061Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4382175Z | ^~~~ 2025-12-04T12:35:04.4383354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4383480Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4383590Z | ^~~~ 2025-12-04T12:35:04.4384780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4384908Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4385002Z | ^~~~ 2025-12-04T12:35:04.4386197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4386361Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4386456Z | ^~~~ 2025-12-04T12:35:04.4388152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4388322Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4388448Z | ^~~~ 2025-12-04T12:35:04.4389682Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4389797Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4389917Z | ^~~~ 2025-12-04T12:35:04.4391101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4391219Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4391324Z | ^~~~ 2025-12-04T12:35:04.4392505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4392680Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4392779Z | ^~~~ 2025-12-04T12:35:04.4393962Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4394087Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4394192Z | ^~~~ 2025-12-04T12:35:04.4395384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4395497Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4395599Z | ^~~~ 2025-12-04T12:35:04.4397098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’: 2025-12-04T12:35:04.4397679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31: required from here 
2025-12-04T12:35:04.4398886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4399040Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4399136Z | ^~~~ 2025-12-04T12:35:04.4400457Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4400674Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4400790Z | ^~~~ 2025-12-04T12:35:04.4402033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4402149Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4402272Z | ^~~~ 2025-12-04T12:35:04.4403461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4403599Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4403705Z | ^~~~ 2025-12-04T12:35:04.4404882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4405028Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4405123Z | ^~~~ 2025-12-04T12:35:04.4406304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4406432Z 202 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4406530Z | ^~~~ 2025-12-04T12:35:04.4407733Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4407845Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4407947Z | ^~~~ 2025-12-04T12:35:04.4409202Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4409323Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4409438Z | ^~~~ 2025-12-04T12:35:04.4410619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4410736Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4410843Z | ^~~~ 2025-12-04T12:35:04.4412024Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4412148Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4412252Z | ^~~~ 2025-12-04T12:35:04.4413446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4413572Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4413674Z | ^~~~ 2025-12-04T12:35:04.4414857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4415035Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4415138Z | ^~~~ 2025-12-04T12:35:04.4416398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4416565Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4416660Z | ^~~~ 2025-12-04T12:35:04.4417918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4418033Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4418147Z | ^~~~ 2025-12-04T12:35:04.4419340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4419464Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4419581Z | ^~~~ 2025-12-04T12:35:04.4420772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4420903Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4421029Z | ^~~~ 2025-12-04T12:35:04.4422208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4422338Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4422440Z | ^~~~ 2025-12-04T12:35:04.4423624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4423755Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4423855Z | ^~~~ 2025-12-04T12:35:04.4425112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4425238Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4425345Z | ^~~~ 2025-12-04T12:35:04.4426547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4426676Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4426795Z | ^~~~ 2025-12-04T12:35:04.4427983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4428098Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4428217Z | ^~~~ 2025-12-04T12:35:04.4429410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4429524Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4429644Z | ^~~~ 2025-12-04T12:35:04.4430827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4431007Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4431106Z | ^~~~ 2025-12-04T12:35:04.4432645Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4432830Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4432939Z | ^~~~ 2025-12-04T12:35:04.4434187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4434307Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4434399Z | ^~~~ 2025-12-04T12:35:04.4435604Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4435726Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4435840Z | ^~~~ 2025-12-04T12:35:04.4437026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4437153Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4437277Z | ^~~~ 2025-12-04T12:35:04.4438464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4438576Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4438703Z | ^~~~ 2025-12-04T12:35:04.4439886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4440013Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.4440106Z | ^~~~ 2025-12-04T12:35:04.4441334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4441467Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4441563Z | ^~~~ 2025-12-04T12:35:04.4442758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4442877Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4442978Z | ^~~~ 2025-12-04T12:35:04.4444176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4444290Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4444398Z | ^~~~ 2025-12-04T12:35:04.4445608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4445720Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4445831Z | ^~~~ 2025-12-04T12:35:04.4447015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4447166Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4447278Z | ^~~~ 2025-12-04T12:35:04.4448462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.4448625Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4448732Z | ^~~~ 2025-12-04T12:35:04.4449953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4450083Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4450184Z | ^~~~ 2025-12-04T12:35:04.4451387Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4451507Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4451598Z | ^~~~ 2025-12-04T12:35:04.4452793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4452924Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4453022Z | ^~~~ 2025-12-04T12:35:04.4454215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4454329Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4454451Z | ^~~~ 2025-12-04T12:35:04.4455634Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4455746Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4455860Z | ^~~~ 2025-12-04T12:35:04.4457174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4457308Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4457404Z | ^~~~ 2025-12-04T12:35:04.4458598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4458736Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4458837Z | ^~~~ 2025-12-04T12:35:04.4460038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4460152Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4460260Z | ^~~~ 2025-12-04T12:35:04.4461476Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4461590Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4461692Z | ^~~~ 2025-12-04T12:35:04.4462888Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4463063Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4463171Z | ^~~~ 2025-12-04T12:35:04.4464358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4464506Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4464624Z | ^~~~ 2025-12-04T12:35:04.4465848Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4465974Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4466077Z | ^~~~ 2025-12-04T12:35:04.4467257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4467394Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4467497Z | ^~~~ 2025-12-04T12:35:04.4467604Z PASSED [8.9354s] [ 27%] 2025-12-04T12:35:04.4468335Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_after_package SKIPPED [0.0003s] (Test is only supported on CUDA 12.6+) [ 28%] 2025-12-04T12:35:04.4469096Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_after_package_multi_arch SKIPPED [0.0002s] (Test is only supported on CUDA 12.8+) [ 29%] 2025-12-04T12:35:04.4469831Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_after_package_static SKIPPED [0.0003s] (Test is only supported on CUDA 12.6+) [ 30%] 2025-12-04T12:35:04.4471234Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_standalone_cos W1204 12:28:03.440000 140836 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True. 
2025-12-04T12:35:04.4471420Z -- The CXX compiler identification is GNU 11.4.0 2025-12-04T12:35:04.4471547Z -- Detecting CXX compiler ABI info 2025-12-04T12:35:04.4471689Z -- Detecting CXX compiler ABI info - done 2025-12-04T12:35:04.4471925Z -- Check for working CXX compiler: /opt/cache/bin/c++ - skipped 2025-12-04T12:35:04.4472155Z -- Detecting CXX compile features 2025-12-04T12:35:04.4472295Z -- Detecting CXX compile features - done 2025-12-04T12:35:04.4472481Z -- Found CUDA: /usr/local/cuda (found version "12.4") 2025-12-04T12:35:04.4472798Z -- The CUDA compiler identification is NVIDIA 12.4.131 with host compiler GNU 11.4.0 2025-12-04T12:35:04.4472938Z -- Detecting CUDA compiler ABI info 2025-12-04T12:35:04.4473077Z -- Detecting CUDA compiler ABI info - done 2025-12-04T12:35:04.4473322Z -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped 2025-12-04T12:35:04.4473460Z -- Detecting CUDA compile features 2025-12-04T12:35:04.4473596Z -- Detecting CUDA compile features - done 2025-12-04T12:35:04.4473851Z -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.4.131") 2025-12-04T12:35:04.4474013Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD 2025-12-04T12:35:04.4474189Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success 2025-12-04T12:35:04.4474299Z -- Found Threads: TRUE 2025-12-04T12:35:04.4474437Z -- PyTorch: CUDA detected: 12.4 2025-12-04T12:35:04.4474600Z -- PyTorch: CUDA nvcc is: /usr/local/cuda/bin/nvcc 2025-12-04T12:35:04.4474777Z -- PyTorch: CUDA toolkit directory: /usr/local/cuda 2025-12-04T12:35:04.4474899Z -- PyTorch: Header version is: 12.4 2025-12-04T12:35:04.4475333Z -- Found Python: /opt/conda/envs/py_3.10/bin/python3.10 (found version "3.10.14") found components: Interpreter 2025-12-04T12:35:04.4475901Z CMake Warning at /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:149 (message): 2025-12-04T12:35:04.4476100Z Failed to compute shorthash for libnvrtc.so 
2025-12-04T12:35:04.4476223Z Call Stack (most recent call first): 2025-12-04T12:35:04.4476709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include) 2025-12-04T12:35:04.4477206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package) 2025-12-04T12:35:04.4477378Z CMakeLists.txt:11 (find_package) 2025-12-04T12:35:04.4477391Z 2025-12-04T12:35:04.4477437Z 2025-12-04T12:35:04.4477624Z -- USE_CUDNN is set to 0. Compiling without cuDNN support 2025-12-04T12:35:04.4477860Z -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support 2025-12-04T12:35:04.4478052Z -- USE_CUDSS is set to 0. Compiling without cuDSS support 2025-12-04T12:35:04.4478241Z -- USE_CUFILE is set to 0. Compiling without cuFile support 2025-12-04T12:35:04.4478813Z CMake Warning at /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:332 (message): 2025-12-04T12:35:04.4479082Z pytorch is not compatible with `CMAKE_CUDA_ARCHITECTURES` and will ignore 2025-12-04T12:35:04.4479288Z its value. Please configure `TORCH_CUDA_ARCH_LIST` instead. 2025-12-04T12:35:04.4479426Z Call Stack (most recent call first): 2025-12-04T12:35:04.4479895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include) 2025-12-04T12:35:04.4480389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package) 2025-12-04T12:35:04.4480525Z CMakeLists.txt:11 (find_package) 2025-12-04T12:35:04.4480531Z 2025-12-04T12:35:04.4480536Z 2025-12-04T12:35:04.4480758Z -- Added CUDA NVCC flags for: -gencode;arch=compute_75,code=sm_75 2025-12-04T12:35:04.4481308Z CMake Warning at /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message): 2025-12-04T12:35:04.4481484Z static library kineto_LIBRARY-NOTFOUND not found. 
2025-12-04T12:35:04.4481607Z Call Stack (most recent call first): 2025-12-04T12:35:04.4482189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125 (append_torchlib_if_found) 2025-12-04T12:35:04.4482315Z CMakeLists.txt:11 (find_package) 2025-12-04T12:35:04.4482320Z 2025-12-04T12:35:04.4482327Z 2025-12-04T12:35:04.4482679Z -- Found Torch: /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib/libtorch.so 2025-12-04T12:35:04.4482833Z -- Configuring done (2.9s) 2025-12-04T12:35:04.4482950Z -- Generating done (0.0s) 2025-12-04T12:35:04.4483321Z -- Build files have been written to: /tmp/tmpwdu6dysi/cos.wrapper/data/aotinductor/model/build 2025-12-04T12:35:04.4483535Z [ 50%] Building CXX object CMakeFiles/cos.dir/cos.wrapper.cpp.o 2025-12-04T12:35:04.4483673Z [100%] Linking CXX static library libcos.a 2025-12-04T12:35:04.4483794Z [100%] Built target cos 2025-12-04T12:35:04.4483904Z PASSED [9.5524s] [ 31%] 2025-12-04T12:35:04.4484617Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_with_exporter SKIPPED [0.0003s] (Test is only supported on CUDA 12.6+) [ 32%] 2025-12-04T12:35:04.4485353Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_compile_with_exporter_weights SKIPPED [0.0002s] (Test is only supported on CUDA 12.6+) [ 34%] 2025-12-04T12:35:04.4486577Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_deepcopy_compiled_model W1204 12:28:18.162000 140836 site-packages/torch/export/pt2_archive/_package.py:763] AOTICompiledModel deepcopy warning: AOTICompiledModel.loader is not deepcopied. 
2025-12-04T12:35:04.4486699Z PASSED [5.1877s] [ 35%] 2025-12-04T12:35:04.4487193Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_duplicate_calls PASSED [21.6305s] [ 36%] 2025-12-04T12:35:04.4488163Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_linear In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12, 2025-12-04T12:35:04.4488632Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11, 2025-12-04T12:35:04.4489000Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.4489490Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.4489928Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.4490404Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.4491065Z from /tmp/Ld3r6p/tmpxth9d7tz/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750: 2025-12-04T12:35:04.4491668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic] 2025-12-04T12:35:04.4491791Z 192 | struct { 2025-12-04T12:35:04.4491885Z | ^ 2025-12-04T12:35:04.4492401Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.4492773Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.4493216Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.4493634Z from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.4494099Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.4494776Z from /tmp/Ld3r6p/tmpxth9d7tz/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750: 2025-12-04T12:35:04.4497195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.4498417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.4498574Z 544 | auto msb_one = _mm512_set1_epi16(0xFFFF); 2025-12-04T12:35:04.4498692Z | ^~~~~~ 2025-12-04T12:35:04.4499220Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.4499589Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.4500052Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.4500467Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.4500939Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.4501617Z from /tmp/Ld3r6p/tmpxth9d7tz/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750: 2025-12-04T12:35:04.4503253Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.4504480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.4504724Z 697 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.4504890Z | ^~~~~~ 2025-12-04T12:35:04.4506529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.4507703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.4507934Z 701 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.4508064Z | ^~~~~~ 2025-12-04T12:35:04.4509706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.4510880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.4511092Z 705 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.4511241Z | ^~~~~~ 2025-12-04T12:35:04.4512855Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.4514105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.4514309Z 709 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.4514437Z | ^~~~~~ 2025-12-04T12:35:04.4516062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.4517229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.4517454Z 713 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.4517593Z | ^~~~~~ 2025-12-04T12:35:04.4519652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.4520830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.4521094Z 717 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.4521234Z | ^~~~~~ 2025-12-04T12:35:04.4523552Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.4524804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4524963Z 1153 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.4525097Z | ^~~~ 2025-12-04T12:35:04.4526752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.4527968Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4528186Z 1166 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.4528312Z | ^~~~ 2025-12-04T12:35:04.4529972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.4531215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4531480Z 1170 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 
2025-12-04T12:35:04.4531613Z | ^~~~ 2025-12-04T12:35:04.4533272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.4534523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4534726Z 1174 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.4534863Z | ^~~~ 2025-12-04T12:35:04.4536713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.4537937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4538149Z 1178 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.4538273Z | ^~~~ 2025-12-04T12:35:04.4540637Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.4541838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 
2025-12-04T12:35:04.4542000Z 1207 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.4542116Z | ^~~~ 2025-12-04T12:35:04.4543806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.4545139Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4545403Z 1220 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.4545546Z | ^~~~ 2025-12-04T12:35:04.4547265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.4548480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4548685Z 1224 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.4548817Z | ^~~~ 2025-12-04T12:35:04.4550530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.4551721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4551980Z 
1228 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.4552102Z | ^~~~ 2025-12-04T12:35:04.4553815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.4555083Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4555286Z 1232 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.4555424Z | ^~~~ 2025-12-04T12:35:04.4557810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.4558421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27: required from here 2025-12-04T12:35:04.4559615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4559730Z 1866 | 0x80, 2025-12-04T12:35:04.4559826Z | ^~~~ 2025-12-04T12:35:04.4561014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4561121Z 1868 | 0x80, 2025-12-04T12:35:04.4561214Z | ^~~~ 
2025-12-04T12:35:04.4562391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4562548Z 1870 | 0x80, 2025-12-04T12:35:04.4562650Z | ^~~~ 2025-12-04T12:35:04.4563842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4563938Z 1872 | 0x80, 2025-12-04T12:35:04.4564033Z | ^~~~ 2025-12-04T12:35:04.4565237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4565332Z 1874 | 0x80, 2025-12-04T12:35:04.4565425Z | ^~~~ 2025-12-04T12:35:04.4566607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4566721Z 1876 | 0x80, 2025-12-04T12:35:04.4566826Z | ^~~~ 2025-12-04T12:35:04.4567999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4568093Z 1878 | 0x80, 2025-12-04T12:35:04.4568200Z | ^~~~ 2025-12-04T12:35:04.4569418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4569528Z 1880 | 0x80, 2025-12-04T12:35:04.4569621Z | ^~~~ 2025-12-04T12:35:04.4570801Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4571224Z 1882 | 0x80, 2025-12-04T12:35:04.4571321Z | ^~~~ 2025-12-04T12:35:04.4572509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4572620Z 1884 | 0x80, 2025-12-04T12:35:04.4572717Z | ^~~~ 2025-12-04T12:35:04.4573913Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4574012Z 1886 | 0x80, 2025-12-04T12:35:04.4574104Z | ^~~~ 2025-12-04T12:35:04.4575302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4575409Z 1888 | 0x80, 2025-12-04T12:35:04.4575521Z | ^~~~ 2025-12-04T12:35:04.4576776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4576871Z 1890 | 0x80, 2025-12-04T12:35:04.4576986Z | ^~~~ 2025-12-04T12:35:04.4578166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4578275Z 1892 | 0x80, 2025-12-04T12:35:04.4578368Z | ^~~~ 2025-12-04T12:35:04.4579598Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4579720Z 1894 | 0x80, 2025-12-04T12:35:04.4579812Z | ^~~~ 2025-12-04T12:35:04.4580991Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4581096Z 1896 | 0x80, 2025-12-04T12:35:04.4581195Z | ^~~~ 2025-12-04T12:35:04.4582376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4582470Z 1898 | 0x80, 2025-12-04T12:35:04.4582562Z | ^~~~ 2025-12-04T12:35:04.4583754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4583859Z 1900 | 0x80, 2025-12-04T12:35:04.4583951Z | ^~~~ 2025-12-04T12:35:04.4585138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4585231Z 1902 | 0x80, 2025-12-04T12:35:04.4585387Z | ^~~~ 2025-12-04T12:35:04.4586565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4586660Z 1904 | 0x80, 2025-12-04T12:35:04.4586765Z | ^~~~ 2025-12-04T12:35:04.4587984Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4588130Z 1906 | 0x80, 2025-12-04T12:35:04.4588223Z | ^~~~ 2025-12-04T12:35:04.4589398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4589509Z 1908 | 0x80, 2025-12-04T12:35:04.4589607Z | ^~~~ 2025-12-04T12:35:04.4590774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4590882Z 1910 | 0x80, 2025-12-04T12:35:04.4590974Z | ^~~~ 2025-12-04T12:35:04.4592169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4592270Z 1912 | 0x80, 2025-12-04T12:35:04.4592366Z | ^~~~ 2025-12-04T12:35:04.4593558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4593654Z 1914 | 0x80, 2025-12-04T12:35:04.4593767Z | ^~~~ 2025-12-04T12:35:04.4594943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4595039Z 1916 | 0x80, 2025-12-04T12:35:04.4595147Z | ^~~~ 2025-12-04T12:35:04.4596374Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4596477Z 1918 | 0x80, 2025-12-04T12:35:04.4596584Z | ^~~~ 2025-12-04T12:35:04.4597767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4597883Z 1920 | 0x80, 2025-12-04T12:35:04.4597977Z | ^~~~ 2025-12-04T12:35:04.4599157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4599264Z 1922 | 0x80, 2025-12-04T12:35:04.4599356Z | ^~~~ 2025-12-04T12:35:04.4600557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4600652Z 1924 | 0x80, 2025-12-04T12:35:04.4600743Z | ^~~~ 2025-12-04T12:35:04.4601929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4602069Z 1926 | 0x80, 2025-12-04T12:35:04.4602161Z | ^~~~ 2025-12-04T12:35:04.4603362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4603458Z 1928 | 0x80); 2025-12-04T12:35:04.4603565Z | ^~~~ 2025-12-04T12:35:04.4604824Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4604919Z 1930 | 0x80, 2025-12-04T12:35:04.4605026Z | ^~~~ 2025-12-04T12:35:04.4606203Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4606317Z 1932 | 0x80, 2025-12-04T12:35:04.4606410Z | ^~~~ 2025-12-04T12:35:04.4607590Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4607700Z 1934 | 0x80, 2025-12-04T12:35:04.4607794Z | ^~~~ 2025-12-04T12:35:04.4608993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4609103Z 1936 | 0x80, 2025-12-04T12:35:04.4609201Z | ^~~~ 2025-12-04T12:35:04.4610393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4610496Z 1938 | 0x80, 2025-12-04T12:35:04.4610588Z | ^~~~ 2025-12-04T12:35:04.4611782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4611879Z 1940 | 0x80, 2025-12-04T12:35:04.4611974Z | ^~~~ 2025-12-04T12:35:04.4613229Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4613326Z 1942 | 0x80, 2025-12-04T12:35:04.4613436Z | ^~~~ 2025-12-04T12:35:04.4614613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4614715Z 1944 | 0x80, 2025-12-04T12:35:04.4614827Z | ^~~~ 2025-12-04T12:35:04.4616002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4616111Z 1946 | 0x80, 2025-12-04T12:35:04.4616213Z | ^~~~ 2025-12-04T12:35:04.4617481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4617594Z 1948 | 0x80, 2025-12-04T12:35:04.4617690Z | ^~~~ 2025-12-04T12:35:04.4618866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4619067Z 1950 | 0x80, 2025-12-04T12:35:04.4619157Z | ^~~~ 2025-12-04T12:35:04.4620345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4620443Z 1952 | 0x80, 2025-12-04T12:35:04.4620584Z | ^~~~ 2025-12-04T12:35:04.4621827Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4621922Z 1954 | 0x80, 2025-12-04T12:35:04.4622029Z | ^~~~ 2025-12-04T12:35:04.4623205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4623311Z 1956 | 0x80, 2025-12-04T12:35:04.4623416Z | ^~~~ 2025-12-04T12:35:04.4624592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4624692Z 1958 | 0x80, 2025-12-04T12:35:04.4624806Z | ^~~~ 2025-12-04T12:35:04.4625999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4626111Z 1960 | 0x80, 2025-12-04T12:35:04.4626203Z | ^~~~ 2025-12-04T12:35:04.4627374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4627487Z 1962 | 0x80, 2025-12-04T12:35:04.4627578Z | ^~~~ 2025-12-04T12:35:04.4628764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4628858Z 1964 | 0x80, 2025-12-04T12:35:04.4628956Z | ^~~~ 2025-12-04T12:35:04.4630185Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4630281Z 1966 | 0x80, 2025-12-04T12:35:04.4630373Z | ^~~~ 2025-12-04T12:35:04.4631567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4631670Z 1968 | 0x80, 2025-12-04T12:35:04.4631779Z | ^~~~ 2025-12-04T12:35:04.4632953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4633047Z 1970 | 0x80, 2025-12-04T12:35:04.4633192Z | ^~~~ 2025-12-04T12:35:04.4634423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4634534Z 1972 | 0x80, 2025-12-04T12:35:04.4634627Z | ^~~~ 2025-12-04T12:35:04.4635802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4635945Z 1974 | 0x80, 2025-12-04T12:35:04.4636036Z | ^~~~ 2025-12-04T12:35:04.4637217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4637333Z 1976 | 0x80, 2025-12-04T12:35:04.4637425Z | ^~~~ 2025-12-04T12:35:04.4638626Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4638721Z 1978 | 0x80, 2025-12-04T12:35:04.4638814Z | ^~~~ 2025-12-04T12:35:04.4640005Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4640108Z 1980 | 0x80, 2025-12-04T12:35:04.4640217Z | ^~~~ 2025-12-04T12:35:04.4641386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4641487Z 1982 | 0x80, 2025-12-04T12:35:04.4641591Z | ^~~~ 2025-12-04T12:35:04.4642808Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4642905Z 1984 | 0x80, 2025-12-04T12:35:04.4643012Z | ^~~~ 2025-12-04T12:35:04.4644179Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4644294Z 1986 | 0x80, 2025-12-04T12:35:04.4644387Z | ^~~~ 2025-12-04T12:35:04.4645556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4645672Z 1988 | 0x80, 2025-12-04T12:35:04.4645764Z | ^~~~ 2025-12-04T12:35:04.4646959Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4647054Z 1990 | 0x80, 2025-12-04T12:35:04.4647146Z | ^~~~ 2025-12-04T12:35:04.4648328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4648458Z 1992 | 0x80, 2025-12-04T12:35:04.4648552Z | ^~~~ 2025-12-04T12:35:04.4649752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.4649949Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.4650088Z | ^~~~~~ 2025-12-04T12:35:04.4652555Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.4653161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27: required from here 2025-12-04T12:35:04.4654356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4654460Z 1866 | 0x80, 2025-12-04T12:35:04.4654577Z | ^~~~ 2025-12-04T12:35:04.4655764Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4655876Z 1868 | 0x80, 2025-12-04T12:35:04.4655971Z | ^~~~ 2025-12-04T12:35:04.4657230Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4657350Z 1870 | 0x80, 2025-12-04T12:35:04.4657444Z | ^~~~ 2025-12-04T12:35:04.4658626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4658744Z 1872 | 0x80, 2025-12-04T12:35:04.4658884Z | ^~~~ 2025-12-04T12:35:04.4660091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4660185Z 1874 | 0x80, 2025-12-04T12:35:04.4660277Z | ^~~~ 2025-12-04T12:35:04.4661469Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4661569Z 1876 | 0x80, 2025-12-04T12:35:04.4661662Z | ^~~~ 2025-12-04T12:35:04.4662849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4662949Z 1878 | 0x80, 2025-12-04T12:35:04.4663060Z | ^~~~ 2025-12-04T12:35:04.4664239Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4664333Z 1880 | 0x80, 2025-12-04T12:35:04.4664442Z | ^~~~ 2025-12-04T12:35:04.4665614Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4665761Z 1882 | 0x80, 2025-12-04T12:35:04.4665853Z | ^~~~ 2025-12-04T12:35:04.4667032Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4667179Z 1884 | 0x80, 2025-12-04T12:35:04.4667279Z | ^~~~ 2025-12-04T12:35:04.4668490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4668598Z 1886 | 0x80, 2025-12-04T12:35:04.4668690Z | ^~~~ 2025-12-04T12:35:04.4669879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4669979Z 1888 | 0x80, 2025-12-04T12:35:04.4670072Z | ^~~~ 2025-12-04T12:35:04.4671488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4671591Z 1890 | 0x80, 2025-12-04T12:35:04.4671712Z | ^~~~ 2025-12-04T12:35:04.4672889Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4672984Z 1892 | 0x80, 2025-12-04T12:35:04.4673094Z | ^~~~ 2025-12-04T12:35:04.4674273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4674373Z 1894 | 0x80, 2025-12-04T12:35:04.4674480Z | ^~~~ 2025-12-04T12:35:04.4675647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4675853Z 1896 | 0x80, 2025-12-04T12:35:04.4675954Z | ^~~~ 2025-12-04T12:35:04.4677131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4677239Z 1898 | 0x80, 2025-12-04T12:35:04.4677332Z | ^~~~ 2025-12-04T12:35:04.4679083Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4679188Z 1900 | 0x80, 2025-12-04T12:35:04.4679281Z | ^~~~ 2025-12-04T12:35:04.4680481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4680589Z 1902 | 0x80, 2025-12-04T12:35:04.4680690Z | ^~~~ 2025-12-04T12:35:04.4681877Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4681971Z 1904 | 0x80, 2025-12-04T12:35:04.4682080Z | ^~~~ 2025-12-04T12:35:04.4683257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4683431Z 1906 | 0x80, 2025-12-04T12:35:04.4683546Z | ^~~~ 2025-12-04T12:35:04.4684728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4684891Z 1908 | 0x80, 2025-12-04T12:35:04.4685032Z | ^~~~ 2025-12-04T12:35:04.4686217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4686325Z 1910 | 0x80, 2025-12-04T12:35:04.4686419Z | ^~~~ 2025-12-04T12:35:04.4687597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4687703Z 1912 | 0x80, 2025-12-04T12:35:04.4687796Z | ^~~~ 2025-12-04T12:35:04.4688979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4689085Z 1914 | 0x80, 2025-12-04T12:35:04.4689184Z | ^~~~ 2025-12-04T12:35:04.4690374Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4690469Z 1916 | 0x80, 2025-12-04T12:35:04.4690561Z | ^~~~ 2025-12-04T12:35:04.4691754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4691849Z 1918 | 0x80, 2025-12-04T12:35:04.4691955Z | ^~~~ 2025-12-04T12:35:04.4693128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4693271Z 1920 | 0x80, 2025-12-04T12:35:04.4693379Z | ^~~~ 2025-12-04T12:35:04.4694551Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4694678Z 1922 | 0x80, 2025-12-04T12:35:04.4694773Z | ^~~~ 2025-12-04T12:35:04.4695953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4696062Z 1924 | 0x80, 2025-12-04T12:35:04.4696159Z | ^~~~ 2025-12-04T12:35:04.4697426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4697535Z 1926 | 0x80, 2025-12-04T12:35:04.4697631Z | ^~~~ 2025-12-04T12:35:04.4698829Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4698930Z 1928 | 0x80); 2025-12-04T12:35:04.4699024Z | ^~~~ 2025-12-04T12:35:04.4700281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4700377Z 1930 | 0x80, 2025-12-04T12:35:04.4700486Z | ^~~~ 2025-12-04T12:35:04.4701667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4701837Z 1932 | 0x80, 2025-12-04T12:35:04.4701946Z | ^~~~ 2025-12-04T12:35:04.4703125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4703235Z 1934 | 0x80, 2025-12-04T12:35:04.4703336Z | ^~~~ 2025-12-04T12:35:04.4704519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4704630Z 1936 | 0x80, 2025-12-04T12:35:04.4704730Z | ^~~~ 2025-12-04T12:35:04.4705917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4706040Z 1938 | 0x80, 2025-12-04T12:35:04.4706136Z | ^~~~ 2025-12-04T12:35:04.4707324Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4707422Z 1940 | 0x80, 2025-12-04T12:35:04.4707523Z | ^~~~ 2025-12-04T12:35:04.4708710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4708810Z 1942 | 0x80, 2025-12-04T12:35:04.4708903Z | ^~~~ 2025-12-04T12:35:04.4710140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4710242Z 1944 | 0x80, 2025-12-04T12:35:04.4710353Z | ^~~~ 2025-12-04T12:35:04.4711528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4711625Z 1946 | 0x80, 2025-12-04T12:35:04.4711741Z | ^~~~ 2025-12-04T12:35:04.4712916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4713029Z 1948 | 0x80, 2025-12-04T12:35:04.4713125Z | ^~~~ 2025-12-04T12:35:04.4714309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4714429Z 1950 | 0x80, 2025-12-04T12:35:04.4714523Z | ^~~~ 2025-12-04T12:35:04.4715698Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4715807Z 1952 | 0x80, 2025-12-04T12:35:04.4715936Z | ^~~~ 2025-12-04T12:35:04.4717131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4717225Z 1954 | 0x80, 2025-12-04T12:35:04.4717317Z | ^~~~ 2025-12-04T12:35:04.4718549Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4718674Z 1956 | 0x80, 2025-12-04T12:35:04.4718779Z | ^~~~ 2025-12-04T12:35:04.4719952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4720052Z 1958 | 0x80, 2025-12-04T12:35:04.4720164Z | ^~~~ 2025-12-04T12:35:04.4721332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4721425Z 1960 | 0x80, 2025-12-04T12:35:04.4721531Z | ^~~~ 2025-12-04T12:35:04.4722714Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4722826Z 1962 | 0x80, 2025-12-04T12:35:04.4722919Z | ^~~~ 2025-12-04T12:35:04.4724086Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4724199Z 1964 | 0x80, 2025-12-04T12:35:04.4724289Z | ^~~~ 2025-12-04T12:35:04.4725474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4725570Z 1966 | 0x80, 2025-12-04T12:35:04.4725663Z | ^~~~ 2025-12-04T12:35:04.4726921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4727021Z 1968 | 0x80, 2025-12-04T12:35:04.4727116Z | ^~~~ 2025-12-04T12:35:04.4728307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4728436Z 1970 | 0x80, 2025-12-04T12:35:04.4728543Z | ^~~~ 2025-12-04T12:35:04.4729717Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4729812Z 1972 | 0x80, 2025-12-04T12:35:04.4729920Z | ^~~~ 2025-12-04T12:35:04.4731173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4731281Z 1974 | 0x80, 2025-12-04T12:35:04.4731374Z | ^~~~ 2025-12-04T12:35:04.4732551Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4732665Z 1976 | 0x80, 2025-12-04T12:35:04.4732758Z | ^~~~ 2025-12-04T12:35:04.4733928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4734036Z 1978 | 0x80, 2025-12-04T12:35:04.4734128Z | ^~~~ 2025-12-04T12:35:04.4735329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4735426Z 1980 | 0x80, 2025-12-04T12:35:04.4735518Z | ^~~~ 2025-12-04T12:35:04.4736774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4736878Z 1982 | 0x80, 2025-12-04T12:35:04.4736975Z | ^~~~ 2025-12-04T12:35:04.4738169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4738265Z 1984 | 0x80, 2025-12-04T12:35:04.4738375Z | ^~~~ 2025-12-04T12:35:04.4739607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4739705Z 1986 | 0x80, 2025-12-04T12:35:04.4739816Z | ^~~~ 2025-12-04T12:35:04.4740995Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4741112Z 1988 | 0x80, 2025-12-04T12:35:04.4741205Z | ^~~~ 2025-12-04T12:35:04.4742379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4742488Z 1990 | 0x80, 2025-12-04T12:35:04.4742587Z | ^~~~ 2025-12-04T12:35:04.4743768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4743877Z 1992 | 0x80, 2025-12-04T12:35:04.4743970Z | ^~~~ 2025-12-04T12:35:04.4745165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.4745368Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.4745484Z | ^~~~~~ 2025-12-04T12:35:04.4747956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = signed char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.4748569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28: required from here 2025-12-04T12:35:04.4749767Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4749870Z 1866 | 0x80, 2025-12-04T12:35:04.4749982Z | ^~~~ 2025-12-04T12:35:04.4751156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4751251Z 1868 | 0x80, 2025-12-04T12:35:04.4751367Z | ^~~~ 2025-12-04T12:35:04.4752552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4752657Z 1870 | 0x80, 2025-12-04T12:35:04.4752750Z | ^~~~ 2025-12-04T12:35:04.4753927Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4754042Z 1872 | 0x80, 2025-12-04T12:35:04.4754134Z | ^~~~ 2025-12-04T12:35:04.4755306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4755420Z 1874 | 0x80, 2025-12-04T12:35:04.4755511Z | ^~~~ 2025-12-04T12:35:04.4756754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4756849Z 1876 | 0x80, 2025-12-04T12:35:04.4756941Z | ^~~~ 2025-12-04T12:35:04.4758126Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4758226Z 1878 | 0x80, 2025-12-04T12:35:04.4758317Z | ^~~~ 2025-12-04T12:35:04.4759505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4759603Z 1880 | 0x80, 2025-12-04T12:35:04.4759713Z | ^~~~ 2025-12-04T12:35:04.4760896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4760990Z 1882 | 0x80, 2025-12-04T12:35:04.4761099Z | ^~~~ 2025-12-04T12:35:04.4762275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4762420Z 1884 | 0x80, 2025-12-04T12:35:04.4762512Z | ^~~~ 2025-12-04T12:35:04.4767113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4767332Z 1886 | 0x80, 2025-12-04T12:35:04.4767442Z | ^~~~ 2025-12-04T12:35:04.4770815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4770930Z 1888 | 0x80, 2025-12-04T12:35:04.4771223Z | ^~~~ 2025-12-04T12:35:04.4772467Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4772590Z 1890 | 0x80, 2025-12-04T12:35:04.4772686Z | ^~~~ 2025-12-04T12:35:04.4773910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4774012Z 1892 | 0x80, 2025-12-04T12:35:04.4774119Z | ^~~~ 2025-12-04T12:35:04.4775320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4775416Z 1894 | 0x80, 2025-12-04T12:35:04.4775523Z | ^~~~ 2025-12-04T12:35:04.4776765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4776867Z 1896 | 0x80, 2025-12-04T12:35:04.4776975Z | ^~~~ 2025-12-04T12:35:04.4778160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4778277Z 1898 | 0x80, 2025-12-04T12:35:04.4778370Z | ^~~~ 2025-12-04T12:35:04.4779550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4779660Z 1900 | 0x80, 2025-12-04T12:35:04.4779753Z | ^~~~ 2025-12-04T12:35:04.4780930Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4781046Z 1902 | 0x80, 2025-12-04T12:35:04.4781139Z | ^~~~ 2025-12-04T12:35:04.4782329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4782429Z 1904 | 0x80, 2025-12-04T12:35:04.4782521Z | ^~~~ 2025-12-04T12:35:04.4783710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4783804Z 1906 | 0x80, 2025-12-04T12:35:04.4783908Z | ^~~~ 2025-12-04T12:35:04.4785080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4785278Z 1908 | 0x80, 2025-12-04T12:35:04.4785388Z | ^~~~ 2025-12-04T12:35:04.4786664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4786809Z 1910 | 0x80, 2025-12-04T12:35:04.4786918Z | ^~~~ 2025-12-04T12:35:04.4788153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4788262Z 1912 | 0x80, 2025-12-04T12:35:04.4788357Z | ^~~~ 2025-12-04T12:35:04.4789534Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4789652Z 1914 | 0x80, 2025-12-04T12:35:04.4789746Z | ^~~~ 2025-12-04T12:35:04.4790931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4791030Z 1916 | 0x80, 2025-12-04T12:35:04.4791124Z | ^~~~ 2025-12-04T12:35:04.4792312Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4792406Z 1918 | 0x80, 2025-12-04T12:35:04.4792499Z | ^~~~ 2025-12-04T12:35:04.4793684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4793785Z 1920 | 0x80, 2025-12-04T12:35:04.4793894Z | ^~~~ 2025-12-04T12:35:04.4795077Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4795177Z 1922 | 0x80, 2025-12-04T12:35:04.4795286Z | ^~~~ 2025-12-04T12:35:04.4796466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4796573Z 1924 | 0x80, 2025-12-04T12:35:04.4796671Z | ^~~~ 2025-12-04T12:35:04.4797841Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4797958Z 1926 | 0x80, 2025-12-04T12:35:04.4798051Z | ^~~~ 2025-12-04T12:35:04.4799234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4799351Z 1928 | 0x80); 2025-12-04T12:35:04.4799444Z | ^~~~ 2025-12-04T12:35:04.4800635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4800730Z 1930 | 0x80, 2025-12-04T12:35:04.4800826Z | ^~~~ 2025-12-04T12:35:04.4802020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4802159Z 1932 | 0x80, 2025-12-04T12:35:04.4802264Z | ^~~~ 2025-12-04T12:35:04.4803492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4803623Z 1934 | 0x80, 2025-12-04T12:35:04.4803729Z | ^~~~ 2025-12-04T12:35:04.4804950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4805047Z 1936 | 0x80, 2025-12-04T12:35:04.4805158Z | ^~~~ 2025-12-04T12:35:04.4806340Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4806456Z 1938 | 0x80, 2025-12-04T12:35:04.4806550Z | ^~~~ 2025-12-04T12:35:04.4807734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4807852Z 1940 | 0x80, 2025-12-04T12:35:04.4807946Z | ^~~~ 2025-12-04T12:35:04.4809141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4809236Z 1942 | 0x80, 2025-12-04T12:35:04.4809331Z | ^~~~ 2025-12-04T12:35:04.4810521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4810624Z 1944 | 0x80, 2025-12-04T12:35:04.4810718Z | ^~~~ 2025-12-04T12:35:04.4811914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4812015Z 1946 | 0x80, 2025-12-04T12:35:04.4812119Z | ^~~~ 2025-12-04T12:35:04.4813297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4813388Z 1948 | 0x80, 2025-12-04T12:35:04.4813495Z | ^~~~ 2025-12-04T12:35:04.4814673Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4814779Z 1950 | 0x80, 2025-12-04T12:35:04.4814873Z | ^~~~ 2025-12-04T12:35:04.4816046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4816162Z 1952 | 0x80, 2025-12-04T12:35:04.4816254Z | ^~~~ 2025-12-04T12:35:04.4817518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4817626Z 1954 | 0x80, 2025-12-04T12:35:04.4817719Z | ^~~~ 2025-12-04T12:35:04.4818950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4819044Z 1956 | 0x80, 2025-12-04T12:35:04.4819137Z | ^~~~ 2025-12-04T12:35:04.4820385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4820513Z 1958 | 0x80, 2025-12-04T12:35:04.4820605Z | ^~~~ 2025-12-04T12:35:04.4821837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4821933Z 1960 | 0x80, 2025-12-04T12:35:04.4822041Z | ^~~~ 2025-12-04T12:35:04.4823223Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4823317Z 1962 | 0x80, 2025-12-04T12:35:04.4823426Z | ^~~~ 2025-12-04T12:35:04.4824599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4824713Z 1964 | 0x80, 2025-12-04T12:35:04.4824806Z | ^~~~ 2025-12-04T12:35:04.4825978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4826085Z 1966 | 0x80, 2025-12-04T12:35:04.4826179Z | ^~~~ 2025-12-04T12:35:04.4827353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4827459Z 1968 | 0x80, 2025-12-04T12:35:04.4827552Z | ^~~~ 2025-12-04T12:35:04.4828755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4828855Z 1970 | 0x80, 2025-12-04T12:35:04.4828948Z | ^~~~ 2025-12-04T12:35:04.4830135Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4830232Z 1972 | 0x80, 2025-12-04T12:35:04.4830343Z | ^~~~ 2025-12-04T12:35:04.4831558Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4831653Z 1974 | 0x80, 2025-12-04T12:35:04.4831756Z | ^~~~ 2025-12-04T12:35:04.4832932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4833068Z 1976 | 0x80, 2025-12-04T12:35:04.4833177Z | ^~~~ 2025-12-04T12:35:04.4834357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4834467Z 1978 | 0x80, 2025-12-04T12:35:04.4834567Z | ^~~~ 2025-12-04T12:35:04.4835740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4835852Z 1980 | 0x80, 2025-12-04T12:35:04.4835946Z | ^~~~ 2025-12-04T12:35:04.4837173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4837276Z 1982 | 0x80, 2025-12-04T12:35:04.4837369Z | ^~~~ 2025-12-04T12:35:04.4838590Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4838687Z 1984 | 0x80, 2025-12-04T12:35:04.4838785Z | ^~~~ 2025-12-04T12:35:04.4839977Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4840070Z 1986 | 0x80, 2025-12-04T12:35:04.4840175Z | ^~~~ 2025-12-04T12:35:04.4841354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4841455Z 1988 | 0x80, 2025-12-04T12:35:04.4841562Z | ^~~~ 2025-12-04T12:35:04.4842740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4842851Z 1990 | 0x80, 2025-12-04T12:35:04.4842949Z | ^~~~ 2025-12-04T12:35:04.4844123Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4844229Z 1992 | 0x80, 2025-12-04T12:35:04.4844321Z | ^~~~ 2025-12-04T12:35:04.4845514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.4845692Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.4845819Z | ^~~~~~ 2025-12-04T12:35:04.4848260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = unsigned 
char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.4848887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28: required from here 2025-12-04T12:35:04.4850133Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4850232Z 1866 | 0x80, 2025-12-04T12:35:04.4850328Z | ^~~~ 2025-12-04T12:35:04.4851530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4851633Z 1868 | 0x80, 2025-12-04T12:35:04.4851727Z | ^~~~ 2025-12-04T12:35:04.4852922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4853016Z 1870 | 0x80, 2025-12-04T12:35:04.4853122Z | ^~~~ 2025-12-04T12:35:04.4854348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4854445Z 1872 | 0x80, 2025-12-04T12:35:04.4854553Z | ^~~~ 2025-12-04T12:35:04.4855760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4855881Z 1874 | 0x80, 2025-12-04T12:35:04.4855975Z | ^~~~ 2025-12-04T12:35:04.4857224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: 
warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4857338Z 1876 | 0x80, 2025-12-04T12:35:04.4857439Z | ^~~~ 2025-12-04T12:35:04.4858622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4858733Z 1878 | 0x80, 2025-12-04T12:35:04.4858833Z | ^~~~ 2025-12-04T12:35:04.4860021Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4860123Z 1880 | 0x80, 2025-12-04T12:35:04.4860217Z | ^~~~ 2025-12-04T12:35:04.4861404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4861499Z 1882 | 0x80, 2025-12-04T12:35:04.4861613Z | ^~~~ 2025-12-04T12:35:04.4862791Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4862892Z 1884 | 0x80, 2025-12-04T12:35:04.4863003Z | ^~~~ 2025-12-04T12:35:04.4864172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4864331Z 1886 | 0x80, 2025-12-04T12:35:04.4864424Z | ^~~~ 2025-12-04T12:35:04.4865601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ 
to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4865707Z 1888 | 0x80, 2025-12-04T12:35:04.4865872Z | ^~~~ 2025-12-04T12:35:04.4867050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4867157Z 1890 | 0x80, 2025-12-04T12:35:04.4867288Z | ^~~~ 2025-12-04T12:35:04.4868472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4868572Z 1892 | 0x80, 2025-12-04T12:35:04.4868664Z | ^~~~ 2025-12-04T12:35:04.4869846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4869941Z 1894 | 0x80, 2025-12-04T12:35:04.4870046Z | ^~~~ 2025-12-04T12:35:04.4871418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4871513Z 1896 | 0x80, 2025-12-04T12:35:04.4871625Z | ^~~~ 2025-12-04T12:35:04.4872804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4872905Z 1898 | 0x80, 2025-12-04T12:35:04.4873012Z | ^~~~ 2025-12-04T12:35:04.4874177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4874286Z 1900 | 0x80, 2025-12-04T12:35:04.4874391Z | ^~~~ 2025-12-04T12:35:04.4875561Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4875668Z 1902 | 0x80, 2025-12-04T12:35:04.4875784Z | ^~~~ 2025-12-04T12:35:04.4876972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4877074Z 1904 | 0x80, 2025-12-04T12:35:04.4877184Z | ^~~~ 2025-12-04T12:35:04.4878359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4878461Z 1906 | 0x80, 2025-12-04T12:35:04.4878575Z | ^~~~ 2025-12-04T12:35:04.4879753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4879856Z 1908 | 0x80, 2025-12-04T12:35:04.4879964Z | ^~~~ 2025-12-04T12:35:04.4881143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4881368Z 1910 | 0x80, 2025-12-04T12:35:04.4881463Z | ^~~~ 2025-12-04T12:35:04.4882643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.4882858Z 1912 | 0x80, 2025-12-04T12:35:04.4882954Z | ^~~~ 2025-12-04T12:35:04.4884149Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4884296Z 1914 | 0x80, 2025-12-04T12:35:04.4884392Z | ^~~~ 2025-12-04T12:35:04.4885591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4885695Z 1916 | 0x80, 2025-12-04T12:35:04.4885789Z | ^~~~ 2025-12-04T12:35:04.4886981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4887090Z 1918 | 0x80, 2025-12-04T12:35:04.4887200Z | ^~~~ 2025-12-04T12:35:04.4888384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4888486Z 1920 | 0x80, 2025-12-04T12:35:04.4888597Z | ^~~~ 2025-12-04T12:35:04.4889773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4889892Z 1922 | 0x80, 2025-12-04T12:35:04.4889987Z | ^~~~ 2025-12-04T12:35:04.4891164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4891288Z 1924 | 0x80, 
2025-12-04T12:35:04.4891380Z | ^~~~ 2025-12-04T12:35:04.4892557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4892674Z 1926 | 0x80, 2025-12-04T12:35:04.4892768Z | ^~~~ 2025-12-04T12:35:04.4893958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4894063Z 1928 | 0x80); 2025-12-04T12:35:04.4894159Z | ^~~~ 2025-12-04T12:35:04.4895358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4895462Z 1930 | 0x80, 2025-12-04T12:35:04.4895570Z | ^~~~ 2025-12-04T12:35:04.4896845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4896944Z 1932 | 0x80, 2025-12-04T12:35:04.4897051Z | ^~~~ 2025-12-04T12:35:04.4898233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4898391Z 1934 | 0x80, 2025-12-04T12:35:04.4898496Z | ^~~~ 2025-12-04T12:35:04.4899710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4899852Z 1936 | 0x80, 2025-12-04T12:35:04.4899943Z | ^~~~ 
2025-12-04T12:35:04.4901160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4901273Z 1938 | 0x80, 2025-12-04T12:35:04.4901366Z | ^~~~ 2025-12-04T12:35:04.4902550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4902650Z 1940 | 0x80, 2025-12-04T12:35:04.4902745Z | ^~~~ 2025-12-04T12:35:04.4903931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4904032Z 1942 | 0x80, 2025-12-04T12:35:04.4904124Z | ^~~~ 2025-12-04T12:35:04.4905312Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4905406Z 1944 | 0x80, 2025-12-04T12:35:04.4905514Z | ^~~~ 2025-12-04T12:35:04.4906684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4906785Z 1946 | 0x80, 2025-12-04T12:35:04.4906894Z | ^~~~ 2025-12-04T12:35:04.4908067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4908183Z 1948 | 0x80, 2025-12-04T12:35:04.4908277Z | ^~~~ 2025-12-04T12:35:04.4909450Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4909560Z 1950 | 0x80, 2025-12-04T12:35:04.4909654Z | ^~~~ 2025-12-04T12:35:04.4910831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4910947Z 1952 | 0x80, 2025-12-04T12:35:04.4911041Z | ^~~~ 2025-12-04T12:35:04.4912229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4912330Z 1954 | 0x80, 2025-12-04T12:35:04.4912422Z | ^~~~ 2025-12-04T12:35:04.4913610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4913705Z 1956 | 0x80, 2025-12-04T12:35:04.4913797Z | ^~~~ 2025-12-04T12:35:04.4914980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4915118Z 1958 | 0x80, 2025-12-04T12:35:04.4915223Z | ^~~~ 2025-12-04T12:35:04.4916436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4916563Z 1960 | 0x80, 2025-12-04T12:35:04.4916668Z | ^~~~ 2025-12-04T12:35:04.4917882Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4917991Z 1962 | 0x80, 2025-12-04T12:35:04.4918087Z | ^~~~ 2025-12-04T12:35:04.4919263Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4919383Z 1964 | 0x80, 2025-12-04T12:35:04.4919479Z | ^~~~ 2025-12-04T12:35:04.4920660Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4920775Z 1966 | 0x80, 2025-12-04T12:35:04.4920868Z | ^~~~ 2025-12-04T12:35:04.4922059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4922153Z 1968 | 0x80, 2025-12-04T12:35:04.4922247Z | ^~~~ 2025-12-04T12:35:04.4923437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4923537Z 1970 | 0x80, 2025-12-04T12:35:04.4923643Z | ^~~~ 2025-12-04T12:35:04.4924821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4924921Z 1972 | 0x80, 2025-12-04T12:35:04.4925028Z | ^~~~ 2025-12-04T12:35:04.4926211Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4926304Z 1974 | 0x80, 2025-12-04T12:35:04.4926409Z | ^~~~ 2025-12-04T12:35:04.4927578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4927732Z 1976 | 0x80, 2025-12-04T12:35:04.4927825Z | ^~~~ 2025-12-04T12:35:04.4929002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4929145Z 1978 | 0x80, 2025-12-04T12:35:04.4929239Z | ^~~~ 2025-12-04T12:35:04.4930435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4930529Z 1980 | 0x80, 2025-12-04T12:35:04.4930620Z | ^~~~ 2025-12-04T12:35:04.4931806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4931906Z 1982 | 0x80, 2025-12-04T12:35:04.4931999Z | ^~~~ 2025-12-04T12:35:04.4933241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4933342Z 1984 | 0x80, 2025-12-04T12:35:04.4933448Z | ^~~~ 2025-12-04T12:35:04.4934679Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4934775Z 1986 | 0x80, 2025-12-04T12:35:04.4934883Z | ^~~~ 2025-12-04T12:35:04.4936050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4936162Z 1988 | 0x80, 2025-12-04T12:35:04.4936256Z | ^~~~ 2025-12-04T12:35:04.4937512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4937628Z 1990 | 0x80, 2025-12-04T12:35:04.4937720Z | ^~~~ 2025-12-04T12:35:04.4938897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.4939006Z 1992 | 0x80, 2025-12-04T12:35:04.4939098Z | ^~~~ 2025-12-04T12:35:04.4940300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.4940459Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.4940577Z | ^~~~~~ 2025-12-04T12:35:04.4941100Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16, 2025-12-04T12:35:04.4941480Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.4941938Z from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.4942341Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.4942805Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.4943549Z from /tmp/Ld3r6p/tmpxth9d7tz/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750: 2025-12-04T12:35:04.4945023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’: 2025-12-04T12:35:04.4945656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31: required from here 2025-12-04T12:35:04.4946850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4946970Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4947088Z | ^~~~ 2025-12-04T12:35:04.4948274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4948405Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4948507Z | ^~~~ 2025-12-04T12:35:04.4949752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4949880Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4950021Z | ^~~~ 
2025-12-04T12:35:04.4951219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4951340Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4951448Z | ^~~~ 2025-12-04T12:35:04.4952654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4952775Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4952877Z | ^~~~ 2025-12-04T12:35:04.4954074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4954194Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4954307Z | ^~~~ 2025-12-04T12:35:04.4955479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4955600Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4955716Z | ^~~~ 2025-12-04T12:35:04.4956906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4957039Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4957141Z | ^~~~ 2025-12-04T12:35:04.4958326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4958453Z 203 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4958640Z | ^~~~ 2025-12-04T12:35:04.4959914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4986401Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4986631Z | ^~~~ 2025-12-04T12:35:04.4988209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4988408Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4988513Z | ^~~~ 2025-12-04T12:35:04.4989782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4989917Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4990034Z | ^~~~ 2025-12-04T12:35:04.4991238Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4991367Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4991463Z | ^~~~ 2025-12-04T12:35:04.4992681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4992797Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4992903Z | ^~~~ 2025-12-04T12:35:04.4994107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4994228Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4994342Z | ^~~~ 2025-12-04T12:35:04.4995521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4995642Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4995767Z | ^~~~ 2025-12-04T12:35:04.4996951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4997087Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4997182Z | ^~~~ 2025-12-04T12:35:04.4998365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4998501Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.4998601Z | ^~~~ 2025-12-04T12:35:04.4999791Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.4999927Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5000029Z | ^~~~ 2025-12-04T12:35:04.5001229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5001342Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5001445Z | ^~~~ 2025-12-04T12:35:04.5002719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5002830Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5002940Z | ^~~~ 2025-12-04T12:35:04.5004160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5004309Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5004422Z | ^~~~ 2025-12-04T12:35:04.5005641Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5005769Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5005880Z | ^~~~ 2025-12-04T12:35:04.5007075Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5007202Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5007314Z | ^~~~ 2025-12-04T12:35:04.5008501Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5008629Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5008728Z | ^~~~ 2025-12-04T12:35:04.5009920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5010038Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5010133Z | ^~~~ 2025-12-04T12:35:04.5011325Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5011442Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5011561Z | ^~~~ 2025-12-04T12:35:04.5012741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5012859Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5012970Z | ^~~~ 2025-12-04T12:35:04.5014148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5014278Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5014370Z | ^~~~ 2025-12-04T12:35:04.5015557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5015685Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5015781Z | ^~~~ 2025-12-04T12:35:04.5017045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5017174Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5017274Z | ^~~~ 2025-12-04T12:35:04.5018524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5018635Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.5018737Z | ^~~~ 2025-12-04T12:35:04.5019973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5020118Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5020225Z | ^~~~ 2025-12-04T12:35:04.5021446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5021561Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5021680Z | ^~~~ 2025-12-04T12:35:04.5022858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5022981Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5023080Z | ^~~~ 2025-12-04T12:35:04.5024268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5024393Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5024498Z | ^~~~ 2025-12-04T12:35:04.5025674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5025806Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5025900Z | ^~~~ 2025-12-04T12:35:04.5027091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.5027210Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5027312Z | ^~~~ 2025-12-04T12:35:04.5028510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5028628Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5028740Z | ^~~~ 2025-12-04T12:35:04.5029923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5030041Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5030156Z | ^~~~ 2025-12-04T12:35:04.5031336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5031466Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5031558Z | ^~~~ 2025-12-04T12:35:04.5032742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5032868Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5032966Z | ^~~~ 2025-12-04T12:35:04.5034182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5034306Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5034406Z | ^~~~ 2025-12-04T12:35:04.5035632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5035778Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5035880Z | ^~~~ 2025-12-04T12:35:04.5037107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5037218Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5037330Z | ^~~~ 2025-12-04T12:35:04.5038513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5038623Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5038732Z | ^~~~ 2025-12-04T12:35:04.5039925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5040038Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5040159Z | ^~~~ 2025-12-04T12:35:04.5041337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5041469Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5041568Z | ^~~~ 2025-12-04T12:35:04.5043062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’: 2025-12-04T12:35:04.5043661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31: required from here 
2025-12-04T12:35:04.5044848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5044977Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5045071Z | ^~~~ 2025-12-04T12:35:04.5046255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5046433Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5046529Z | ^~~~ 2025-12-04T12:35:04.5047734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5047883Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5047984Z | ^~~~ 2025-12-04T12:35:04.5049185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5049299Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5049422Z | ^~~~ 2025-12-04T12:35:04.5050609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5050723Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5050830Z | ^~~~ 2025-12-04T12:35:04.5052064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5052189Z 202 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5052336Z | ^~~~ 2025-12-04T12:35:04.5053528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5053660Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5053759Z | ^~~~ 2025-12-04T12:35:04.5054943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5055077Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5055180Z | ^~~~ 2025-12-04T12:35:04.5056444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5056567Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5056658Z | ^~~~ 2025-12-04T12:35:04.5057858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5057976Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5058085Z | ^~~~ 2025-12-04T12:35:04.5059274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5059391Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5059504Z | ^~~~ 2025-12-04T12:35:04.5060695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5060821Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5060924Z | ^~~~ 2025-12-04T12:35:04.5062169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5062294Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5062390Z | ^~~~ 2025-12-04T12:35:04.5063585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5063753Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5063852Z | ^~~~ 2025-12-04T12:35:04.5065059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5065171Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5065277Z | ^~~~ 2025-12-04T12:35:04.5066467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5066581Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5066695Z | ^~~~ 2025-12-04T12:35:04.5067943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5068058Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5068214Z | ^~~~ 2025-12-04T12:35:04.5069400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5069519Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5069633Z | ^~~~ 2025-12-04T12:35:04.5070814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5071192Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5071357Z | ^~~~ 2025-12-04T12:35:04.5072587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5072724Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5072826Z | ^~~~ 2025-12-04T12:35:04.5074022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5074142Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5074236Z | ^~~~ 2025-12-04T12:35:04.5075443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5075564Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5075679Z | ^~~~ 2025-12-04T12:35:04.5076868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5076983Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5077098Z | ^~~~ 2025-12-04T12:35:04.5078274Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5078491Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5078609Z | ^~~~ 2025-12-04T12:35:04.5079795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5079974Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5080072Z | ^~~~ 2025-12-04T12:35:04.5081270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5081402Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5081507Z | ^~~~ 2025-12-04T12:35:04.5082703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5082817Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5082919Z | ^~~~ 2025-12-04T12:35:04.5084181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5084302Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5084465Z | ^~~~ 2025-12-04T12:35:04.5085647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5085764Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.5085876Z | ^~~~ 2025-12-04T12:35:04.5087063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5087182Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5087302Z | ^~~~ 2025-12-04T12:35:04.5088485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5088625Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5088728Z | ^~~~ 2025-12-04T12:35:04.5089910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5090048Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5090152Z | ^~~~ 2025-12-04T12:35:04.5091352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5091471Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5091565Z | ^~~~ 2025-12-04T12:35:04.5092768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5092882Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5092980Z | ^~~~ 2025-12-04T12:35:04.5094177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.5094340Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5094458Z | ^~~~ 2025-12-04T12:35:04.5095685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5095830Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5095947Z | ^~~~ 2025-12-04T12:35:04.5097236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5097368Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5097466Z | ^~~~ 2025-12-04T12:35:04.5098656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5098782Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5098878Z | ^~~~ 2025-12-04T12:35:04.5100088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5100201Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5100308Z | ^~~~ 2025-12-04T12:35:04.5101510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5101628Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5101730Z | ^~~~ 2025-12-04T12:35:04.5102923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5103048Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5103158Z | ^~~~ 2025-12-04T12:35:04.5104345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5104462Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5104570Z | ^~~~ 2025-12-04T12:35:04.5105747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5105876Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5105976Z | ^~~~ 2025-12-04T12:35:04.5107159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5107294Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5107393Z | ^~~~ 2025-12-04T12:35:04.5108587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5108700Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5108792Z | ^~~~ 2025-12-04T12:35:04.5110035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5110146Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5110246Z | ^~~~ 2025-12-04T12:35:04.5111487Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5111653Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5111768Z | ^~~~ 2025-12-04T12:35:04.5112986Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5113099Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5113220Z | ^~~~ 2025-12-04T12:35:04.5113327Z PASSED [9.3004s] [ 37%] 2025-12-04T12:35:04.5114428Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_loading_wrong_model W1204 12:28:54.274000 140836 site-packages/torch/_inductor/package/package.py:120] Loading outdated pt2 file. Please regenerate your package. 2025-12-04T12:35:04.5114538Z PASSED [5.1753s] [ 38%] 2025-12-04T12:35:04.5115516Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_metadata In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12, 2025-12-04T12:35:04.5115971Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11, 2025-12-04T12:35:04.5116340Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.5116796Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.5117201Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.5117664Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.5118347Z from 
/tmp/zNkm53/tmpp0pxkc5y/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750: 2025-12-04T12:35:04.5118947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic] 2025-12-04T12:35:04.5119060Z 192 | struct { 2025-12-04T12:35:04.5119155Z | ^ 2025-12-04T12:35:04.5119660Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.5120043Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.5120484Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.5120900Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.5121365Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.5122030Z from /tmp/zNkm53/tmpp0pxkc5y/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750: 2025-12-04T12:35:04.5124293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.5125548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.5125752Z 544 | auto msb_one = _mm512_set1_epi16(0xFFFF); 2025-12-04T12:35:04.5125901Z | ^~~~~~ 2025-12-04T12:35:04.5126406Z In file included from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.5126825Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.5127268Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.5127687Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.5128156Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.5128818Z from /tmp/zNkm53/tmpp0pxkc5y/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750: 2025-12-04T12:35:04.5130478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5131651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.5131882Z 697 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.5132009Z | ^~~~~~ 2025-12-04T12:35:04.5133642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5134807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.5135019Z 701 
| return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.5135156Z | ^~~~~~ 2025-12-04T12:35:04.5136850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5138048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.5138259Z 705 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.5138404Z | ^~~~~~ 2025-12-04T12:35:04.5140030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5141187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.5141462Z 709 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.5141588Z | ^~~~~~ 2025-12-04T12:35:04.5143252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5144470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.5144689Z 713 | return 
_mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.5144822Z | ^~~~~~ 2025-12-04T12:35:04.5146442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5147620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.5147825Z 717 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.5147961Z | ^~~~~~ 2025-12-04T12:35:04.5150212Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.5151433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5151585Z 1153 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.5151700Z | ^~~~ 2025-12-04T12:35:04.5153380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5154569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5154788Z 1166 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.5154920Z | ^~~~ 2025-12-04T12:35:04.5156581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5157787Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5158039Z 1170 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.5158176Z | ^~~~ 2025-12-04T12:35:04.5159857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5161091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5161333Z 1174 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.5161460Z | ^~~~ 2025-12-04T12:35:04.5163134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5164334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5164552Z 1178 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.5164674Z | ^~~~ 2025-12-04T12:35:04.5167008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.5168215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5168368Z 1207 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.5168504Z | ^~~~ 2025-12-04T12:35:04.5170198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5172130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5172441Z 1220 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.5172570Z | ^~~~ 2025-12-04T12:35:04.5174305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5175577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5175793Z 1224 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.5175917Z | ^~~~ 2025-12-04T12:35:04.5177695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5178956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5179165Z 1228 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.5179353Z | ^~~~ 2025-12-04T12:35:04.5181049Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5182255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5182460Z 1232 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.5182588Z | ^~~~ 2025-12-04T12:35:04.5184967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.5185549Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27: required from here 2025-12-04T12:35:04.5186748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5186851Z 1866 | 0x80, 2025-12-04T12:35:04.5186966Z | ^~~~ 2025-12-04T12:35:04.5188141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5188242Z 1868 | 0x80, 2025-12-04T12:35:04.5188349Z | ^~~~ 2025-12-04T12:35:04.5189527Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5189712Z 1870 | 0x80, 2025-12-04T12:35:04.5189806Z | ^~~~ 2025-12-04T12:35:04.5190988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5191142Z 1872 | 0x80, 2025-12-04T12:35:04.5191238Z | ^~~~ 2025-12-04T12:35:04.5192415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5192533Z 1874 | 0x80, 2025-12-04T12:35:04.5192628Z | ^~~~ 2025-12-04T12:35:04.5193821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ 
[-Woverflow] 2025-12-04T12:35:04.5193924Z 1876 | 0x80, 2025-12-04T12:35:04.5194018Z | ^~~~ 2025-12-04T12:35:04.5195201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5195350Z 1878 | 0x80, 2025-12-04T12:35:04.5195459Z | ^~~~ 2025-12-04T12:35:04.5196635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5196768Z 1880 | 0x80, 2025-12-04T12:35:04.5196880Z | ^~~~ 2025-12-04T12:35:04.5198058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5198159Z 1882 | 0x80, 2025-12-04T12:35:04.5198268Z | ^~~~ 2025-12-04T12:35:04.5199443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5199571Z 1884 | 0x80, 2025-12-04T12:35:04.5199666Z | ^~~~ 2025-12-04T12:35:04.5200838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5200958Z 1886 | 0x80, 2025-12-04T12:35:04.5201053Z | ^~~~ 2025-12-04T12:35:04.5202236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5202340Z 
1888 | 0x80, 2025-12-04T12:35:04.5202434Z | ^~~~ 2025-12-04T12:35:04.5203628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5203727Z 1890 | 0x80, 2025-12-04T12:35:04.5203822Z | ^~~~ 2025-12-04T12:35:04.5205017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5205109Z 1892 | 0x80, 2025-12-04T12:35:04.5205218Z | ^~~~ 2025-12-04T12:35:04.5206391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5206540Z 1894 | 0x80, 2025-12-04T12:35:04.5206652Z | ^~~~ 2025-12-04T12:35:04.5207835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5207985Z 1896 | 0x80, 2025-12-04T12:35:04.5208080Z | ^~~~ 2025-12-04T12:35:04.5209267Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5209379Z 1898 | 0x80, 2025-12-04T12:35:04.5209471Z | ^~~~ 2025-12-04T12:35:04.5210647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5210765Z 1900 | 0x80, 2025-12-04T12:35:04.5210857Z | 
^~~~ 2025-12-04T12:35:04.5212085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5212187Z 1902 | 0x80, 2025-12-04T12:35:04.5212281Z | ^~~~ 2025-12-04T12:35:04.5213502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5213598Z 1904 | 0x80, 2025-12-04T12:35:04.5213690Z | ^~~~ 2025-12-04T12:35:04.5214871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5214972Z 1906 | 0x80, 2025-12-04T12:35:04.5215077Z | ^~~~ 2025-12-04T12:35:04.5216262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5216422Z 1908 | 0x80, 2025-12-04T12:35:04.5216535Z | ^~~~ 2025-12-04T12:35:04.5217726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5217833Z 1910 | 0x80, 2025-12-04T12:35:04.5217926Z | ^~~~ 2025-12-04T12:35:04.5219096Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5219211Z 1912 | 0x80, 2025-12-04T12:35:04.5219306Z | ^~~~ 2025-12-04T12:35:04.5220500Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5220604Z 1914 | 0x80, 2025-12-04T12:35:04.5220695Z | ^~~~ 2025-12-04T12:35:04.5221887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5221982Z 1916 | 0x80, 2025-12-04T12:35:04.5222076Z | ^~~~ 2025-12-04T12:35:04.5223261Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5223416Z 1918 | 0x80, 2025-12-04T12:35:04.5223524Z | ^~~~ 2025-12-04T12:35:04.5224742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5224867Z 1920 | 0x80, 2025-12-04T12:35:04.5224974Z | ^~~~ 2025-12-04T12:35:04.5226192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5226301Z 1922 | 0x80, 2025-12-04T12:35:04.5226393Z | ^~~~ 2025-12-04T12:35:04.5227577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5227689Z 1924 | 0x80, 2025-12-04T12:35:04.5227781Z | ^~~~ 2025-12-04T12:35:04.5228962Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5229075Z 1926 | 0x80, 2025-12-04T12:35:04.5229167Z | ^~~~ 2025-12-04T12:35:04.5230359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5230455Z 1928 | 0x80); 2025-12-04T12:35:04.5230547Z | ^~~~ 2025-12-04T12:35:04.5231735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5231834Z 1930 | 0x80, 2025-12-04T12:35:04.5231926Z | ^~~~ 2025-12-04T12:35:04.5233117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5233217Z 1932 | 0x80, 2025-12-04T12:35:04.5233323Z | ^~~~ 2025-12-04T12:35:04.5234512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5234610Z 1934 | 0x80, 2025-12-04T12:35:04.5234717Z | ^~~~ 2025-12-04T12:35:04.5235887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5236001Z 1936 | 0x80, 2025-12-04T12:35:04.5236093Z | ^~~~ 2025-12-04T12:35:04.5237278Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5237392Z 1938 | 0x80, 2025-12-04T12:35:04.5237483Z | ^~~~ 2025-12-04T12:35:04.5238666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5238772Z 1940 | 0x80, 2025-12-04T12:35:04.5238864Z | ^~~~ 2025-12-04T12:35:04.5240045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5240210Z 1942 | 0x80, 2025-12-04T12:35:04.5240303Z | ^~~~ 2025-12-04T12:35:04.5241530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5241658Z 1944 | 0x80, 2025-12-04T12:35:04.5241768Z | ^~~~ 2025-12-04T12:35:04.5242981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5243078Z 1946 | 0x80, 2025-12-04T12:35:04.5243189Z | ^~~~ 2025-12-04T12:35:04.5244363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5244462Z 1948 | 0x80, 2025-12-04T12:35:04.5244566Z | ^~~~ 2025-12-04T12:35:04.5245745Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5245860Z 1950 | 0x80, 2025-12-04T12:35:04.5245952Z | ^~~~ 2025-12-04T12:35:04.5247128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5247235Z 1952 | 0x80, 2025-12-04T12:35:04.5247328Z | ^~~~ 2025-12-04T12:35:04.5248513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5248607Z 1954 | 0x80, 2025-12-04T12:35:04.5248698Z | ^~~~ 2025-12-04T12:35:04.5249888Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5249990Z 1956 | 0x80, 2025-12-04T12:35:04.5250083Z | ^~~~ 2025-12-04T12:35:04.5251271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5251365Z 1958 | 0x80, 2025-12-04T12:35:04.5251472Z | ^~~~ 2025-12-04T12:35:04.5252651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5252743Z 1960 | 0x80, 2025-12-04T12:35:04.5252848Z | ^~~~ 2025-12-04T12:35:04.5254025Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5254139Z 1962 | 0x80, 2025-12-04T12:35:04.5254230Z | ^~~~ 2025-12-04T12:35:04.5255411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5255517Z 1964 | 0x80, 2025-12-04T12:35:04.5255609Z | ^~~~ 2025-12-04T12:35:04.5256898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5257007Z 1966 | 0x80, 2025-12-04T12:35:04.5257100Z | ^~~~ 2025-12-04T12:35:04.5258340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5258470Z 1968 | 0x80, 2025-12-04T12:35:04.5258563Z | ^~~~ 2025-12-04T12:35:04.5259792Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5259888Z 1970 | 0x80, 2025-12-04T12:35:04.5259981Z | ^~~~ 2025-12-04T12:35:04.5261172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5261268Z 1972 | 0x80, 2025-12-04T12:35:04.5261372Z | ^~~~ 2025-12-04T12:35:04.5262552Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5262651Z 1974 | 0x80, 2025-12-04T12:35:04.5262758Z | ^~~~ 2025-12-04T12:35:04.5263939Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5264050Z 1976 | 0x80, 2025-12-04T12:35:04.5264150Z | ^~~~ 2025-12-04T12:35:04.5265326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5265438Z 1978 | 0x80, 2025-12-04T12:35:04.5265530Z | ^~~~ 2025-12-04T12:35:04.5266706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5266824Z 1980 | 0x80, 2025-12-04T12:35:04.5266917Z | ^~~~ 2025-12-04T12:35:04.5268110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5268206Z 1982 | 0x80, 2025-12-04T12:35:04.5268339Z | ^~~~ 2025-12-04T12:35:04.5269525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5269619Z 1984 | 0x80, 2025-12-04T12:35:04.5269724Z | ^~~~ 2025-12-04T12:35:04.5270898Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5271345Z 1986 | 0x80, 2025-12-04T12:35:04.5271452Z | ^~~~ 2025-12-04T12:35:04.5272649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5272742Z 1988 | 0x80, 2025-12-04T12:35:04.5272858Z | ^~~~ 2025-12-04T12:35:04.5274031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5274140Z 1990 | 0x80, 2025-12-04T12:35:04.5274232Z | ^~~~ 2025-12-04T12:35:04.5275486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5275603Z 1992 | 0x80, 2025-12-04T12:35:04.5275696Z | ^~~~ 2025-12-04T12:35:04.5276948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.5277111Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.5277236Z | ^~~~~~ 2025-12-04T12:35:04.5279657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = unsigned char; 
typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.5280248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27: required from here 2025-12-04T12:35:04.5281450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5281551Z 1866 | 0x80, 2025-12-04T12:35:04.5281646Z | ^~~~ 2025-12-04T12:35:04.5282843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5282940Z 1868 | 0x80, 2025-12-04T12:35:04.5283049Z | ^~~~ 2025-12-04T12:35:04.5284231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5284327Z 1870 | 0x80, 2025-12-04T12:35:04.5284436Z | ^~~~ 2025-12-04T12:35:04.5285617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5285786Z 1872 | 0x80, 2025-12-04T12:35:04.5285883Z | ^~~~ 2025-12-04T12:35:04.5287064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5287174Z 1874 | 0x80, 2025-12-04T12:35:04.5287270Z | ^~~~ 2025-12-04T12:35:04.5288494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: 
warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5288606Z 1876 | 0x80, 2025-12-04T12:35:04.5288700Z | ^~~~ 2025-12-04T12:35:04.5289909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5290013Z 1878 | 0x80, 2025-12-04T12:35:04.5290105Z | ^~~~ 2025-12-04T12:35:04.5291291Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5291387Z 1880 | 0x80, 2025-12-04T12:35:04.5291537Z | ^~~~ 2025-12-04T12:35:04.5292726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5292822Z 1882 | 0x80, 2025-12-04T12:35:04.5292983Z | ^~~~ 2025-12-04T12:35:04.5294157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5294262Z 1884 | 0x80, 2025-12-04T12:35:04.5294366Z | ^~~~ 2025-12-04T12:35:04.5295547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5295652Z 1886 | 0x80, 2025-12-04T12:35:04.5295757Z | ^~~~ 2025-12-04T12:35:04.5296996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ 
to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5297107Z 1888 | 0x80, 2025-12-04T12:35:04.5297206Z | ^~~~ 2025-12-04T12:35:04.5298401Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5298501Z 1890 | 0x80, 2025-12-04T12:35:04.5298595Z | ^~~~ 2025-12-04T12:35:04.5299784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5299880Z 1892 | 0x80, 2025-12-04T12:35:04.5299983Z | ^~~~ 2025-12-04T12:35:04.5301172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5301266Z 1894 | 0x80, 2025-12-04T12:35:04.5301376Z | ^~~~ 2025-12-04T12:35:04.5302547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5302685Z 1896 | 0x80, 2025-12-04T12:35:04.5302793Z | ^~~~ 2025-12-04T12:35:04.5303967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5304078Z 1898 | 0x80, 2025-12-04T12:35:04.5304212Z | ^~~~ 2025-12-04T12:35:04.5305386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5305496Z 1900 | 0x80, 2025-12-04T12:35:04.5305599Z | ^~~~ 2025-12-04T12:35:04.5306769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5306882Z 1902 | 0x80, 2025-12-04T12:35:04.5306974Z | ^~~~ 2025-12-04T12:35:04.5308165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5308293Z 1904 | 0x80, 2025-12-04T12:35:04.5308393Z | ^~~~ 2025-12-04T12:35:04.5309584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5309711Z 1906 | 0x80, 2025-12-04T12:35:04.5309805Z | ^~~~ 2025-12-04T12:35:04.5310993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5311094Z 1908 | 0x80, 2025-12-04T12:35:04.5311203Z | ^~~~ 2025-12-04T12:35:04.5312377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5312478Z 1910 | 0x80, 2025-12-04T12:35:04.5312590Z | ^~~~ 2025-12-04T12:35:04.5313761Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.5313878Z 1912 | 0x80, 2025-12-04T12:35:04.5313970Z | ^~~~ 2025-12-04T12:35:04.5315147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5315261Z 1914 | 0x80, 2025-12-04T12:35:04.5315354Z | ^~~~ 2025-12-04T12:35:04.5316533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5316652Z 1916 | 0x80, 2025-12-04T12:35:04.5316743Z | ^~~~ 2025-12-04T12:35:04.5317930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5318030Z 1918 | 0x80, 2025-12-04T12:35:04.5318123Z | ^~~~ 2025-12-04T12:35:04.5319307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5319441Z 1920 | 0x80, 2025-12-04T12:35:04.5319550Z | ^~~~ 2025-12-04T12:35:04.5320731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5320903Z 1922 | 0x80, 2025-12-04T12:35:04.5321012Z | ^~~~ 2025-12-04T12:35:04.5322189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5322324Z 1924 | 0x80, 
2025-12-04T12:35:04.5322432Z | ^~~~ 2025-12-04T12:35:04.5323611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5323725Z 1926 | 0x80, 2025-12-04T12:35:04.5323819Z | ^~~~ 2025-12-04T12:35:04.5324995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5325115Z 1928 | 0x80); 2025-12-04T12:35:04.5325207Z | ^~~~ 2025-12-04T12:35:04.5326400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5326496Z 1930 | 0x80, 2025-12-04T12:35:04.5326587Z | ^~~~ 2025-12-04T12:35:04.5327775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5327875Z 1932 | 0x80, 2025-12-04T12:35:04.5327966Z | ^~~~ 2025-12-04T12:35:04.5329155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5329254Z 1934 | 0x80, 2025-12-04T12:35:04.5329358Z | ^~~~ 2025-12-04T12:35:04.5330536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5330629Z 1936 | 0x80, 2025-12-04T12:35:04.5330734Z | ^~~~ 
2025-12-04T12:35:04.5331904Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5332018Z 1938 | 0x80, 2025-12-04T12:35:04.5332110Z | ^~~~ 2025-12-04T12:35:04.5333288Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5333402Z 1940 | 0x80, 2025-12-04T12:35:04.5333492Z | ^~~~ 2025-12-04T12:35:04.5334668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5334774Z 1942 | 0x80, 2025-12-04T12:35:04.5334866Z | ^~~~ 2025-12-04T12:35:04.5336053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5336189Z 1944 | 0x80, 2025-12-04T12:35:04.5336281Z | ^~~~ 2025-12-04T12:35:04.5337603Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5337733Z 1946 | 0x80, 2025-12-04T12:35:04.5337840Z | ^~~~ 2025-12-04T12:35:04.5339055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5339152Z 1948 | 0x80, 2025-12-04T12:35:04.5339259Z | ^~~~ 2025-12-04T12:35:04.5340432Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5340536Z 1950 | 0x80, 2025-12-04T12:35:04.5340644Z | ^~~~ 2025-12-04T12:35:04.5341821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5341936Z 1952 | 0x80, 2025-12-04T12:35:04.5342029Z | ^~~~ 2025-12-04T12:35:04.5343209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5343318Z 1954 | 0x80, 2025-12-04T12:35:04.5343410Z | ^~~~ 2025-12-04T12:35:04.5344596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5344696Z 1956 | 0x80, 2025-12-04T12:35:04.5344787Z | ^~~~ 2025-12-04T12:35:04.5345978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5346079Z 1958 | 0x80, 2025-12-04T12:35:04.5346171Z | ^~~~ 2025-12-04T12:35:04.5347362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5347457Z 1960 | 0x80, 2025-12-04T12:35:04.5347563Z | ^~~~ 2025-12-04T12:35:04.5348732Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5348832Z 1962 | 0x80, 2025-12-04T12:35:04.5348938Z | ^~~~ 2025-12-04T12:35:04.5350115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5350228Z 1964 | 0x80, 2025-12-04T12:35:04.5350320Z | ^~~~ 2025-12-04T12:35:04.5351501Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5351607Z 1966 | 0x80, 2025-12-04T12:35:04.5351700Z | ^~~~ 2025-12-04T12:35:04.5352869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5353033Z 1968 | 0x80, 2025-12-04T12:35:04.5353127Z | ^~~~ 2025-12-04T12:35:04.5354362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5354505Z 1970 | 0x80, 2025-12-04T12:35:04.5354598Z | ^~~~ 2025-12-04T12:35:04.5355832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5355928Z 1972 | 0x80, 2025-12-04T12:35:04.5356020Z | ^~~~ 2025-12-04T12:35:04.5357209Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5357311Z 1974 | 0x80, 2025-12-04T12:35:04.5357418Z | ^~~~ 2025-12-04T12:35:04.5358596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5358696Z 1976 | 0x80, 2025-12-04T12:35:04.5358801Z | ^~~~ 2025-12-04T12:35:04.5359983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5360091Z 1978 | 0x80, 2025-12-04T12:35:04.5360183Z | ^~~~ 2025-12-04T12:35:04.5361352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5361462Z 1980 | 0x80, 2025-12-04T12:35:04.5361555Z | ^~~~ 2025-12-04T12:35:04.5362734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5362847Z 1982 | 0x80, 2025-12-04T12:35:04.5362940Z | ^~~~ 2025-12-04T12:35:04.5364131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5364237Z 1984 | 0x80, 2025-12-04T12:35:04.5364331Z | ^~~~ 2025-12-04T12:35:04.5365515Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5365649Z 1986 | 0x80, 2025-12-04T12:35:04.5365756Z | ^~~~ 2025-12-04T12:35:04.5366937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5367068Z 1988 | 0x80, 2025-12-04T12:35:04.5367176Z | ^~~~ 2025-12-04T12:35:04.5368358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5368454Z 1990 | 0x80, 2025-12-04T12:35:04.5368561Z | ^~~~ 2025-12-04T12:35:04.5369744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5369853Z 1992 | 0x80, 2025-12-04T12:35:04.5369948Z | ^~~~ 2025-12-04T12:35:04.5371398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.5371580Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.5371700Z | ^~~~~~ 2025-12-04T12:35:04.5374179Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = signed char; 
typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.5374775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28: required from here 2025-12-04T12:35:04.5375987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5376091Z 1866 | 0x80, 2025-12-04T12:35:04.5376186Z | ^~~~ 2025-12-04T12:35:04.5377464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5377563Z 1868 | 0x80, 2025-12-04T12:35:04.5377657Z | ^~~~ 2025-12-04T12:35:04.5378855Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5378952Z 1870 | 0x80, 2025-12-04T12:35:04.5379061Z | ^~~~ 2025-12-04T12:35:04.5380246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5380348Z 1872 | 0x80, 2025-12-04T12:35:04.5380457Z | ^~~~ 2025-12-04T12:35:04.5381638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5381749Z 1874 | 0x80, 2025-12-04T12:35:04.5381909Z | ^~~~ 2025-12-04T12:35:04.5383087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: 
warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5383194Z 1876 | 0x80, 2025-12-04T12:35:04.5383290Z | ^~~~ 2025-12-04T12:35:04.5384468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5384629Z 1878 | 0x80, 2025-12-04T12:35:04.5384722Z | ^~~~ 2025-12-04T12:35:04.5385918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5386015Z 1880 | 0x80, 2025-12-04T12:35:04.5386114Z | ^~~~ 2025-12-04T12:35:04.5387300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5387397Z 1882 | 0x80, 2025-12-04T12:35:04.5387502Z | ^~~~ 2025-12-04T12:35:04.5388708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5388808Z 1884 | 0x80, 2025-12-04T12:35:04.5388916Z | ^~~~ 2025-12-04T12:35:04.5390125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5390222Z 1886 | 0x80, 2025-12-04T12:35:04.5390335Z | ^~~~ 2025-12-04T12:35:04.5391513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ 
to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5391619Z 1888 | 0x80, 2025-12-04T12:35:04.5391710Z | ^~~~ 2025-12-04T12:35:04.5392892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5393003Z 1890 | 0x80, 2025-12-04T12:35:04.5393097Z | ^~~~ 2025-12-04T12:35:04.5394280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5394374Z 1892 | 0x80, 2025-12-04T12:35:04.5394475Z | ^~~~ 2025-12-04T12:35:04.5395664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5395759Z 1894 | 0x80, 2025-12-04T12:35:04.5395851Z | ^~~~ 2025-12-04T12:35:04.5397045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5397146Z 1896 | 0x80, 2025-12-04T12:35:04.5397256Z | ^~~~ 2025-12-04T12:35:04.5398433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5398526Z 1898 | 0x80, 2025-12-04T12:35:04.5398679Z | ^~~~ 2025-12-04T12:35:04.5399847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5399954Z 1900 | 0x80, 2025-12-04T12:35:04.5400046Z | ^~~~ 2025-12-04T12:35:04.5401254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5401396Z 1902 | 0x80, 2025-12-04T12:35:04.5401488Z | ^~~~ 2025-12-04T12:35:04.5402694Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5402816Z 1904 | 0x80, 2025-12-04T12:35:04.5402910Z | ^~~~ 2025-12-04T12:35:04.5404100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5404195Z 1906 | 0x80, 2025-12-04T12:35:04.5404286Z | ^~~~ 2025-12-04T12:35:04.5405480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5405574Z 1908 | 0x80, 2025-12-04T12:35:04.5405667Z | ^~~~ 2025-12-04T12:35:04.5406856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5406956Z 1910 | 0x80, 2025-12-04T12:35:04.5407062Z | ^~~~ 2025-12-04T12:35:04.5408236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.5408331Z 1912 | 0x80, 2025-12-04T12:35:04.5408435Z | ^~~~ 2025-12-04T12:35:04.5409622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5409731Z 1914 | 0x80, 2025-12-04T12:35:04.5409824Z | ^~~~ 2025-12-04T12:35:04.5411009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5411122Z 1916 | 0x80, 2025-12-04T12:35:04.5411214Z | ^~~~ 2025-12-04T12:35:04.5412389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5412498Z 1918 | 0x80, 2025-12-04T12:35:04.5412596Z | ^~~~ 2025-12-04T12:35:04.5413796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5413891Z 1920 | 0x80, 2025-12-04T12:35:04.5413988Z | ^~~~ 2025-12-04T12:35:04.5415178Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5415320Z 1922 | 0x80, 2025-12-04T12:35:04.5415426Z | ^~~~ 2025-12-04T12:35:04.5416667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5416765Z 1924 | 0x80, 
2025-12-04T12:35:04.5416939Z | ^~~~ 2025-12-04T12:35:04.5418153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5418248Z 1926 | 0x80, 2025-12-04T12:35:04.5418389Z | ^~~~ 2025-12-04T12:35:04.5419563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5419680Z 1928 | 0x80); 2025-12-04T12:35:04.5419775Z | ^~~~ 2025-12-04T12:35:04.5420949Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5421060Z 1930 | 0x80, 2025-12-04T12:35:04.5421164Z | ^~~~ 2025-12-04T12:35:04.5422345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5422438Z 1932 | 0x80, 2025-12-04T12:35:04.5422537Z | ^~~~ 2025-12-04T12:35:04.5423725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5423826Z 1934 | 0x80, 2025-12-04T12:35:04.5423917Z | ^~~~ 2025-12-04T12:35:04.5425100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5425194Z 1936 | 0x80, 2025-12-04T12:35:04.5425313Z | ^~~~ 
2025-12-04T12:35:04.5426481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5426573Z 1938 | 0x80, 2025-12-04T12:35:04.5426683Z | ^~~~ 2025-12-04T12:35:04.5427859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5427972Z 1940 | 0x80, 2025-12-04T12:35:04.5428064Z | ^~~~ 2025-12-04T12:35:04.5429231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5429340Z 1942 | 0x80, 2025-12-04T12:35:04.5429442Z | ^~~~ 2025-12-04T12:35:04.5430609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5430716Z 1944 | 0x80, 2025-12-04T12:35:04.5430813Z | ^~~~ 2025-12-04T12:35:04.5432000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5432141Z 1946 | 0x80, 2025-12-04T12:35:04.5432232Z | ^~~~ 2025-12-04T12:35:04.5433426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5433557Z 1948 | 0x80, 2025-12-04T12:35:04.5433682Z | ^~~~ 2025-12-04T12:35:04.5434871Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5435000Z 1950 | 0x80, 2025-12-04T12:35:04.5435111Z | ^~~~ 2025-12-04T12:35:04.5436295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5436394Z 1952 | 0x80, 2025-12-04T12:35:04.5436501Z | ^~~~ 2025-12-04T12:35:04.5437675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5437793Z 1954 | 0x80, 2025-12-04T12:35:04.5437885Z | ^~~~ 2025-12-04T12:35:04.5439058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5439171Z 1956 | 0x80, 2025-12-04T12:35:04.5439264Z | ^~~~ 2025-12-04T12:35:04.5440435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5440552Z 1958 | 0x80, 2025-12-04T12:35:04.5440644Z | ^~~~ 2025-12-04T12:35:04.5441836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5441944Z 1960 | 0x80, 2025-12-04T12:35:04.5442038Z | ^~~~ 2025-12-04T12:35:04.5443222Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5443324Z 1962 | 0x80, 2025-12-04T12:35:04.5443434Z | ^~~~ 2025-12-04T12:35:04.5444605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5444705Z 1964 | 0x80, 2025-12-04T12:35:04.5444811Z | ^~~~ 2025-12-04T12:35:04.5445981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5446099Z 1966 | 0x80, 2025-12-04T12:35:04.5446192Z | ^~~~ 2025-12-04T12:35:04.5447362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5447476Z 1968 | 0x80, 2025-12-04T12:35:04.5447567Z | ^~~~ 2025-12-04T12:35:04.5448739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5448900Z 1970 | 0x80, 2025-12-04T12:35:04.5448993Z | ^~~~ 2025-12-04T12:35:04.5450218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5450347Z 1972 | 0x80, 2025-12-04T12:35:04.5450440Z | ^~~~ 2025-12-04T12:35:04.5451666Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5451764Z 1974 | 0x80, 2025-12-04T12:35:04.5451857Z | ^~~~ 2025-12-04T12:35:04.5453050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5453152Z 1976 | 0x80, 2025-12-04T12:35:04.5453256Z | ^~~~ 2025-12-04T12:35:04.5454438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5454541Z 1978 | 0x80, 2025-12-04T12:35:04.5454650Z | ^~~~ 2025-12-04T12:35:04.5455828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5455938Z 1980 | 0x80, 2025-12-04T12:35:04.5456037Z | ^~~~ 2025-12-04T12:35:04.5457276Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5457397Z 1982 | 0x80, 2025-12-04T12:35:04.5457491Z | ^~~~ 2025-12-04T12:35:04.5458679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5458797Z 1984 | 0x80, 2025-12-04T12:35:04.5458891Z | ^~~~ 2025-12-04T12:35:04.5460084Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5460181Z 1986 | 0x80, 2025-12-04T12:35:04.5460276Z | ^~~~ 2025-12-04T12:35:04.5461466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5461712Z 1988 | 0x80, 2025-12-04T12:35:04.5461819Z | ^~~~ 2025-12-04T12:35:04.5463000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5463133Z 1990 | 0x80, 2025-12-04T12:35:04.5463243Z | ^~~~ 2025-12-04T12:35:04.5464424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5464520Z 1992 | 0x80, 2025-12-04T12:35:04.5464632Z | ^~~~ 2025-12-04T12:35:04.5465810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.5465994Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.5466113Z | ^~~~~~ 2025-12-04T12:35:04.5468586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = unsigned 
char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.5469189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28: required from here 2025-12-04T12:35:04.5470376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5470497Z 1866 | 0x80, 2025-12-04T12:35:04.5470592Z | ^~~~ 2025-12-04T12:35:04.5471993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5472094Z 1868 | 0x80, 2025-12-04T12:35:04.5472186Z | ^~~~ 2025-12-04T12:35:04.5473388Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5473485Z 1870 | 0x80, 2025-12-04T12:35:04.5473578Z | ^~~~ 2025-12-04T12:35:04.5474782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5474885Z 1872 | 0x80, 2025-12-04T12:35:04.5474991Z | ^~~~ 2025-12-04T12:35:04.5476170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5476272Z 1874 | 0x80, 2025-12-04T12:35:04.5476384Z | ^~~~ 2025-12-04T12:35:04.5477559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: 
warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5477664Z 1876 | 0x80, 2025-12-04T12:35:04.5477758Z | ^~~~ 2025-12-04T12:35:04.5478928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5479146Z 1878 | 0x80, 2025-12-04T12:35:04.5479239Z | ^~~~ 2025-12-04T12:35:04.5480420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5480578Z 1880 | 0x80, 2025-12-04T12:35:04.5480669Z | ^~~~ 2025-12-04T12:35:04.5481867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5481963Z 1882 | 0x80, 2025-12-04T12:35:04.5482055Z | ^~~~ 2025-12-04T12:35:04.5483241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5483340Z 1884 | 0x80, 2025-12-04T12:35:04.5483435Z | ^~~~ 2025-12-04T12:35:04.5484677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5484779Z 1886 | 0x80, 2025-12-04T12:35:04.5484890Z | ^~~~ 2025-12-04T12:35:04.5486115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ 
to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5486210Z 1888 | 0x80, 2025-12-04T12:35:04.5486317Z | ^~~~ 2025-12-04T12:35:04.5487502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5487611Z 1890 | 0x80, 2025-12-04T12:35:04.5487705Z | ^~~~ 2025-12-04T12:35:04.5488882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5488996Z 1892 | 0x80, 2025-12-04T12:35:04.5489089Z | ^~~~ 2025-12-04T12:35:04.5490263Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5490370Z 1894 | 0x80, 2025-12-04T12:35:04.5490462Z | ^~~~ 2025-12-04T12:35:04.5491655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5491750Z 1896 | 0x80, 2025-12-04T12:35:04.5491841Z | ^~~~ 2025-12-04T12:35:04.5493028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5493127Z 1898 | 0x80, 2025-12-04T12:35:04.5493229Z | ^~~~ 2025-12-04T12:35:04.5494407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5494499Z 1900 | 0x80, 2025-12-04T12:35:04.5494605Z | ^~~~ 2025-12-04T12:35:04.5495825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5495918Z 1902 | 0x80, 2025-12-04T12:35:04.5496023Z | ^~~~ 2025-12-04T12:35:04.5497325Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5497477Z 1904 | 0x80, 2025-12-04T12:35:04.5497570Z | ^~~~ 2025-12-04T12:35:04.5498789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5498903Z 1906 | 0x80, 2025-12-04T12:35:04.5498997Z | ^~~~ 2025-12-04T12:35:04.5500195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5500289Z 1908 | 0x80, 2025-12-04T12:35:04.5500380Z | ^~~~ 2025-12-04T12:35:04.5501576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5501677Z 1910 | 0x80, 2025-12-04T12:35:04.5501768Z | ^~~~ 2025-12-04T12:35:04.5502961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.5503056Z 1912 | 0x80, 2025-12-04T12:35:04.5503160Z | ^~~~ 2025-12-04T12:35:04.5504336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5504430Z 1914 | 0x80, 2025-12-04T12:35:04.5504538Z | ^~~~ 2025-12-04T12:35:04.5505727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5505839Z 1916 | 0x80, 2025-12-04T12:35:04.5505931Z | ^~~~ 2025-12-04T12:35:04.5507110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5507217Z 1918 | 0x80, 2025-12-04T12:35:04.5507313Z | ^~~~ 2025-12-04T12:35:04.5508485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5508594Z 1920 | 0x80, 2025-12-04T12:35:04.5508686Z | ^~~~ 2025-12-04T12:35:04.5509872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5509972Z 1922 | 0x80, 2025-12-04T12:35:04.5510063Z | ^~~~ 2025-12-04T12:35:04.5511256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5511350Z 1924 | 0x80, 
2025-12-04T12:35:04.5511512Z | ^~~~ 2025-12-04T12:35:04.5512692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5512786Z 1926 | 0x80, 2025-12-04T12:35:04.5512892Z | ^~~~ 2025-12-04T12:35:04.5514098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5514226Z 1928 | 0x80); 2025-12-04T12:35:04.5514333Z | ^~~~ 2025-12-04T12:35:04.5515544Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5515654Z 1930 | 0x80, 2025-12-04T12:35:04.5515754Z | ^~~~ 2025-12-04T12:35:04.5516936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5517045Z 1932 | 0x80, 2025-12-04T12:35:04.5517135Z | ^~~~ 2025-12-04T12:35:04.5518324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5518425Z 1934 | 0x80, 2025-12-04T12:35:04.5518516Z | ^~~~ 2025-12-04T12:35:04.5519714Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5519810Z 1936 | 0x80, 2025-12-04T12:35:04.5519907Z | ^~~~ 
2025-12-04T12:35:04.5521107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5521201Z 1938 | 0x80, 2025-12-04T12:35:04.5521305Z | ^~~~ 2025-12-04T12:35:04.5522487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5522585Z 1940 | 0x80, 2025-12-04T12:35:04.5522690Z | ^~~~ 2025-12-04T12:35:04.5523867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5523981Z 1942 | 0x80, 2025-12-04T12:35:04.5524073Z | ^~~~ 2025-12-04T12:35:04.5525242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5525348Z 1944 | 0x80, 2025-12-04T12:35:04.5525440Z | ^~~~ 2025-12-04T12:35:04.5526621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5526728Z 1946 | 0x80, 2025-12-04T12:35:04.5526820Z | ^~~~ 2025-12-04T12:35:04.5528019Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5528165Z 1948 | 0x80, 2025-12-04T12:35:04.5528257Z | ^~~~ 2025-12-04T12:35:04.5529445Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5529540Z 1950 | 0x80, 2025-12-04T12:35:04.5529635Z | ^~~~ 2025-12-04T12:35:04.5530908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5531005Z 1952 | 0x80, 2025-12-04T12:35:04.5531116Z | ^~~~ 2025-12-04T12:35:04.5532341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5532444Z 1954 | 0x80, 2025-12-04T12:35:04.5532555Z | ^~~~ 2025-12-04T12:35:04.5533724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5533833Z 1956 | 0x80, 2025-12-04T12:35:04.5533927Z | ^~~~ 2025-12-04T12:35:04.5535112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5535219Z 1958 | 0x80, 2025-12-04T12:35:04.5535310Z | ^~~~ 2025-12-04T12:35:04.5536557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5536674Z 1960 | 0x80, 2025-12-04T12:35:04.5536768Z | ^~~~ 2025-12-04T12:35:04.5537965Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5538062Z 1962 | 0x80, 2025-12-04T12:35:04.5538159Z | ^~~~ 2025-12-04T12:35:04.5539354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5539449Z 1964 | 0x80, 2025-12-04T12:35:04.5539561Z | ^~~~ 2025-12-04T12:35:04.5540730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5540830Z 1966 | 0x80, 2025-12-04T12:35:04.5540947Z | ^~~~ 2025-12-04T12:35:04.5542117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5542211Z 1968 | 0x80, 2025-12-04T12:35:04.5542324Z | ^~~~ 2025-12-04T12:35:04.5543502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5543611Z 1970 | 0x80, 2025-12-04T12:35:04.5543711Z | ^~~~ 2025-12-04T12:35:04.5544886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5545061Z 1972 | 0x80, 2025-12-04T12:35:04.5545156Z | ^~~~ 2025-12-04T12:35:04.5546348Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5546444Z 1974 | 0x80, 2025-12-04T12:35:04.5546606Z | ^~~~ 2025-12-04T12:35:04.5547796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5547892Z 1976 | 0x80, 2025-12-04T12:35:04.5548020Z | ^~~~ 2025-12-04T12:35:04.5549211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5549316Z 1978 | 0x80, 2025-12-04T12:35:04.5549423Z | ^~~~ 2025-12-04T12:35:04.5550594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5550691Z 1980 | 0x80, 2025-12-04T12:35:04.5550813Z | ^~~~ 2025-12-04T12:35:04.5551987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5552096Z 1982 | 0x80, 2025-12-04T12:35:04.5552196Z | ^~~~ 2025-12-04T12:35:04.5553368Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5553488Z 1984 | 0x80, 2025-12-04T12:35:04.5553582Z | ^~~~ 2025-12-04T12:35:04.5554755Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5554873Z 1986 | 0x80, 2025-12-04T12:35:04.5554974Z | ^~~~ 2025-12-04T12:35:04.5556158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5556260Z 1988 | 0x80, 2025-12-04T12:35:04.5556355Z | ^~~~ 2025-12-04T12:35:04.5557549Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5557691Z 1990 | 0x80, 2025-12-04T12:35:04.5557801Z | ^~~~ 2025-12-04T12:35:04.5558981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5559086Z 1992 | 0x80, 2025-12-04T12:35:04.5559232Z | ^~~~ 2025-12-04T12:35:04.5560416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.5560584Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.5560719Z | ^~~~~~ 2025-12-04T12:35:04.5561225Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16, 2025-12-04T12:35:04.5561617Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.5562061Z from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.5562506Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.5562987Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.5563688Z from /tmp/zNkm53/tmpp0pxkc5y/data/aotinductor/model/cuzep6c3r5og2e4er75yunzp3ebrphkoo5sxbhxof3er5uux4ih3.wrapper.cpp:750: 2025-12-04T12:35:04.5565184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’: 2025-12-04T12:35:04.5565765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31: required from here 2025-12-04T12:35:04.5566972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5567094Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5567185Z | ^~~~ 2025-12-04T12:35:04.5568386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5568500Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5568600Z | ^~~~ 2025-12-04T12:35:04.5569804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5569924Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5570037Z | ^~~~ 
2025-12-04T12:35:04.5571395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5571517Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5571633Z | ^~~~ 2025-12-04T12:35:04.5572824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5572953Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5573136Z | ^~~~ 2025-12-04T12:35:04.5574323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5574448Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5574548Z | ^~~~ 2025-12-04T12:35:04.5575750Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5575912Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5576014Z | ^~~~ 2025-12-04T12:35:04.5577297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5577418Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5577519Z | ^~~~ 2025-12-04T12:35:04.5578715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5578900Z 203 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5579014Z | ^~~~ 2025-12-04T12:35:04.5580198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5580360Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5580472Z | ^~~~ 2025-12-04T12:35:04.5581665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5581795Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5581897Z | ^~~~ 2025-12-04T12:35:04.5583081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5583212Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5583313Z | ^~~~ 2025-12-04T12:35:04.5584492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5584619Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5584714Z | ^~~~ 2025-12-04T12:35:04.5585906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5586025Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5586123Z | ^~~~ 2025-12-04T12:35:04.5587322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5587441Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5587554Z | ^~~~ 2025-12-04T12:35:04.5588743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5588857Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5589018Z | ^~~~ 2025-12-04T12:35:04.5590202Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5590329Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5590423Z | ^~~~ 2025-12-04T12:35:04.5591636Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5591813Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5591910Z | ^~~~ 2025-12-04T12:35:04.5593125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5593258Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5593358Z | ^~~~ 2025-12-04T12:35:04.5594546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5594658Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5594770Z | ^~~~ 2025-12-04T12:35:04.5595960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5596079Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5596185Z | ^~~~ 2025-12-04T12:35:04.5597365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5597481Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5597591Z | ^~~~ 2025-12-04T12:35:04.5598776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5598906Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5599007Z | ^~~~ 2025-12-04T12:35:04.5600198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5600323Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5600425Z | ^~~~ 2025-12-04T12:35:04.5601601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5601733Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5601824Z | ^~~~ 2025-12-04T12:35:04.5603020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5603137Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5603231Z | ^~~~ 2025-12-04T12:35:04.5604429Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5604540Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5604696Z | ^~~~ 2025-12-04T12:35:04.5605875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5605986Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5606102Z | ^~~~ 2025-12-04T12:35:04.5607355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5607483Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5607579Z | ^~~~ 2025-12-04T12:35:04.5608801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5608931Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5609028Z | ^~~~ 2025-12-04T12:35:04.5610216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5610340Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5610451Z | ^~~~ 2025-12-04T12:35:04.5611647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5611764Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.5611865Z | ^~~~ 2025-12-04T12:35:04.5613056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5613174Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5613281Z | ^~~~ 2025-12-04T12:35:04.5614462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5614580Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5614691Z | ^~~~ 2025-12-04T12:35:04.5615877Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5616004Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5616105Z | ^~~~ 2025-12-04T12:35:04.5617359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5617494Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5617599Z | ^~~~ 2025-12-04T12:35:04.5618781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5618918Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5619011Z | ^~~~ 2025-12-04T12:35:04.5620217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.5620330Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5620469Z | ^~~~ 2025-12-04T12:35:04.5621667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5621781Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5621895Z | ^~~~ 2025-12-04T12:35:04.5623119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5623264Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5623411Z | ^~~~ 2025-12-04T12:35:04.5624589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5624720Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5624813Z | ^~~~ 2025-12-04T12:35:04.5625995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5626126Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5626232Z | ^~~~ 2025-12-04T12:35:04.5627416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5627548Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5627649Z | ^~~~ 2025-12-04T12:35:04.5628838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5628958Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5629061Z | ^~~~ 2025-12-04T12:35:04.5630256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5630373Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5630481Z | ^~~~ 2025-12-04T12:35:04.5631668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5631780Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5631893Z | ^~~~ 2025-12-04T12:35:04.5633075Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5633209Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5633311Z | ^~~~ 2025-12-04T12:35:04.5634510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5634642Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5634748Z | ^~~~ 2025-12-04T12:35:04.5636221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’: 2025-12-04T12:35:04.5636816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31: required from here 
2025-12-04T12:35:04.5638038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5638168Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5638332Z | ^~~~ 2025-12-04T12:35:04.5639517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5639681Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5639781Z | ^~~~ 2025-12-04T12:35:04.5640985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5641106Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5641208Z | ^~~~ 2025-12-04T12:35:04.5642413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5642534Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5642650Z | ^~~~ 2025-12-04T12:35:04.5643832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5643946Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5644057Z | ^~~~ 2025-12-04T12:35:04.5645247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5645387Z 202 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5645485Z | ^~~~ 2025-12-04T12:35:04.5646675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5646809Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5646910Z | ^~~~ 2025-12-04T12:35:04.5648096Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5648224Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5648327Z | ^~~~ 2025-12-04T12:35:04.5649555Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5649667Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5649761Z | ^~~~ 2025-12-04T12:35:04.5650967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5651138Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5651253Z | ^~~~ 2025-12-04T12:35:04.5652441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5652557Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5652676Z | ^~~~ 2025-12-04T12:35:04.5653857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5653982Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5654127Z | ^~~~ 2025-12-04T12:35:04.5655306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5655474Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5655569Z | ^~~~ 2025-12-04T12:35:04.5656825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5656959Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5657107Z | ^~~~ 2025-12-04T12:35:04.5658311Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5658430Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5658529Z | ^~~~ 2025-12-04T12:35:04.5659728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5659839Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5659953Z | ^~~~ 2025-12-04T12:35:04.5661124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5661242Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5661346Z | ^~~~ 2025-12-04T12:35:04.5662527Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5662656Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5662756Z | ^~~~ 2025-12-04T12:35:04.5663940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5664066Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5664166Z | ^~~~ 2025-12-04T12:35:04.5665406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5665531Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5665635Z | ^~~~ 2025-12-04T12:35:04.5666834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5666987Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5667083Z | ^~~~ 2025-12-04T12:35:04.5668290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5668408Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5668517Z | ^~~~ 2025-12-04T12:35:04.5669705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5669816Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5669973Z | ^~~~ 2025-12-04T12:35:04.5671359Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5671554Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5671672Z | ^~~~ 2025-12-04T12:35:04.5673341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5673480Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5673574Z | ^~~~ 2025-12-04T12:35:04.5674773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5674905Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5675004Z | ^~~~ 2025-12-04T12:35:04.5676204Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5676316Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5676418Z | ^~~~ 2025-12-04T12:35:04.5677614Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5677738Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5677851Z | ^~~~ 2025-12-04T12:35:04.5679033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5679153Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.5679261Z | ^~~~ 2025-12-04T12:35:04.5680451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5680563Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5680756Z | ^~~~ 2025-12-04T12:35:04.5681950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5682077Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5682178Z | ^~~~ 2025-12-04T12:35:04.5683404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5683577Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5683678Z | ^~~~ 2025-12-04T12:35:04.5684905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5685024Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5685119Z | ^~~~ 2025-12-04T12:35:04.5686319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5686433Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5686558Z | ^~~~ 2025-12-04T12:35:04.5687739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.5687856Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5687972Z | ^~~~ 2025-12-04T12:35:04.5689154Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5689272Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5689386Z | ^~~~ 2025-12-04T12:35:04.5690573Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5690709Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5690801Z | ^~~~ 2025-12-04T12:35:04.5691982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5692107Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5692204Z | ^~~~ 2025-12-04T12:35:04.5693403Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5693521Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5693621Z | ^~~~ 2025-12-04T12:35:04.5694821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5694938Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5695038Z | ^~~~ 2025-12-04T12:35:04.5696230Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5696421Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5696580Z | ^~~~ 2025-12-04T12:35:04.5697776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5697887Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5697999Z | ^~~~ 2025-12-04T12:35:04.5699217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5699378Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5699479Z | ^~~~ 2025-12-04T12:35:04.5700697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5700828Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5700930Z | ^~~~ 2025-12-04T12:35:04.5702126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5702244Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5702344Z | ^~~~ 2025-12-04T12:35:04.5703537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5703654Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5703750Z | ^~~~ 2025-12-04T12:35:04.5704947Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5705065Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5705181Z | ^~~~ 2025-12-04T12:35:04.5706374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5706492Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.5706609Z | ^~~~ 2025-12-04T12:35:04.5706717Z PASSED [9.3053s] [ 39%] 2025-12-04T12:35:04.5707758Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_multiple_methods In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12, 2025-12-04T12:35:04.5708198Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11, 2025-12-04T12:35:04.5708573Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.5709035Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.5709442Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.5709921Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.5710568Z from /tmp/hdMAUq/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751: 2025-12-04T12:35:04.5711168Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic] 2025-12-04T12:35:04.5711279Z 192 | struct { 
2025-12-04T12:35:04.5711434Z | ^ 2025-12-04T12:35:04.5711950Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.5712317Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.5712786Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.5713231Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.5713692Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.5714369Z from /tmp/hdMAUq/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751: 2025-12-04T12:35:04.5716605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.5717797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.5717968Z 544 | auto msb_one = _mm512_set1_epi16(0xFFFF); 2025-12-04T12:35:04.5718086Z | ^~~~~~ 2025-12-04T12:35:04.5718608Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.5718978Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.5719424Z from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.5719844Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.5720312Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.5720980Z from /tmp/hdMAUq/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751: 2025-12-04T12:35:04.5722610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5723793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.5724009Z 697 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.5724139Z | ^~~~~~ 2025-12-04T12:35:04.5725774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5726950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.5727172Z 701 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.5727341Z | ^~~~~~ 2025-12-04T12:35:04.5728982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function 
‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5730187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.5730426Z 705 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.5730568Z | ^~~~~~ 2025-12-04T12:35:04.5732187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5733368Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.5733587Z 709 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.5733712Z | ^~~~~~ 2025-12-04T12:35:04.5735342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5736584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.5736818Z 713 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.5736945Z | ^~~~~~ 2025-12-04T12:35:04.5738588Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function 
‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5740321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.5740529Z 717 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.5740677Z | ^~~~~~ 2025-12-04T12:35:04.5742969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.5744194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5744351Z 1153 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.5744481Z | ^~~~ 2025-12-04T12:35:04.5746486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5748695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5748960Z 1166 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.5749088Z | ^~~~ 2025-12-04T12:35:04.5750825Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5752822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5753814Z 1170 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.5753962Z | ^~~~ 2025-12-04T12:35:04.5755657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5757143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5757349Z 1174 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.5757889Z | ^~~~ 2025-12-04T12:35:04.5760826Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5762216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5762475Z 1178 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.5762605Z | ^~~~ 2025-12-04T12:35:04.5766033Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.5767259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5767428Z 1207 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.5767543Z | ^~~~ 2025-12-04T12:35:04.5771323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5773850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5774058Z 1220 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.5774289Z | ^~~~ 2025-12-04T12:35:04.5779134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5782764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5784003Z 1224 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 
2025-12-04T12:35:04.5784130Z | ^~~~ 2025-12-04T12:35:04.5791057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5793260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5793483Z 1228 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.5793609Z | ^~~~ 2025-12-04T12:35:04.5797214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.5798858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.5799071Z 1232 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.5799209Z | ^~~~ 2025-12-04T12:35:04.5803164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.5804696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27: required from here 2025-12-04T12:35:04.5806698Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5807441Z 1866 | 0x80, 2025-12-04T12:35:04.5807556Z | ^~~~ 2025-12-04T12:35:04.5808902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5809014Z 1868 | 0x80, 2025-12-04T12:35:04.5809114Z | ^~~~ 2025-12-04T12:35:04.5810405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5810518Z 1870 | 0x80, 2025-12-04T12:35:04.5811265Z | ^~~~ 2025-12-04T12:35:04.5812543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5812647Z 1872 | 0x80, 2025-12-04T12:35:04.5812739Z | ^~~~ 2025-12-04T12:35:04.5815302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5815406Z 1874 | 0x80, 2025-12-04T12:35:04.5815630Z | ^~~~ 2025-12-04T12:35:04.5817639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5817739Z 1876 | 0x80, 2025-12-04T12:35:04.5817846Z | ^~~~ 2025-12-04T12:35:04.5819477Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5819699Z 1878 | 0x80, 2025-12-04T12:35:04.5819809Z | ^~~~ 2025-12-04T12:35:04.5821683Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5821802Z 1880 | 0x80, 2025-12-04T12:35:04.5821897Z | ^~~~ 2025-12-04T12:35:04.5823746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5823862Z 1882 | 0x80, 2025-12-04T12:35:04.5824037Z | ^~~~ 2025-12-04T12:35:04.5825234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5825353Z 1884 | 0x80, 2025-12-04T12:35:04.5825447Z | ^~~~ 2025-12-04T12:35:04.5826780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5826951Z 1886 | 0x80, 2025-12-04T12:35:04.5827046Z | ^~~~ 2025-12-04T12:35:04.5829475Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5829578Z 1888 | 0x80, 2025-12-04T12:35:04.5829674Z | ^~~~ 2025-12-04T12:35:04.5832419Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5832519Z 1890 | 0x80, 2025-12-04T12:35:04.5832629Z | ^~~~ 2025-12-04T12:35:04.5834022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5834127Z 1892 | 0x80, 2025-12-04T12:35:04.5834463Z | ^~~~ 2025-12-04T12:35:04.5835834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5835947Z 1894 | 0x80, 2025-12-04T12:35:04.5836042Z | ^~~~ 2025-12-04T12:35:04.5837294Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5837404Z 1896 | 0x80, 2025-12-04T12:35:04.5839181Z | ^~~~ 2025-12-04T12:35:04.5840437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5840556Z 1898 | 0x80, 2025-12-04T12:35:04.5840650Z | ^~~~ 2025-12-04T12:35:04.5842762Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5842963Z 1900 | 0x80, 2025-12-04T12:35:04.5843067Z | ^~~~ 2025-12-04T12:35:04.5844809Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5844907Z 1902 | 0x80, 2025-12-04T12:35:04.5845023Z | ^~~~ 2025-12-04T12:35:04.5847700Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5847809Z 1904 | 0x80, 2025-12-04T12:35:04.5848545Z | ^~~~ 2025-12-04T12:35:04.5850519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5850633Z 1906 | 0x80, 2025-12-04T12:35:04.5850733Z | ^~~~ 2025-12-04T12:35:04.5851939Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5852050Z 1908 | 0x80, 2025-12-04T12:35:04.5852151Z | ^~~~ 2025-12-04T12:35:04.5853322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5853502Z 1910 | 0x80, 2025-12-04T12:35:04.5853593Z | ^~~~ 2025-12-04T12:35:04.5854788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5854883Z 1912 | 0x80, 2025-12-04T12:35:04.5855048Z | ^~~~ 2025-12-04T12:35:04.5856233Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5856402Z 1914 | 0x80, 2025-12-04T12:35:04.5856538Z | ^~~~ 2025-12-04T12:35:04.5857740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5857843Z 1916 | 0x80, 2025-12-04T12:35:04.5857949Z | ^~~~ 2025-12-04T12:35:04.5859121Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5859215Z 1918 | 0x80, 2025-12-04T12:35:04.5859335Z | ^~~~ 2025-12-04T12:35:04.5860504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5860614Z 1920 | 0x80, 2025-12-04T12:35:04.5860712Z | ^~~~ 2025-12-04T12:35:04.5861883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5861998Z 1922 | 0x80, 2025-12-04T12:35:04.5862091Z | ^~~~ 2025-12-04T12:35:04.5863275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5863382Z 1924 | 0x80, 2025-12-04T12:35:04.5863486Z | ^~~~ 2025-12-04T12:35:04.5864676Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5864774Z 1926 | 0x80, 2025-12-04T12:35:04.5864871Z | ^~~~ 2025-12-04T12:35:04.5866060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5866166Z 1928 | 0x80); 2025-12-04T12:35:04.5866272Z | ^~~~ 2025-12-04T12:35:04.5867443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5867544Z 1930 | 0x80, 2025-12-04T12:35:04.5867658Z | ^~~~ 2025-12-04T12:35:04.5868830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5868930Z 1932 | 0x80, 2025-12-04T12:35:04.5869039Z | ^~~~ 2025-12-04T12:35:04.5870206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5870353Z 1934 | 0x80, 2025-12-04T12:35:04.5881158Z | ^~~~ 2025-12-04T12:35:04.5882586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5882989Z 1936 | 0x80, 2025-12-04T12:35:04.5883088Z | ^~~~ 2025-12-04T12:35:04.5884497Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5884694Z 1938 | 0x80, 2025-12-04T12:35:04.5885061Z | ^~~~ 2025-12-04T12:35:04.5887073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5887190Z 1940 | 0x80, 2025-12-04T12:35:04.5887285Z | ^~~~ 2025-12-04T12:35:04.5888588Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5888702Z 1942 | 0x80, 2025-12-04T12:35:04.5888796Z | ^~~~ 2025-12-04T12:35:04.5890740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5890943Z 1944 | 0x80, 2025-12-04T12:35:04.5891055Z | ^~~~ 2025-12-04T12:35:04.5894292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5894489Z 1946 | 0x80, 2025-12-04T12:35:04.5894601Z | ^~~~ 2025-12-04T12:35:04.5898562Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5899445Z 1948 | 0x80, 2025-12-04T12:35:04.5899544Z | ^~~~ 2025-12-04T12:35:04.5901673Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5901887Z 1950 | 0x80, 2025-12-04T12:35:04.5901983Z | ^~~~ 2025-12-04T12:35:04.5903172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5903294Z 1952 | 0x80, 2025-12-04T12:35:04.5903388Z | ^~~~ 2025-12-04T12:35:04.5904589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5904692Z 1954 | 0x80, 2025-12-04T12:35:04.5904786Z | ^~~~ 2025-12-04T12:35:04.5905983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5906080Z 1956 | 0x80, 2025-12-04T12:35:04.5906188Z | ^~~~ 2025-12-04T12:35:04.5907360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5907636Z 1958 | 0x80, 2025-12-04T12:35:04.5907741Z | ^~~~ 2025-12-04T12:35:04.5908967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5909096Z 1960 | 0x80, 2025-12-04T12:35:04.5909207Z | ^~~~ 2025-12-04T12:35:04.5910424Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5910536Z 1962 | 0x80, 2025-12-04T12:35:04.5910630Z | ^~~~ 2025-12-04T12:35:04.5911808Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5911924Z 1964 | 0x80, 2025-12-04T12:35:04.5912016Z | ^~~~ 2025-12-04T12:35:04.5913208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5913308Z 1966 | 0x80, 2025-12-04T12:35:04.5913397Z | ^~~~ 2025-12-04T12:35:04.5914586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5914678Z 1968 | 0x80, 2025-12-04T12:35:04.5914769Z | ^~~~ 2025-12-04T12:35:04.5915952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5916052Z 1970 | 0x80, 2025-12-04T12:35:04.5916154Z | ^~~~ 2025-12-04T12:35:04.5917334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5917432Z 1972 | 0x80, 2025-12-04T12:35:04.5917537Z | ^~~~ 2025-12-04T12:35:04.5918718Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5918823Z 1974 | 0x80, 2025-12-04T12:35:04.5921384Z | ^~~~ 2025-12-04T12:35:04.5922600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5922719Z 1976 | 0x80, 2025-12-04T12:35:04.5922810Z | ^~~~ 2025-12-04T12:35:04.5923983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5924095Z 1978 | 0x80, 2025-12-04T12:35:04.5924185Z | ^~~~ 2025-12-04T12:35:04.5925374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5925469Z 1980 | 0x80, 2025-12-04T12:35:04.5925696Z | ^~~~ 2025-12-04T12:35:04.5926897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5927059Z 1982 | 0x80, 2025-12-04T12:35:04.5927169Z | ^~~~ 2025-12-04T12:35:04.5928384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5928518Z 1984 | 0x80, 2025-12-04T12:35:04.5928624Z | ^~~~ 2025-12-04T12:35:04.5929837Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5929934Z 1986 | 0x80, 2025-12-04T12:35:04.5930043Z | ^~~~ 2025-12-04T12:35:04.5931219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5931333Z 1988 | 0x80, 2025-12-04T12:35:04.5931427Z | ^~~~ 2025-12-04T12:35:04.5932606Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5932719Z 1990 | 0x80, 2025-12-04T12:35:04.5932811Z | ^~~~ 2025-12-04T12:35:04.5934004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5934098Z 1992 | 0x80, 2025-12-04T12:35:04.5934191Z | ^~~~ 2025-12-04T12:35:04.5935394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.5935560Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.5935678Z | ^~~~~~ 2025-12-04T12:35:04.5938214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = unsigned char; 
typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.5938805Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27: required from here 2025-12-04T12:35:04.5940014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5940156Z 1866 | 0x80, 2025-12-04T12:35:04.5940265Z | ^~~~ 2025-12-04T12:35:04.5941451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5941583Z 1868 | 0x80, 2025-12-04T12:35:04.5941688Z | ^~~~ 2025-12-04T12:35:04.5942869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5942980Z 1870 | 0x80, 2025-12-04T12:35:04.5943072Z | ^~~~ 2025-12-04T12:35:04.5944255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5944365Z 1872 | 0x80, 2025-12-04T12:35:04.5944458Z | ^~~~ 2025-12-04T12:35:04.5945665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5945779Z 1874 | 0x80, 2025-12-04T12:35:04.5945872Z | ^~~~ 2025-12-04T12:35:04.5947093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: 
warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5947188Z 1876 | 0x80, 2025-12-04T12:35:04.5947280Z | ^~~~ 2025-12-04T12:35:04.5948470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5948563Z 1878 | 0x80, 2025-12-04T12:35:04.5948654Z | ^~~~ 2025-12-04T12:35:04.5949847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5949946Z 1880 | 0x80, 2025-12-04T12:35:04.5950048Z | ^~~~ 2025-12-04T12:35:04.5953812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5953912Z 1882 | 0x80, 2025-12-04T12:35:04.5954018Z | ^~~~ 2025-12-04T12:35:04.5955216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5955325Z 1884 | 0x80, 2025-12-04T12:35:04.5955417Z | ^~~~ 2025-12-04T12:35:04.5957370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5957489Z 1886 | 0x80, 2025-12-04T12:35:04.5957586Z | ^~~~ 2025-12-04T12:35:04.5959422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ 
to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5959537Z 1888 | 0x80, 2025-12-04T12:35:04.5959633Z | ^~~~ 2025-12-04T12:35:04.5961541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5961647Z 1890 | 0x80, 2025-12-04T12:35:04.5961739Z | ^~~~ 2025-12-04T12:35:04.5964368Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5964553Z 1892 | 0x80, 2025-12-04T12:35:04.5964659Z | ^~~~ 2025-12-04T12:35:04.5966438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5966537Z 1894 | 0x80, 2025-12-04T12:35:04.5966643Z | ^~~~ 2025-12-04T12:35:04.5968023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5968123Z 1896 | 0x80, 2025-12-04T12:35:04.5968227Z | ^~~~ 2025-12-04T12:35:04.5969467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5969582Z 1898 | 0x80, 2025-12-04T12:35:04.5969675Z | ^~~~ 2025-12-04T12:35:04.5971241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5971359Z 1900 | 0x80, 2025-12-04T12:35:04.5971459Z | ^~~~ 2025-12-04T12:35:04.5972796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5972893Z 1902 | 0x80, 2025-12-04T12:35:04.5972983Z | ^~~~ 2025-12-04T12:35:04.5974181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5974283Z 1904 | 0x80, 2025-12-04T12:35:04.5974375Z | ^~~~ 2025-12-04T12:35:04.5975566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5975661Z 1906 | 0x80, 2025-12-04T12:35:04.5975772Z | ^~~~ 2025-12-04T12:35:04.5977016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5977113Z 1908 | 0x80, 2025-12-04T12:35:04.5977221Z | ^~~~ 2025-12-04T12:35:04.5978412Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5978527Z 1910 | 0x80, 2025-12-04T12:35:04.5978621Z | ^~~~ 2025-12-04T12:35:04.5979800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.5979906Z 1912 | 0x80, 2025-12-04T12:35:04.5980074Z | ^~~~ 2025-12-04T12:35:04.5981256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5981361Z 1914 | 0x80, 2025-12-04T12:35:04.5981453Z | ^~~~ 2025-12-04T12:35:04.5982705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5982848Z 1916 | 0x80, 2025-12-04T12:35:04.5982939Z | ^~~~ 2025-12-04T12:35:04.5984164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5984263Z 1918 | 0x80, 2025-12-04T12:35:04.5984366Z | ^~~~ 2025-12-04T12:35:04.5985554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5985649Z 1920 | 0x80, 2025-12-04T12:35:04.5985754Z | ^~~~ 2025-12-04T12:35:04.5986931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5987032Z 1922 | 0x80, 2025-12-04T12:35:04.5987140Z | ^~~~ 2025-12-04T12:35:04.5988318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5988429Z 1924 | 0x80, 
2025-12-04T12:35:04.5988522Z | ^~~~ 2025-12-04T12:35:04.5989694Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5989800Z 1926 | 0x80, 2025-12-04T12:35:04.5989893Z | ^~~~ 2025-12-04T12:35:04.5991093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5991193Z 1928 | 0x80); 2025-12-04T12:35:04.5991291Z | ^~~~ 2025-12-04T12:35:04.5992500Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5992601Z 1930 | 0x80, 2025-12-04T12:35:04.5992695Z | ^~~~ 2025-12-04T12:35:04.5993887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5993982Z 1932 | 0x80, 2025-12-04T12:35:04.5994089Z | ^~~~ 2025-12-04T12:35:04.5995277Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5995375Z 1934 | 0x80, 2025-12-04T12:35:04.5995480Z | ^~~~ 2025-12-04T12:35:04.5996661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5996810Z 1936 | 0x80, 2025-12-04T12:35:04.5996907Z | ^~~~ 
2025-12-04T12:35:04.5998083Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5998194Z 1938 | 0x80, 2025-12-04T12:35:04.5998287Z | ^~~~ 2025-12-04T12:35:04.5999536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.5999649Z 1940 | 0x80, 2025-12-04T12:35:04.5999743Z | ^~~~ 2025-12-04T12:35:04.6000974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6001077Z 1942 | 0x80, 2025-12-04T12:35:04.6001171Z | ^~~~ 2025-12-04T12:35:04.6002372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6002468Z 1944 | 0x80, 2025-12-04T12:35:04.6002566Z | ^~~~ 2025-12-04T12:35:04.6003757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6003853Z 1946 | 0x80, 2025-12-04T12:35:04.6003969Z | ^~~~ 2025-12-04T12:35:04.6005137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6005239Z 1948 | 0x80, 2025-12-04T12:35:04.6005347Z | ^~~~ 2025-12-04T12:35:04.6006529Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6006639Z 1950 | 0x80, 2025-12-04T12:35:04.6006739Z | ^~~~ 2025-12-04T12:35:04.6007921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6008031Z 1952 | 0x80, 2025-12-04T12:35:04.6008131Z | ^~~~ 2025-12-04T12:35:04.6009303Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6009421Z 1954 | 0x80, 2025-12-04T12:35:04.6009512Z | ^~~~ 2025-12-04T12:35:04.6010700Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6010793Z 1956 | 0x80, 2025-12-04T12:35:04.6010900Z | ^~~~ 2025-12-04T12:35:04.6012094Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6012190Z 1958 | 0x80, 2025-12-04T12:35:04.6012302Z | ^~~~ 2025-12-04T12:35:04.6013473Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6013604Z 1960 | 0x80, 2025-12-04T12:35:04.6013710Z | ^~~~ 2025-12-04T12:35:04.6014885Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6014977Z 1962 | 0x80, 2025-12-04T12:35:04.6015153Z | ^~~~ 2025-12-04T12:35:04.6016391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6016502Z 1964 | 0x80, 2025-12-04T12:35:04.6016635Z | ^~~~ 2025-12-04T12:35:04.6017819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6017933Z 1966 | 0x80, 2025-12-04T12:35:04.6018024Z | ^~~~ 2025-12-04T12:35:04.6019225Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6019325Z 1968 | 0x80, 2025-12-04T12:35:04.6019422Z | ^~~~ 2025-12-04T12:35:04.6020607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6020706Z 1970 | 0x80, 2025-12-04T12:35:04.6020801Z | ^~~~ 2025-12-04T12:35:04.6021988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6022089Z 1972 | 0x80, 2025-12-04T12:35:04.6022194Z | ^~~~ 2025-12-04T12:35:04.6023370Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6023468Z 1974 | 0x80, 2025-12-04T12:35:04.6023580Z | ^~~~ 2025-12-04T12:35:04.6024753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6024866Z 1976 | 0x80, 2025-12-04T12:35:04.6024959Z | ^~~~ 2025-12-04T12:35:04.6026129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6026245Z 1978 | 0x80, 2025-12-04T12:35:04.6026338Z | ^~~~ 2025-12-04T12:35:04.6027521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6027642Z 1980 | 0x80, 2025-12-04T12:35:04.6027735Z | ^~~~ 2025-12-04T12:35:04.6028915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6029014Z 1982 | 0x80, 2025-12-04T12:35:04.6029106Z | ^~~~ 2025-12-04T12:35:04.6030289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6030446Z 1984 | 0x80, 2025-12-04T12:35:04.6030537Z | ^~~~ 2025-12-04T12:35:04.6031719Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6031882Z 1986 | 0x80, 2025-12-04T12:35:04.6031987Z | ^~~~ 2025-12-04T12:35:04.6033164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6033299Z 1988 | 0x80, 2025-12-04T12:35:04.6033411Z | ^~~~ 2025-12-04T12:35:04.6034590Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6034705Z 1990 | 0x80, 2025-12-04T12:35:04.6034797Z | ^~~~ 2025-12-04T12:35:04.6035969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6036090Z 1992 | 0x80, 2025-12-04T12:35:04.6036182Z | ^~~~ 2025-12-04T12:35:04.6037361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.6037542Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.6037660Z | ^~~~~~ 2025-12-04T12:35:04.6040078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = signed char; 
typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.6040673Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28: required from here 2025-12-04T12:35:04.6041885Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6041982Z 1866 | 0x80, 2025-12-04T12:35:04.6042076Z | ^~~~ 2025-12-04T12:35:04.6043268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6043407Z 1868 | 0x80, 2025-12-04T12:35:04.6043512Z | ^~~~ 2025-12-04T12:35:04.6044693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6044825Z 1870 | 0x80, 2025-12-04T12:35:04.6044929Z | ^~~~ 2025-12-04T12:35:04.6046107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6046200Z 1872 | 0x80, 2025-12-04T12:35:04.6046308Z | ^~~~ 2025-12-04T12:35:04.6047479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6047593Z 1874 | 0x80, 2025-12-04T12:35:04.6047684Z | ^~~~ 2025-12-04T12:35:04.6048891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: 
warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6049004Z 1876 | 0x80, 2025-12-04T12:35:04.6049098Z | ^~~~ 2025-12-04T12:35:04.6050321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6050416Z 1878 | 0x80, 2025-12-04T12:35:04.6050507Z | ^~~~ 2025-12-04T12:35:04.6051699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6051798Z 1880 | 0x80, 2025-12-04T12:35:04.6051889Z | ^~~~ 2025-12-04T12:35:04.6053079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6053178Z 1882 | 0x80, 2025-12-04T12:35:04.6053283Z | ^~~~ 2025-12-04T12:35:04.6054468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6054562Z 1884 | 0x80, 2025-12-04T12:35:04.6054667Z | ^~~~ 2025-12-04T12:35:04.6055834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6055947Z 1886 | 0x80, 2025-12-04T12:35:04.6056038Z | ^~~~ 2025-12-04T12:35:04.6057290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ 
to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6057408Z 1888 | 0x80, 2025-12-04T12:35:04.6057498Z | ^~~~ 2025-12-04T12:35:04.6058692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6058800Z 1890 | 0x80, 2025-12-04T12:35:04.6058890Z | ^~~~ 2025-12-04T12:35:04.6060073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6060215Z 1892 | 0x80, 2025-12-04T12:35:04.6060306Z | ^~~~ 2025-12-04T12:35:04.6061505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6061637Z 1894 | 0x80, 2025-12-04T12:35:04.6061741Z | ^~~~ 2025-12-04T12:35:04.6062928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6063020Z 1896 | 0x80, 2025-12-04T12:35:04.6063125Z | ^~~~ 2025-12-04T12:35:04.6064301Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6064402Z 1898 | 0x80, 2025-12-04T12:35:04.6064507Z | ^~~~ 2025-12-04T12:35:04.6065711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6065830Z 1900 | 0x80, 2025-12-04T12:35:04.6065923Z | ^~~~ 2025-12-04T12:35:04.6067132Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6067244Z 1902 | 0x80, 2025-12-04T12:35:04.6067339Z | ^~~~ 2025-12-04T12:35:04.6068530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6068631Z 1904 | 0x80, 2025-12-04T12:35:04.6068722Z | ^~~~ 2025-12-04T12:35:04.6069914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6070014Z 1906 | 0x80, 2025-12-04T12:35:04.6070106Z | ^~~~ 2025-12-04T12:35:04.6071502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6071598Z 1908 | 0x80, 2025-12-04T12:35:04.6071703Z | ^~~~ 2025-12-04T12:35:04.6072877Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6072978Z 1910 | 0x80, 2025-12-04T12:35:04.6073081Z | ^~~~ 2025-12-04T12:35:04.6074261Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.6074373Z 1912 | 0x80, 2025-12-04T12:35:04.6074466Z | ^~~~ 2025-12-04T12:35:04.6075648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6075757Z 1914 | 0x80, 2025-12-04T12:35:04.6075851Z | ^~~~ 2025-12-04T12:35:04.6077110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6077220Z 1916 | 0x80, 2025-12-04T12:35:04.6077314Z | ^~~~ 2025-12-04T12:35:04.6078569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6078711Z 1918 | 0x80, 2025-12-04T12:35:04.6078805Z | ^~~~ 2025-12-04T12:35:04.6080064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6080162Z 1920 | 0x80, 2025-12-04T12:35:04.6080255Z | ^~~~ 2025-12-04T12:35:04.6081452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6081549Z 1922 | 0x80, 2025-12-04T12:35:04.6081657Z | ^~~~ 2025-12-04T12:35:04.6082841Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6082940Z 1924 | 0x80, 
2025-12-04T12:35:04.6083050Z | ^~~~ 2025-12-04T12:35:04.6084234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6084344Z 1926 | 0x80, 2025-12-04T12:35:04.6084438Z | ^~~~ 2025-12-04T12:35:04.6085625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6085740Z 1928 | 0x80); 2025-12-04T12:35:04.6085834Z | ^~~~ 2025-12-04T12:35:04.6087016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6087134Z 1930 | 0x80, 2025-12-04T12:35:04.6087230Z | ^~~~ 2025-12-04T12:35:04.6088427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6088523Z 1932 | 0x80, 2025-12-04T12:35:04.6088616Z | ^~~~ 2025-12-04T12:35:04.6089813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6089910Z 1934 | 0x80, 2025-12-04T12:35:04.6090019Z | ^~~~ 2025-12-04T12:35:04.6091204Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6091306Z 1936 | 0x80, 2025-12-04T12:35:04.6091417Z | ^~~~ 
2025-12-04T12:35:04.6092597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6092691Z 1938 | 0x80, 2025-12-04T12:35:04.6092845Z | ^~~~ 2025-12-04T12:35:04.6094017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6094128Z 1940 | 0x80, 2025-12-04T12:35:04.6094219Z | ^~~~ 2025-12-04T12:35:04.6095434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6095578Z 1942 | 0x80, 2025-12-04T12:35:04.6095671Z | ^~~~ 2025-12-04T12:35:04.6097021Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6097123Z 1944 | 0x80, 2025-12-04T12:35:04.6097222Z | ^~~~ 2025-12-04T12:35:04.6098417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6098513Z 1946 | 0x80, 2025-12-04T12:35:04.6098606Z | ^~~~ 2025-12-04T12:35:04.6099798Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6099897Z 1948 | 0x80, 2025-12-04T12:35:04.6100003Z | ^~~~ 2025-12-04T12:35:04.6101181Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6101275Z 1950 | 0x80, 2025-12-04T12:35:04.6101386Z | ^~~~ 2025-12-04T12:35:04.6102558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6102666Z 1952 | 0x80, 2025-12-04T12:35:04.6102758Z | ^~~~ 2025-12-04T12:35:04.6103932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6104046Z 1954 | 0x80, 2025-12-04T12:35:04.6104138Z | ^~~~ 2025-12-04T12:35:04.6105315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6105422Z 1956 | 0x80, 2025-12-04T12:35:04.6105519Z | ^~~~ 2025-12-04T12:35:04.6106706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6106800Z 1958 | 0x80, 2025-12-04T12:35:04.6106896Z | ^~~~ 2025-12-04T12:35:04.6108093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6108192Z 1960 | 0x80, 2025-12-04T12:35:04.6108302Z | ^~~~ 2025-12-04T12:35:04.6109488Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6109582Z 1962 | 0x80, 2025-12-04T12:35:04.6109730Z | ^~~~ 2025-12-04T12:35:04.6110905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6111002Z 1964 | 0x80, 2025-12-04T12:35:04.6111112Z | ^~~~ 2025-12-04T12:35:04.6112332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6112476Z 1966 | 0x80, 2025-12-04T12:35:04.6112569Z | ^~~~ 2025-12-04T12:35:04.6113780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6113895Z 1968 | 0x80, 2025-12-04T12:35:04.6113988Z | ^~~~ 2025-12-04T12:35:04.6115175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6115269Z 1970 | 0x80, 2025-12-04T12:35:04.6115360Z | ^~~~ 2025-12-04T12:35:04.6116559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6116654Z 1972 | 0x80, 2025-12-04T12:35:04.6116746Z | ^~~~ 2025-12-04T12:35:04.6117940Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6118040Z 1974 | 0x80, 2025-12-04T12:35:04.6118144Z | ^~~~ 2025-12-04T12:35:04.6119315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6119408Z 1976 | 0x80, 2025-12-04T12:35:04.6119515Z | ^~~~ 2025-12-04T12:35:04.6120700Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6120808Z 1978 | 0x80, 2025-12-04T12:35:04.6120902Z | ^~~~ 2025-12-04T12:35:04.6122081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6122194Z 1980 | 0x80, 2025-12-04T12:35:04.6122287Z | ^~~~ 2025-12-04T12:35:04.6123462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6123570Z 1982 | 0x80, 2025-12-04T12:35:04.6123663Z | ^~~~ 2025-12-04T12:35:04.6124865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6124961Z 1984 | 0x80, 2025-12-04T12:35:04.6125054Z | ^~~~ 2025-12-04T12:35:04.6126245Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6126402Z 1986 | 0x80, 2025-12-04T12:35:04.6126495Z | ^~~~ 2025-12-04T12:35:04.6127683Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6127779Z 1988 | 0x80, 2025-12-04T12:35:04.6127924Z | ^~~~ 2025-12-04T12:35:04.6129130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6129225Z 1990 | 0x80, 2025-12-04T12:35:04.6129370Z | ^~~~ 2025-12-04T12:35:04.6130552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6130667Z 1992 | 0x80, 2025-12-04T12:35:04.6130758Z | ^~~~ 2025-12-04T12:35:04.6131940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.6132116Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.6132239Z | ^~~~~~ 2025-12-04T12:35:04.6134678Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = unsigned 
char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.6135263Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28: required from here 2025-12-04T12:35:04.6136524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6136641Z 1866 | 0x80, 2025-12-04T12:35:04.6136739Z | ^~~~ 2025-12-04T12:35:04.6137934Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6138034Z 1868 | 0x80, 2025-12-04T12:35:04.6138127Z | ^~~~ 2025-12-04T12:35:04.6139314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6139465Z 1870 | 0x80, 2025-12-04T12:35:04.6139573Z | ^~~~ 2025-12-04T12:35:04.6140749Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6140848Z 1872 | 0x80, 2025-12-04T12:35:04.6140991Z | ^~~~ 2025-12-04T12:35:04.6142169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6142269Z 1874 | 0x80, 2025-12-04T12:35:04.6142374Z | ^~~~ 2025-12-04T12:35:04.6143545Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: 
warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6143658Z 1876 | 0x80, 2025-12-04T12:35:04.6143749Z | ^~~~ 2025-12-04T12:35:04.6144927Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6145096Z 1878 | 0x80, 2025-12-04T12:35:04.6145189Z | ^~~~ 2025-12-04T12:35:04.6146373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6146503Z 1880 | 0x80, 2025-12-04T12:35:04.6146596Z | ^~~~ 2025-12-04T12:35:04.6147781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6147881Z 1882 | 0x80, 2025-12-04T12:35:04.6147972Z | ^~~~ 2025-12-04T12:35:04.6149158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6149263Z 1884 | 0x80, 2025-12-04T12:35:04.6149372Z | ^~~~ 2025-12-04T12:35:04.6150545Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6150644Z 1886 | 0x80, 2025-12-04T12:35:04.6150752Z | ^~~~ 2025-12-04T12:35:04.6151922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ 
to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6152037Z 1888 | 0x80, 2025-12-04T12:35:04.6152130Z | ^~~~ 2025-12-04T12:35:04.6153297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6153418Z 1890 | 0x80, 2025-12-04T12:35:04.6153514Z | ^~~~ 2025-12-04T12:35:04.6154693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6154807Z 1892 | 0x80, 2025-12-04T12:35:04.6154901Z | ^~~~ 2025-12-04T12:35:04.6156091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6156226Z 1894 | 0x80, 2025-12-04T12:35:04.6156319Z | ^~~~ 2025-12-04T12:35:04.6157519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6157650Z 1896 | 0x80, 2025-12-04T12:35:04.6157741Z | ^~~~ 2025-12-04T12:35:04.6158931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6159026Z 1898 | 0x80, 2025-12-04T12:35:04.6159134Z | ^~~~ 2025-12-04T12:35:04.6160309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6160409Z 1900 | 0x80, 2025-12-04T12:35:04.6160513Z | ^~~~ 2025-12-04T12:35:04.6161737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6161851Z 1902 | 0x80, 2025-12-04T12:35:04.6161942Z | ^~~~ 2025-12-04T12:35:04.6163237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6163347Z 1904 | 0x80, 2025-12-04T12:35:04.6163439Z | ^~~~ 2025-12-04T12:35:04.6164627Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6164729Z 1906 | 0x80, 2025-12-04T12:35:04.6164822Z | ^~~~ 2025-12-04T12:35:04.6166022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6166124Z 1908 | 0x80, 2025-12-04T12:35:04.6166217Z | ^~~~ 2025-12-04T12:35:04.6167423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6167519Z 1910 | 0x80, 2025-12-04T12:35:04.6167629Z | ^~~~ 2025-12-04T12:35:04.6168801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.6168902Z 1912 | 0x80, 2025-12-04T12:35:04.6169011Z | ^~~~ 2025-12-04T12:35:04.6170193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6170313Z 1914 | 0x80, 2025-12-04T12:35:04.6170407Z | ^~~~ 2025-12-04T12:35:04.6171770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6171883Z 1916 | 0x80, 2025-12-04T12:35:04.6171976Z | ^~~~ 2025-12-04T12:35:04.6173169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6173365Z 1918 | 0x80, 2025-12-04T12:35:04.6173460Z | ^~~~ 2025-12-04T12:35:04.6174704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6174845Z 1920 | 0x80, 2025-12-04T12:35:04.6174937Z | ^~~~ 2025-12-04T12:35:04.6176175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6176273Z 1922 | 0x80, 2025-12-04T12:35:04.6176435Z | ^~~~ 2025-12-04T12:35:04.6177640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6177743Z 1924 | 0x80, 
2025-12-04T12:35:04.6177852Z | ^~~~ 2025-12-04T12:35:04.6179030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6179131Z 1926 | 0x80, 2025-12-04T12:35:04.6179243Z | ^~~~ 2025-12-04T12:35:04.6180426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6180541Z 1928 | 0x80); 2025-12-04T12:35:04.6180636Z | ^~~~ 2025-12-04T12:35:04.6181819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6181938Z 1930 | 0x80, 2025-12-04T12:35:04.6182031Z | ^~~~ 2025-12-04T12:35:04.6183204Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6183319Z 1932 | 0x80, 2025-12-04T12:35:04.6183410Z | ^~~~ 2025-12-04T12:35:04.6184594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6184693Z 1934 | 0x80, 2025-12-04T12:35:04.6184787Z | ^~~~ 2025-12-04T12:35:04.6185979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6186082Z 1936 | 0x80, 2025-12-04T12:35:04.6186190Z | ^~~~ 
2025-12-04T12:35:04.6187367Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6187471Z 1938 | 0x80, 2025-12-04T12:35:04.6187578Z | ^~~~ 2025-12-04T12:35:04.6188757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6188851Z 1940 | 0x80, 2025-12-04T12:35:04.6188959Z | ^~~~ 2025-12-04T12:35:04.6190130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6190287Z 1942 | 0x80, 2025-12-04T12:35:04.6190379Z | ^~~~ 2025-12-04T12:35:04.6191584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6191727Z 1944 | 0x80, 2025-12-04T12:35:04.6191820Z | ^~~~ 2025-12-04T12:35:04.6193040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6193137Z 1946 | 0x80, 2025-12-04T12:35:04.6193231Z | ^~~~ 2025-12-04T12:35:04.6194420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6194522Z 1948 | 0x80, 2025-12-04T12:35:04.6194616Z | ^~~~ 2025-12-04T12:35:04.6195806Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6195906Z 1950 | 0x80, 2025-12-04T12:35:04.6196014Z | ^~~~ 2025-12-04T12:35:04.6197186Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6197280Z 1952 | 0x80, 2025-12-04T12:35:04.6197389Z | ^~~~ 2025-12-04T12:35:04.6198576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6198684Z 1954 | 0x80, 2025-12-04T12:35:04.6198777Z | ^~~~ 2025-12-04T12:35:04.6199952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6200070Z 1956 | 0x80, 2025-12-04T12:35:04.6200164Z | ^~~~ 2025-12-04T12:35:04.6201344Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6201450Z 1958 | 0x80, 2025-12-04T12:35:04.6201543Z | ^~~~ 2025-12-04T12:35:04.6202730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6202825Z 1960 | 0x80, 2025-12-04T12:35:04.6202917Z | ^~~~ 2025-12-04T12:35:04.6204110Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6204208Z 1962 | 0x80, 2025-12-04T12:35:04.6204301Z | ^~~~ 2025-12-04T12:35:04.6205491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6205585Z 1964 | 0x80, 2025-12-04T12:35:04.6205691Z | ^~~~ 2025-12-04T12:35:04.6206909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6207003Z 1966 | 0x80, 2025-12-04T12:35:04.6207110Z | ^~~~ 2025-12-04T12:35:04.6208342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6208482Z 1968 | 0x80, 2025-12-04T12:35:04.6208575Z | ^~~~ 2025-12-04T12:35:04.6209784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6209892Z 1970 | 0x80, 2025-12-04T12:35:04.6209985Z | ^~~~ 2025-12-04T12:35:04.6211169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6211279Z 1972 | 0x80, 2025-12-04T12:35:04.6211372Z | ^~~~ 2025-12-04T12:35:04.6212564Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6212666Z 1974 | 0x80, 2025-12-04T12:35:04.6212760Z | ^~~~ 2025-12-04T12:35:04.6213956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6214051Z 1976 | 0x80, 2025-12-04T12:35:04.6214156Z | ^~~~ 2025-12-04T12:35:04.6215336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6215431Z 1978 | 0x80, 2025-12-04T12:35:04.6215539Z | ^~~~ 2025-12-04T12:35:04.6216787Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6216891Z 1980 | 0x80, 2025-12-04T12:35:04.6217061Z | ^~~~ 2025-12-04T12:35:04.6218247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6218357Z 1982 | 0x80, 2025-12-04T12:35:04.6218457Z | ^~~~ 2025-12-04T12:35:04.6219631Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6219745Z 1984 | 0x80, 2025-12-04T12:35:04.6219837Z | ^~~~ 2025-12-04T12:35:04.6221031Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6221173Z 1986 | 0x80, 2025-12-04T12:35:04.6221264Z | ^~~~ 2025-12-04T12:35:04.6222454Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6222548Z 1988 | 0x80, 2025-12-04T12:35:04.6222678Z | ^~~~ 2025-12-04T12:35:04.6223865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6223959Z 1990 | 0x80, 2025-12-04T12:35:04.6224065Z | ^~~~ 2025-12-04T12:35:04.6225268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6225367Z 1992 | 0x80, 2025-12-04T12:35:04.6225473Z | ^~~~ 2025-12-04T12:35:04.6226695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.6226868Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.6226992Z | ^~~~~~ 2025-12-04T12:35:04.6227498Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16, 2025-12-04T12:35:04.6227881Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.6228331Z from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.6228754Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.6229223Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.6229864Z from /tmp/hdMAUq/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751: 2025-12-04T12:35:04.6231347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’: 2025-12-04T12:35:04.6231924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31: required from here 2025-12-04T12:35:04.6233126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6233245Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6233343Z | ^~~~ 2025-12-04T12:35:04.6234535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6234700Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6234812Z | ^~~~ 2025-12-04T12:35:04.6235996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6236116Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6236266Z | ^~~~ 
2025-12-04T12:35:04.6237452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6237587Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6237688Z | ^~~~ 2025-12-04T12:35:04.6238867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6239002Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6239095Z | ^~~~ 2025-12-04T12:35:04.6240321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6240456Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6240555Z | ^~~~ 2025-12-04T12:35:04.6241781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6241895Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6241996Z | ^~~~ 2025-12-04T12:35:04.6243200Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6243313Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6243431Z | ^~~~ 2025-12-04T12:35:04.6244616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6244734Z 203 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6244839Z | ^~~~ 2025-12-04T12:35:04.6246024Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6246149Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6246254Z | ^~~~ 2025-12-04T12:35:04.6247435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6247560Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6247660Z | ^~~~ 2025-12-04T12:35:04.6248852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6248977Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6249089Z | ^~~~ 2025-12-04T12:35:04.6250276Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6250427Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6250521Z | ^~~~ 2025-12-04T12:35:04.6251719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6251837Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6251985Z | ^~~~ 2025-12-04T12:35:04.6253167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6253286Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6253397Z | ^~~~ 2025-12-04T12:35:04.6254574Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6254692Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6254819Z | ^~~~ 2025-12-04T12:35:04.6256033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6256168Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6256263Z | ^~~~ 2025-12-04T12:35:04.6257572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6257704Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6257805Z | ^~~~ 2025-12-04T12:35:04.6259006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6259120Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6259221Z | ^~~~ 2025-12-04T12:35:04.6260425Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6260548Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6260667Z | ^~~~ 2025-12-04T12:35:04.6261854Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6261969Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6262087Z | ^~~~ 2025-12-04T12:35:04.6263269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6263381Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6263495Z | ^~~~ 2025-12-04T12:35:04.6264692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6264820Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6264928Z | ^~~~ 2025-12-04T12:35:04.6266111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6266301Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6266403Z | ^~~~ 2025-12-04T12:35:04.6267602Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6267786Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6267881Z | ^~~~ 2025-12-04T12:35:04.6269081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6269233Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6269346Z | ^~~~ 2025-12-04T12:35:04.6270525Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6270646Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6270759Z | ^~~~ 2025-12-04T12:35:04.6272130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6272248Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6272367Z | ^~~~ 2025-12-04T12:35:04.6273554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6273683Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6273778Z | ^~~~ 2025-12-04T12:35:04.6274971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6275104Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6275201Z | ^~~~ 2025-12-04T12:35:04.6276405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6276522Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6276628Z | ^~~~ 2025-12-04T12:35:04.6277831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6277943Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.6278050Z | ^~~~ 2025-12-04T12:35:04.6279236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6279347Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6279461Z | ^~~~ 2025-12-04T12:35:04.6280645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6280756Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6280872Z | ^~~~ 2025-12-04T12:35:04.6282053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6282255Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6282356Z | ^~~~ 2025-12-04T12:35:04.6283543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6283759Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6283862Z | ^~~~ 2025-12-04T12:35:04.6285112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6285231Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6285324Z | ^~~~ 2025-12-04T12:35:04.6286528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.6286646Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6286741Z | ^~~~ 2025-12-04T12:35:04.6287942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6288063Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6288179Z | ^~~~ 2025-12-04T12:35:04.6289361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6289473Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6289590Z | ^~~~ 2025-12-04T12:35:04.6290774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6290900Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6290994Z | ^~~~ 2025-12-04T12:35:04.6292178Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6292310Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6292406Z | ^~~~ 2025-12-04T12:35:04.6293604Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6293716Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6293823Z | ^~~~ 2025-12-04T12:35:04.6295013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6295127Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6295236Z | ^~~~ 2025-12-04T12:35:04.6296494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6296609Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6296723Z | ^~~~ 2025-12-04T12:35:04.6297910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6298071Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6298182Z | ^~~~ 2025-12-04T12:35:04.6299364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6299565Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6299668Z | ^~~~ 2025-12-04T12:35:04.6300892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6301021Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6301123Z | ^~~~ 2025-12-04T12:35:04.6302604Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’: 2025-12-04T12:35:04.6303188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31: required from here 
2025-12-04T12:35:04.6304379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6304510Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6304605Z | ^~~~ 2025-12-04T12:35:04.6305811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6305926Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6306029Z | ^~~~ 2025-12-04T12:35:04.6307226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6307338Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6307437Z | ^~~~ 2025-12-04T12:35:04.6308641Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6308754Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6308876Z | ^~~~ 2025-12-04T12:35:04.6310052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6310169Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6310274Z | ^~~~ 2025-12-04T12:35:04.6311451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6311585Z 202 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6311686Z | ^~~~ 2025-12-04T12:35:04.6312869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6313001Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6313102Z | ^~~~ 2025-12-04T12:35:04.6314291Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6314447Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6314547Z | ^~~~ 2025-12-04T12:35:04.6315775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6315922Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6316015Z | ^~~~ 2025-12-04T12:35:04.6317259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6317371Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6317482Z | ^~~~ 2025-12-04T12:35:04.6318676Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6318794Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6318908Z | ^~~~ 2025-12-04T12:35:04.6320100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6320231Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6320332Z | ^~~~ 2025-12-04T12:35:04.6321519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6321643Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6321743Z | ^~~~ 2025-12-04T12:35:04.6322933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6323044Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6323140Z | ^~~~ 2025-12-04T12:35:04.6324340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6324452Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6324558Z | ^~~~ 2025-12-04T12:35:04.6325756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6325913Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6326025Z | ^~~~ 2025-12-04T12:35:04.6327208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6327325Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6327469Z | ^~~~ 2025-12-04T12:35:04.6328650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6328782Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6328879Z | ^~~~ 2025-12-04T12:35:04.6330056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6330188Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6330290Z | ^~~~ 2025-12-04T12:35:04.6331520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6331641Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6331742Z | ^~~~ 2025-12-04T12:35:04.6332975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6333092Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6333188Z | ^~~~ 2025-12-04T12:35:04.6334397Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6334511Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6334622Z | ^~~~ 2025-12-04T12:35:04.6335817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6335936Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6336049Z | ^~~~ 2025-12-04T12:35:04.6337326Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6337454Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6337561Z | ^~~~ 2025-12-04T12:35:04.6338739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6338865Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6338960Z | ^~~~ 2025-12-04T12:35:04.6340165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6340278Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6340382Z | ^~~~ 2025-12-04T12:35:04.6341575Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6341734Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6341835Z | ^~~~ 2025-12-04T12:35:04.6343033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6343152Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6343302Z | ^~~~ 2025-12-04T12:35:04.6344490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6344602Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.6344711Z | ^~~~ 2025-12-04T12:35:04.6345885Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6346019Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6346117Z | ^~~~ 2025-12-04T12:35:04.6347338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6347471Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6347573Z | ^~~~ 2025-12-04T12:35:04.6348803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6348918Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6349021Z | ^~~~ 2025-12-04T12:35:04.6350221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6350336Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6350431Z | ^~~~ 2025-12-04T12:35:04.6351640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6351761Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6351874Z | ^~~~ 2025-12-04T12:35:04.6353062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.6353177Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6353302Z | ^~~~ 2025-12-04T12:35:04.6354479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6354608Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6354716Z | ^~~~ 2025-12-04T12:35:04.6355902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6356035Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6356136Z | ^~~~ 2025-12-04T12:35:04.6357335Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6357486Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6357584Z | ^~~~ 2025-12-04T12:35:04.6358776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6358927Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6359061Z | ^~~~ 2025-12-04T12:35:04.6360262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6360422Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6360538Z | ^~~~ 2025-12-04T12:35:04.6361719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6361837Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6361944Z | ^~~~ 2025-12-04T12:35:04.6363130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6363260Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6363355Z | ^~~~ 2025-12-04T12:35:04.6364538Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6364669Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6364772Z | ^~~~ 2025-12-04T12:35:04.6365956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6366082Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6366183Z | ^~~~ 2025-12-04T12:35:04.6367374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6367491Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6367584Z | ^~~~ 2025-12-04T12:35:04.6368780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6368890Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6369008Z | ^~~~ 2025-12-04T12:35:04.6370188Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6370300Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6370419Z | ^~~~ 2025-12-04T12:35:04.6371784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6371914Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6372023Z | ^~~~ 2025-12-04T12:35:04.6372562Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12, 2025-12-04T12:35:04.6373011Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11, 2025-12-04T12:35:04.6373455Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.6373912Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.6374357Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.6374883Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.6375583Z from /tmp/GPm4bX/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656: 2025-12-04T12:35:04.6376184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic] 2025-12-04T12:35:04.6376354Z 192 | struct { 2025-12-04T12:35:04.6376452Z | ^ 2025-12-04T12:35:04.6376950Z In file included from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.6377329Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.6377770Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.6378177Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.6378656Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.6379294Z from /tmp/GPm4bX/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656: 2025-12-04T12:35:04.6381548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.6382742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.6382908Z 544 | auto msb_one = _mm512_set1_epi16(0xFFFF); 2025-12-04T12:35:04.6383032Z | ^~~~~~ 2025-12-04T12:35:04.6383533Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.6383920Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.6384357Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.6384770Z from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.6385236Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.6385882Z from /tmp/GPm4bX/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656: 2025-12-04T12:35:04.6387529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.6388738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.6388958Z 697 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.6389120Z | ^~~~~~ 2025-12-04T12:35:04.6390812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.6391972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.6392185Z 701 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.6392326Z | ^~~~~~ 2025-12-04T12:35:04.6393945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 
2025-12-04T12:35:04.6395287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.6395500Z 705 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.6395627Z | ^~~~~~ 2025-12-04T12:35:04.6397264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.6398435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.6398660Z 709 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.6398790Z | ^~~~~~ 2025-12-04T12:35:04.6400416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.6401583Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.6401788Z 713 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.6401928Z | ^~~~~~ 2025-12-04T12:35:04.6403554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 
2025-12-04T12:35:04.6404731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.6404933Z 717 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.6405099Z | ^~~~~~ 2025-12-04T12:35:04.6407429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.6408772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6408935Z 1153 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.6409051Z | ^~~~ 2025-12-04T12:35:04.6410737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.6411937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6412162Z 1166 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.6412289Z | ^~~~ 2025-12-04T12:35:04.6413946Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized 
at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.6415163Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6415367Z 1170 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.6415514Z | ^~~~ 2025-12-04T12:35:04.6417233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.6418429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6418653Z 1174 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.6418775Z | ^~~~ 2025-12-04T12:35:04.6420451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.6421648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6421865Z 1178 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.6421988Z | ^~~~ 2025-12-04T12:35:04.6424357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized 
at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.6425633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6425780Z 1207 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.6425939Z | ^~~~ 2025-12-04T12:35:04.6427633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.6428843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6429051Z 1220 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.6429180Z | ^~~~ 2025-12-04T12:35:04.6430887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.6432074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6432292Z 1224 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.6432415Z | ^~~~ 2025-12-04T12:35:04.6434106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In 
member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.6435315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6435516Z 1228 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.6435658Z | ^~~~ 2025-12-04T12:35:04.6437349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.6438555Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6438760Z 1232 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.6438888Z | ^~~~ 2025-12-04T12:35:04.6441252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.6441923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27: required from here 2025-12-04T12:35:04.6443158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6443258Z 1866 | 0x80, 2025-12-04T12:35:04.6443402Z | ^~~~ 2025-12-04T12:35:04.6444583Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6444686Z 1868 | 0x80, 2025-12-04T12:35:04.6444795Z | ^~~~ 2025-12-04T12:35:04.6445971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6446068Z 1870 | 0x80, 2025-12-04T12:35:04.6446192Z | ^~~~ 2025-12-04T12:35:04.6447365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6447474Z 1872 | 0x80, 2025-12-04T12:35:04.6447572Z | ^~~~ 2025-12-04T12:35:04.6448746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6448860Z 1874 | 0x80, 2025-12-04T12:35:04.6448954Z | ^~~~ 2025-12-04T12:35:04.6450136Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6450232Z 1876 | 0x80, 2025-12-04T12:35:04.6450335Z | ^~~~ 2025-12-04T12:35:04.6451521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.6451615Z 1878 | 0x80, 2025-12-04T12:35:04.6451710Z | ^~~~ 2025-12-04T12:35:04.6452899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6453028Z 1880 | 0x80, 2025-12-04T12:35:04.6453135Z | ^~~~ 2025-12-04T12:35:04.6454307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6454401Z 1882 | 0x80, 2025-12-04T12:35:04.6454545Z | ^~~~ 2025-12-04T12:35:04.6455717Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6455822Z 1884 | 0x80, 2025-12-04T12:35:04.6455920Z | ^~~~ 2025-12-04T12:35:04.6457172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6457292Z 1886 | 0x80, 2025-12-04T12:35:04.6457385Z | ^~~~ 2025-12-04T12:35:04.6458565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6458721Z 1888 | 0x80, 2025-12-04T12:35:04.6458822Z | ^~~~ 2025-12-04T12:35:04.6460010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6460138Z 1890 | 0x80, 
2025-12-04T12:35:04.6460234Z | ^~~~ 2025-12-04T12:35:04.6461423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6461524Z 1892 | 0x80, 2025-12-04T12:35:04.6461631Z | ^~~~ 2025-12-04T12:35:04.6462801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6462908Z 1894 | 0x80, 2025-12-04T12:35:04.6463017Z | ^~~~ 2025-12-04T12:35:04.6464190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6464292Z 1896 | 0x80, 2025-12-04T12:35:04.6464407Z | ^~~~ 2025-12-04T12:35:04.6465580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6465697Z 1898 | 0x80, 2025-12-04T12:35:04.6465790Z | ^~~~ 2025-12-04T12:35:04.6466969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6467096Z 1900 | 0x80, 2025-12-04T12:35:04.6467191Z | ^~~~ 2025-12-04T12:35:04.6468372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6468473Z 1902 | 0x80, 2025-12-04T12:35:04.6468568Z | ^~~~ 
2025-12-04T12:35:04.6469759Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6469892Z 1904 | 0x80, 2025-12-04T12:35:04.6469984Z | ^~~~ 2025-12-04T12:35:04.6471390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6471577Z 1906 | 0x80, 2025-12-04T12:35:04.6471685Z | ^~~~ 2025-12-04T12:35:04.6472867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6472971Z 1908 | 0x80, 2025-12-04T12:35:04.6473080Z | ^~~~ 2025-12-04T12:35:04.6474262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6474378Z 1910 | 0x80, 2025-12-04T12:35:04.6474473Z | ^~~~ 2025-12-04T12:35:04.6475696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6475817Z 1912 | 0x80, 2025-12-04T12:35:04.6475913Z | ^~~~ 2025-12-04T12:35:04.6477137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6477251Z 1914 | 0x80, 2025-12-04T12:35:04.6477345Z | ^~~~ 2025-12-04T12:35:04.6478537Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6478640Z 1916 | 0x80, 2025-12-04T12:35:04.6478731Z | ^~~~ 2025-12-04T12:35:04.6479926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6480027Z 1918 | 0x80, 2025-12-04T12:35:04.6480117Z | ^~~~ 2025-12-04T12:35:04.6481303Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6481398Z 1920 | 0x80, 2025-12-04T12:35:04.6481503Z | ^~~~ 2025-12-04T12:35:04.6482672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6482772Z 1922 | 0x80, 2025-12-04T12:35:04.6482879Z | ^~~~ 2025-12-04T12:35:04.6484066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6484179Z 1924 | 0x80, 2025-12-04T12:35:04.6484271Z | ^~~~ 2025-12-04T12:35:04.6485447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6485553Z 1926 | 0x80, 2025-12-04T12:35:04.6485642Z | ^~~~ 2025-12-04T12:35:04.6486816Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6486980Z 1928 | 0x80); 2025-12-04T12:35:04.6487073Z | ^~~~ 2025-12-04T12:35:04.6488309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6488483Z 1930 | 0x80, 2025-12-04T12:35:04.6488581Z | ^~~~ 2025-12-04T12:35:04.6489815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6489942Z 1932 | 0x80, 2025-12-04T12:35:04.6490067Z | ^~~~ 2025-12-04T12:35:04.6491253Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6491398Z 1934 | 0x80, 2025-12-04T12:35:04.6491501Z | ^~~~ 2025-12-04T12:35:04.6492710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6492826Z 1936 | 0x80, 2025-12-04T12:35:04.6492917Z | ^~~~ 2025-12-04T12:35:04.6494117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6494211Z 1938 | 0x80, 2025-12-04T12:35:04.6494302Z | ^~~~ 2025-12-04T12:35:04.6495489Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6495589Z 1940 | 0x80, 2025-12-04T12:35:04.6495693Z | ^~~~ 2025-12-04T12:35:04.6496940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6497043Z 1942 | 0x80, 2025-12-04T12:35:04.6497151Z | ^~~~ 2025-12-04T12:35:04.6498340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6498436Z 1944 | 0x80, 2025-12-04T12:35:04.6498548Z | ^~~~ 2025-12-04T12:35:04.6499722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6499835Z 1946 | 0x80, 2025-12-04T12:35:04.6499928Z | ^~~~ 2025-12-04T12:35:04.6501109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6501221Z 1948 | 0x80, 2025-12-04T12:35:04.6501313Z | ^~~~ 2025-12-04T12:35:04.6502508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6502602Z 1950 | 0x80, 2025-12-04T12:35:04.6502693Z | ^~~~ 2025-12-04T12:35:04.6503879Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6504045Z 1952 | 0x80, 2025-12-04T12:35:04.6504136Z | ^~~~ 2025-12-04T12:35:04.6505365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6505493Z 1954 | 0x80, 2025-12-04T12:35:04.6505600Z | ^~~~ 2025-12-04T12:35:04.6506814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6506908Z 1956 | 0x80, 2025-12-04T12:35:04.6507016Z | ^~~~ 2025-12-04T12:35:04.6508193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6508308Z 1958 | 0x80, 2025-12-04T12:35:04.6508400Z | ^~~~ 2025-12-04T12:35:04.6509574Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6509686Z 1960 | 0x80, 2025-12-04T12:35:04.6509778Z | ^~~~ 2025-12-04T12:35:04.6510952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6511059Z 1962 | 0x80, 2025-12-04T12:35:04.6511153Z | ^~~~ 2025-12-04T12:35:04.6512350Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6512454Z 1964 | 0x80, 2025-12-04T12:35:04.6512544Z | ^~~~ 2025-12-04T12:35:04.6513740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6513839Z 1966 | 0x80, 2025-12-04T12:35:04.6513932Z | ^~~~ 2025-12-04T12:35:04.6515125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6515220Z 1968 | 0x80, 2025-12-04T12:35:04.6515325Z | ^~~~ 2025-12-04T12:35:04.6516503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6516607Z 1970 | 0x80, 2025-12-04T12:35:04.6516713Z | ^~~~ 2025-12-04T12:35:04.6517905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6518015Z 1972 | 0x80, 2025-12-04T12:35:04.6518106Z | ^~~~ 2025-12-04T12:35:04.6519293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6519402Z 1974 | 0x80, 2025-12-04T12:35:04.6519493Z | ^~~~ 2025-12-04T12:35:04.6520720Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6520826Z 1976 | 0x80, 2025-12-04T12:35:04.6520919Z | ^~~~ 2025-12-04T12:35:04.6522144Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6522273Z 1978 | 0x80, 2025-12-04T12:35:04.6522365Z | ^~~~ 2025-12-04T12:35:04.6523594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6523692Z 1980 | 0x80, 2025-12-04T12:35:04.6523796Z | ^~~~ 2025-12-04T12:35:04.6524984Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6525079Z 1982 | 0x80, 2025-12-04T12:35:04.6525187Z | ^~~~ 2025-12-04T12:35:04.6526382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6526485Z 1984 | 0x80, 2025-12-04T12:35:04.6526589Z | ^~~~ 2025-12-04T12:35:04.6527789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6527900Z 1986 | 0x80, 2025-12-04T12:35:04.6527996Z | ^~~~ 2025-12-04T12:35:04.6529190Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6529300Z 1988 | 0x80, 2025-12-04T12:35:04.6529394Z | ^~~~ 2025-12-04T12:35:04.6530595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6530697Z 1990 | 0x80, 2025-12-04T12:35:04.6530789Z | ^~~~ 2025-12-04T12:35:04.6531982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6532075Z 1992 | 0x80, 2025-12-04T12:35:04.6532169Z | ^~~~ 2025-12-04T12:35:04.6533388Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.6533550Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.6533680Z | ^~~~~~ 2025-12-04T12:35:04.6536105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.6536773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27: required from here 2025-12-04T12:35:04.6538048Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6538143Z 1866 | 0x80, 2025-12-04T12:35:04.6538251Z | ^~~~ 2025-12-04T12:35:04.6539477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6539619Z 1868 | 0x80, 2025-12-04T12:35:04.6539711Z | ^~~~ 2025-12-04T12:35:04.6540935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6541046Z 1870 | 0x80, 2025-12-04T12:35:04.6541146Z | ^~~~ 2025-12-04T12:35:04.6542328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6542438Z 1872 | 0x80, 2025-12-04T12:35:04.6542530Z | ^~~~ 2025-12-04T12:35:04.6543730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6543832Z 1874 | 0x80, 2025-12-04T12:35:04.6543923Z | ^~~~ 2025-12-04T12:35:04.6545127Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6545222Z 1876 | 0x80, 2025-12-04T12:35:04.6545333Z | ^~~~ 2025-12-04T12:35:04.6546522Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6546618Z 1878 | 0x80, 2025-12-04T12:35:04.6546726Z | ^~~~ 2025-12-04T12:35:04.6547912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6548013Z 1880 | 0x80, 2025-12-04T12:35:04.6548119Z | ^~~~ 2025-12-04T12:35:04.6549305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6549458Z 1882 | 0x80, 2025-12-04T12:35:04.6549552Z | ^~~~ 2025-12-04T12:35:04.6550727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6550836Z 1884 | 0x80, 2025-12-04T12:35:04.6550930Z | ^~~~ 2025-12-04T12:35:04.6552165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6552261Z 1886 | 0x80, 2025-12-04T12:35:04.6552355Z | ^~~~ 2025-12-04T12:35:04.6553550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6553647Z 1888 | 0x80, 2025-12-04T12:35:04.6553741Z | ^~~~ 2025-12-04T12:35:04.6554929Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6555022Z 1890 | 0x80, 2025-12-04T12:35:04.6555130Z | ^~~~ 2025-12-04T12:35:04.6556352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6556448Z 1892 | 0x80, 2025-12-04T12:35:04.6556556Z | ^~~~ 2025-12-04T12:35:04.6557798Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6557915Z 1894 | 0x80, 2025-12-04T12:35:04.6558006Z | ^~~~ 2025-12-04T12:35:04.6559184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6559290Z 1896 | 0x80, 2025-12-04T12:35:04.6559382Z | ^~~~ 2025-12-04T12:35:04.6560567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6560673Z 1898 | 0x80, 2025-12-04T12:35:04.6560764Z | ^~~~ 2025-12-04T12:35:04.6561958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6562066Z 1900 | 0x80, 2025-12-04T12:35:04.6562157Z | ^~~~ 2025-12-04T12:35:04.6563343Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6563439Z 1902 | 0x80, 2025-12-04T12:35:04.6563535Z | ^~~~ 2025-12-04T12:35:04.6564730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6564824Z 1904 | 0x80, 2025-12-04T12:35:04.6564934Z | ^~~~ 2025-12-04T12:35:04.6566100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6566237Z 1906 | 0x80, 2025-12-04T12:35:04.6566340Z | ^~~~ 2025-12-04T12:35:04.6567509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6567614Z 1908 | 0x80, 2025-12-04T12:35:04.6567712Z | ^~~~ 2025-12-04T12:35:04.6568926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6569032Z 1910 | 0x80, 2025-12-04T12:35:04.6569129Z | ^~~~ 2025-12-04T12:35:04.6570304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6570419Z 1912 | 0x80, 2025-12-04T12:35:04.6570510Z | ^~~~ 2025-12-04T12:35:04.6571892Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6571985Z 1914 | 0x80, 2025-12-04T12:35:04.6572179Z | ^~~~ 2025-12-04T12:35:04.6573382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6573478Z 1916 | 0x80, 2025-12-04T12:35:04.6573632Z | ^~~~ 2025-12-04T12:35:04.6574816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6574918Z 1918 | 0x80, 2025-12-04T12:35:04.6575025Z | ^~~~ 2025-12-04T12:35:04.6576204Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6576352Z 1920 | 0x80, 2025-12-04T12:35:04.6576479Z | ^~~~ 2025-12-04T12:35:04.6577667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6577773Z 1922 | 0x80, 2025-12-04T12:35:04.6577870Z | ^~~~ 2025-12-04T12:35:04.6579042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6579158Z 1924 | 0x80, 2025-12-04T12:35:04.6579249Z | ^~~~ 2025-12-04T12:35:04.6580437Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6580539Z 1926 | 0x80, 2025-12-04T12:35:04.6580636Z | ^~~~ 2025-12-04T12:35:04.6581826Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6581929Z 1928 | 0x80); 2025-12-04T12:35:04.6582022Z | ^~~~ 2025-12-04T12:35:04.6583208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6583362Z 1930 | 0x80, 2025-12-04T12:35:04.6583467Z | ^~~~ 2025-12-04T12:35:04.6584645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6584785Z 1932 | 0x80, 2025-12-04T12:35:04.6584943Z | ^~~~ 2025-12-04T12:35:04.6586126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6586271Z 1934 | 0x80, 2025-12-04T12:35:04.6586366Z | ^~~~ 2025-12-04T12:35:04.6587543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6587665Z 1936 | 0x80, 2025-12-04T12:35:04.6587758Z | ^~~~ 2025-12-04T12:35:04.6588937Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6589057Z 1938 | 0x80, 2025-12-04T12:35:04.6589149Z | ^~~~ 2025-12-04T12:35:04.6590342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6590441Z 1940 | 0x80, 2025-12-04T12:35:04.6590532Z | ^~~~ 2025-12-04T12:35:04.6591724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6591822Z 1942 | 0x80, 2025-12-04T12:35:04.6591916Z | ^~~~ 2025-12-04T12:35:04.6593103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6593209Z 1944 | 0x80, 2025-12-04T12:35:04.6593314Z | ^~~~ 2025-12-04T12:35:04.6594490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6594589Z 1946 | 0x80, 2025-12-04T12:35:04.6594695Z | ^~~~ 2025-12-04T12:35:04.6595867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6595979Z 1948 | 0x80, 2025-12-04T12:35:04.6596071Z | ^~~~ 2025-12-04T12:35:04.6597243Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6597362Z 1950 | 0x80, 2025-12-04T12:35:04.6597453Z | ^~~~ 2025-12-04T12:35:04.6598632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6598732Z 1952 | 0x80, 2025-12-04T12:35:04.6598823Z | ^~~~ 2025-12-04T12:35:04.6600008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6600150Z 1954 | 0x80, 2025-12-04T12:35:04.6600240Z | ^~~~ 2025-12-04T12:35:04.6601469Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6601596Z 1956 | 0x80, 2025-12-04T12:35:04.6601700Z | ^~~~ 2025-12-04T12:35:04.6602911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6603005Z 1958 | 0x80, 2025-12-04T12:35:04.6603108Z | ^~~~ 2025-12-04T12:35:04.6604279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6604391Z 1960 | 0x80, 2025-12-04T12:35:04.6604483Z | ^~~~ 2025-12-04T12:35:04.6605662Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6605773Z 1962 | 0x80, 2025-12-04T12:35:04.6605866Z | ^~~~ 2025-12-04T12:35:04.6607044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6607152Z 1964 | 0x80, 2025-12-04T12:35:04.6607243Z | ^~~~ 2025-12-04T12:35:04.6608425Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6608524Z 1966 | 0x80, 2025-12-04T12:35:04.6608615Z | ^~~~ 2025-12-04T12:35:04.6609802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6609902Z 1968 | 0x80, 2025-12-04T12:35:04.6609992Z | ^~~~ 2025-12-04T12:35:04.6611184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6611278Z 1970 | 0x80, 2025-12-04T12:35:04.6611382Z | ^~~~ 2025-12-04T12:35:04.6612558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6612658Z 1972 | 0x80, 2025-12-04T12:35:04.6612766Z | ^~~~ 2025-12-04T12:35:04.6613946Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6614057Z 1974 | 0x80, 2025-12-04T12:35:04.6614149Z | ^~~~ 2025-12-04T12:35:04.6615329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6615439Z 1976 | 0x80, 2025-12-04T12:35:04.6615533Z | ^~~~ 2025-12-04T12:35:04.6616773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6616942Z 1978 | 0x80, 2025-12-04T12:35:04.6617037Z | ^~~~ 2025-12-04T12:35:04.6618266Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6618412Z 1980 | 0x80, 2025-12-04T12:35:04.6618510Z | ^~~~ 2025-12-04T12:35:04.6619738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6619834Z 1982 | 0x80, 2025-12-04T12:35:04.6619939Z | ^~~~ 2025-12-04T12:35:04.6621114Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6621217Z 1984 | 0x80, 2025-12-04T12:35:04.6621323Z | ^~~~ 2025-12-04T12:35:04.6622504Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6622603Z 1986 | 0x80, 2025-12-04T12:35:04.6622710Z | ^~~~ 2025-12-04T12:35:04.6623891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6623997Z 1988 | 0x80, 2025-12-04T12:35:04.6624090Z | ^~~~ 2025-12-04T12:35:04.6625270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6625384Z 1990 | 0x80, 2025-12-04T12:35:04.6625474Z | ^~~~ 2025-12-04T12:35:04.6626664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6626763Z 1992 | 0x80, 2025-12-04T12:35:04.6626856Z | ^~~~ 2025-12-04T12:35:04.6628073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.6628234Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.6628351Z | ^~~~~~ 2025-12-04T12:35:04.6630764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = signed char; 
typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.6631352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28: required from here 2025-12-04T12:35:04.6632556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6632654Z 1866 | 0x80, 2025-12-04T12:35:04.6632765Z | ^~~~ 2025-12-04T12:35:04.6633942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6634082Z 1868 | 0x80, 2025-12-04T12:35:04.6634191Z | ^~~~ 2025-12-04T12:35:04.6635418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6635562Z 1870 | 0x80, 2025-12-04T12:35:04.6635659Z | ^~~~ 2025-12-04T12:35:04.6636868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6636979Z 1872 | 0x80, 2025-12-04T12:35:04.6637073Z | ^~~~ 2025-12-04T12:35:04.6638254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6638364Z 1874 | 0x80, 2025-12-04T12:35:04.6638455Z | ^~~~ 2025-12-04T12:35:04.6639646Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: 
warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6639748Z 1876 | 0x80, 2025-12-04T12:35:04.6639838Z | ^~~~ 2025-12-04T12:35:04.6641037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6641131Z 1878 | 0x80, 2025-12-04T12:35:04.6641241Z | ^~~~ 2025-12-04T12:35:04.6642420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6642512Z 1880 | 0x80, 2025-12-04T12:35:04.6642619Z | ^~~~ 2025-12-04T12:35:04.6643796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6643899Z 1882 | 0x80, 2025-12-04T12:35:04.6644007Z | ^~~~ 2025-12-04T12:35:04.6645196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6645307Z 1884 | 0x80, 2025-12-04T12:35:04.6645397Z | ^~~~ 2025-12-04T12:35:04.6646615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6646720Z 1886 | 0x80, 2025-12-04T12:35:04.6646810Z | ^~~~ 2025-12-04T12:35:04.6648004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ 
to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6648132Z 1888 | 0x80, 2025-12-04T12:35:04.6648223Z | ^~~~ 2025-12-04T12:35:04.6649429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6649524Z 1890 | 0x80, 2025-12-04T12:35:04.6649614Z | ^~~~ 2025-12-04T12:35:04.6650813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6650908Z 1892 | 0x80, 2025-12-04T12:35:04.6651010Z | ^~~~ 2025-12-04T12:35:04.6652228Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6652333Z 1894 | 0x80, 2025-12-04T12:35:04.6652438Z | ^~~~ 2025-12-04T12:35:04.6653663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6653770Z 1896 | 0x80, 2025-12-04T12:35:04.6653865Z | ^~~~ 2025-12-04T12:35:04.6655040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6655146Z 1898 | 0x80, 2025-12-04T12:35:04.6655238Z | ^~~~ 2025-12-04T12:35:04.6656494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6656609Z 1900 | 0x80, 2025-12-04T12:35:04.6656699Z | ^~~~ 2025-12-04T12:35:04.6657899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6657994Z 1902 | 0x80, 2025-12-04T12:35:04.6658092Z | ^~~~ 2025-12-04T12:35:04.6659286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6659382Z 1904 | 0x80, 2025-12-04T12:35:04.6659487Z | ^~~~ 2025-12-04T12:35:04.6660662Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6660763Z 1906 | 0x80, 2025-12-04T12:35:04.6660875Z | ^~~~ 2025-12-04T12:35:04.6662052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6662146Z 1908 | 0x80, 2025-12-04T12:35:04.6662300Z | ^~~~ 2025-12-04T12:35:04.6663489Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6663597Z 1910 | 0x80, 2025-12-04T12:35:04.6663688Z | ^~~~ 2025-12-04T12:35:04.6664870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.6665095Z 1912 | 0x80, 2025-12-04T12:35:04.6665188Z | ^~~~ 2025-12-04T12:35:04.6666392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6666491Z 1914 | 0x80, 2025-12-04T12:35:04.6666591Z | ^~~~ 2025-12-04T12:35:04.6667778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6667874Z 1916 | 0x80, 2025-12-04T12:35:04.6667969Z | ^~~~ 2025-12-04T12:35:04.6669243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6669346Z 1918 | 0x80, 2025-12-04T12:35:04.6669451Z | ^~~~ 2025-12-04T12:35:04.6670663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6670762Z 1920 | 0x80, 2025-12-04T12:35:04.6670877Z | ^~~~ 2025-12-04T12:35:04.6672200Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6672312Z 1922 | 0x80, 2025-12-04T12:35:04.6672405Z | ^~~~ 2025-12-04T12:35:04.6673586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6673703Z 1924 | 0x80, 
2025-12-04T12:35:04.6673795Z | ^~~~ 2025-12-04T12:35:04.6674981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6675100Z 1926 | 0x80, 2025-12-04T12:35:04.6675192Z | ^~~~ 2025-12-04T12:35:04.6676375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6676471Z 1928 | 0x80); 2025-12-04T12:35:04.6676563Z | ^~~~ 2025-12-04T12:35:04.6677760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6677855Z 1930 | 0x80, 2025-12-04T12:35:04.6677968Z | ^~~~ 2025-12-04T12:35:04.6679147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6679328Z 1932 | 0x80, 2025-12-04T12:35:04.6679432Z | ^~~~ 2025-12-04T12:35:04.6680620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6680715Z 1934 | 0x80, 2025-12-04T12:35:04.6680822Z | ^~~~ 2025-12-04T12:35:04.6682118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6682228Z 1936 | 0x80, 2025-12-04T12:35:04.6682319Z | ^~~~ 
2025-12-04T12:35:04.6683542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6683656Z 1938 | 0x80, 2025-12-04T12:35:04.6683748Z | ^~~~ 2025-12-04T12:35:04.6684945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6685038Z 1940 | 0x80, 2025-12-04T12:35:04.6685129Z | ^~~~ 2025-12-04T12:35:04.6686324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6686417Z 1942 | 0x80, 2025-12-04T12:35:04.6686508Z | ^~~~ 2025-12-04T12:35:04.6687706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6687807Z 1944 | 0x80, 2025-12-04T12:35:04.6687912Z | ^~~~ 2025-12-04T12:35:04.6689087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6689180Z 1946 | 0x80, 2025-12-04T12:35:04.6689291Z | ^~~~ 2025-12-04T12:35:04.6690469Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6690579Z 1948 | 0x80, 2025-12-04T12:35:04.6690680Z | ^~~~ 2025-12-04T12:35:04.6691859Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6691971Z 1950 | 0x80, 2025-12-04T12:35:04.6692063Z | ^~~~ 2025-12-04T12:35:04.6693239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6693347Z 1952 | 0x80, 2025-12-04T12:35:04.6693454Z | ^~~~ 2025-12-04T12:35:04.6694642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6694736Z 1954 | 0x80, 2025-12-04T12:35:04.6694834Z | ^~~~ 2025-12-04T12:35:04.6696025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6696158Z 1956 | 0x80, 2025-12-04T12:35:04.6696261Z | ^~~~ 2025-12-04T12:35:04.6697541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6697635Z 1958 | 0x80, 2025-12-04T12:35:04.6697822Z | ^~~~ 2025-12-04T12:35:04.6699000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6699093Z 1960 | 0x80, 2025-12-04T12:35:04.6699233Z | ^~~~ 2025-12-04T12:35:04.6700411Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6708776Z 1962 | 0x80, 2025-12-04T12:35:04.6708946Z | ^~~~ 2025-12-04T12:35:04.6710279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6710391Z 1964 | 0x80, 2025-12-04T12:35:04.6710512Z | ^~~~ 2025-12-04T12:35:04.6711702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6711813Z 1966 | 0x80, 2025-12-04T12:35:04.6711914Z | ^~~~ 2025-12-04T12:35:04.6713111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6713213Z 1968 | 0x80, 2025-12-04T12:35:04.6713305Z | ^~~~ 2025-12-04T12:35:04.6714496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6714599Z 1970 | 0x80, 2025-12-04T12:35:04.6714712Z | ^~~~ 2025-12-04T12:35:04.6715882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6715983Z 1972 | 0x80, 2025-12-04T12:35:04.6716094Z | ^~~~ 2025-12-04T12:35:04.6717271Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6717384Z 1974 | 0x80, 2025-12-04T12:35:04.6717478Z | ^~~~ 2025-12-04T12:35:04.6718652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6718767Z 1976 | 0x80, 2025-12-04T12:35:04.6718865Z | ^~~~ 2025-12-04T12:35:04.6720036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6720157Z 1978 | 0x80, 2025-12-04T12:35:04.6720248Z | ^~~~ 2025-12-04T12:35:04.6721429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6721623Z 1980 | 0x80, 2025-12-04T12:35:04.6721712Z | ^~~~ 2025-12-04T12:35:04.6722902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6723082Z 1982 | 0x80, 2025-12-04T12:35:04.6723173Z | ^~~~ 2025-12-04T12:35:04.6724370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6724502Z 1984 | 0x80, 2025-12-04T12:35:04.6724611Z | ^~~~ 2025-12-04T12:35:04.6725787Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6725887Z 1986 | 0x80, 2025-12-04T12:35:04.6725994Z | ^~~~ 2025-12-04T12:35:04.6727167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6727284Z 1988 | 0x80, 2025-12-04T12:35:04.6727374Z | ^~~~ 2025-12-04T12:35:04.6728542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6728652Z 1990 | 0x80, 2025-12-04T12:35:04.6728741Z | ^~~~ 2025-12-04T12:35:04.6729918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6730027Z 1992 | 0x80, 2025-12-04T12:35:04.6730117Z | ^~~~ 2025-12-04T12:35:04.6731306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.6731478Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.6731597Z | ^~~~~~ 2025-12-04T12:35:04.6734044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = unsigned 
char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.6734630Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28: required from here 2025-12-04T12:35:04.6735836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6735933Z 1866 | 0x80, 2025-12-04T12:35:04.6736041Z | ^~~~ 2025-12-04T12:35:04.6737313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6737407Z 1868 | 0x80, 2025-12-04T12:35:04.6737510Z | ^~~~ 2025-12-04T12:35:04.6738692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6738849Z 1870 | 0x80, 2025-12-04T12:35:04.6738941Z | ^~~~ 2025-12-04T12:35:04.6740155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6740296Z 1872 | 0x80, 2025-12-04T12:35:04.6740386Z | ^~~~ 2025-12-04T12:35:04.6741595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6741704Z 1874 | 0x80, 2025-12-04T12:35:04.6741799Z | ^~~~ 2025-12-04T12:35:04.6742978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: 
warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6743077Z 1876 | 0x80, 2025-12-04T12:35:04.6743171Z | ^~~~ 2025-12-04T12:35:04.6744359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6744459Z 1878 | 0x80, 2025-12-04T12:35:04.6744552Z | ^~~~ 2025-12-04T12:35:04.6745743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6745834Z 1880 | 0x80, 2025-12-04T12:35:04.6745939Z | ^~~~ 2025-12-04T12:35:04.6747110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6747211Z 1882 | 0x80, 2025-12-04T12:35:04.6747317Z | ^~~~ 2025-12-04T12:35:04.6748495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6748609Z 1884 | 0x80, 2025-12-04T12:35:04.6748704Z | ^~~~ 2025-12-04T12:35:04.6749883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6749986Z 1886 | 0x80, 2025-12-04T12:35:04.6750078Z | ^~~~ 2025-12-04T12:35:04.6751248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ 
to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6751417Z 1888 | 0x80, 2025-12-04T12:35:04.6751506Z | ^~~~ 2025-12-04T12:35:04.6752697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6752832Z 1890 | 0x80, 2025-12-04T12:35:04.6752928Z | ^~~~ 2025-12-04T12:35:04.6754123Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6754215Z 1892 | 0x80, 2025-12-04T12:35:04.6754323Z | ^~~~ 2025-12-04T12:35:04.6755499Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6755598Z 1894 | 0x80, 2025-12-04T12:35:04.6755700Z | ^~~~ 2025-12-04T12:35:04.6756912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6757025Z 1896 | 0x80, 2025-12-04T12:35:04.6757117Z | ^~~~ 2025-12-04T12:35:04.6758331Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6758436Z 1898 | 0x80, 2025-12-04T12:35:04.6758524Z | ^~~~ 2025-12-04T12:35:04.6759703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6759812Z 1900 | 0x80, 2025-12-04T12:35:04.6759900Z | ^~~~ 2025-12-04T12:35:04.6761087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6761191Z 1902 | 0x80, 2025-12-04T12:35:04.6761285Z | ^~~~ 2025-12-04T12:35:04.6762470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6762564Z 1904 | 0x80, 2025-12-04T12:35:04.6762655Z | ^~~~ 2025-12-04T12:35:04.6763834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6763931Z 1906 | 0x80, 2025-12-04T12:35:04.6764035Z | ^~~~ 2025-12-04T12:35:04.6765212Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6765310Z 1908 | 0x80, 2025-12-04T12:35:04.6765414Z | ^~~~ 2025-12-04T12:35:04.6766598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6766704Z 1910 | 0x80, 2025-12-04T12:35:04.6766792Z | ^~~~ 2025-12-04T12:35:04.6767960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.6768136Z 1912 | 0x80, 2025-12-04T12:35:04.6768232Z | ^~~~ 2025-12-04T12:35:04.6769409Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6769566Z 1914 | 0x80, 2025-12-04T12:35:04.6769657Z | ^~~~ 2025-12-04T12:35:04.6770856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6771136Z 1916 | 0x80, 2025-12-04T12:35:04.6771228Z | ^~~~ 2025-12-04T12:35:04.6772434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6772526Z 1918 | 0x80, 2025-12-04T12:35:04.6772637Z | ^~~~ 2025-12-04T12:35:04.6774431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6774549Z 1920 | 0x80, 2025-12-04T12:35:04.6774655Z | ^~~~ 2025-12-04T12:35:04.6775895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6776001Z 1922 | 0x80, 2025-12-04T12:35:04.6776092Z | ^~~~ 2025-12-04T12:35:04.6777348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6777455Z 1924 | 0x80, 
2025-12-04T12:35:04.6777546Z | ^~~~ 2025-12-04T12:35:04.6778728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6778843Z 1926 | 0x80, 2025-12-04T12:35:04.6778938Z | ^~~~ 2025-12-04T12:35:04.6780126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6780220Z 1928 | 0x80); 2025-12-04T12:35:04.6780311Z | ^~~~ 2025-12-04T12:35:04.6781494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6781588Z 1930 | 0x80, 2025-12-04T12:35:04.6781678Z | ^~~~ 2025-12-04T12:35:04.6782872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6782971Z 1932 | 0x80, 2025-12-04T12:35:04.6783077Z | ^~~~ 2025-12-04T12:35:04.6784257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6784348Z 1934 | 0x80, 2025-12-04T12:35:04.6784453Z | ^~~~ 2025-12-04T12:35:04.6785692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6785799Z 1936 | 0x80, 2025-12-04T12:35:04.6785889Z | ^~~~ 
2025-12-04T12:35:04.6787098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6787257Z 1938 | 0x80, 2025-12-04T12:35:04.6787348Z | ^~~~ 2025-12-04T12:35:04.6788566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6788674Z 1940 | 0x80, 2025-12-04T12:35:04.6788774Z | ^~~~ 2025-12-04T12:35:04.6789964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6790059Z 1942 | 0x80, 2025-12-04T12:35:04.6790150Z | ^~~~ 2025-12-04T12:35:04.6791341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6791442Z 1944 | 0x80, 2025-12-04T12:35:04.6791543Z | ^~~~ 2025-12-04T12:35:04.6792719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6792813Z 1946 | 0x80, 2025-12-04T12:35:04.6792923Z | ^~~~ 2025-12-04T12:35:04.6794098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6794203Z 1948 | 0x80, 2025-12-04T12:35:04.6794290Z | ^~~~ 2025-12-04T12:35:04.6795473Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6795589Z 1950 | 0x80, 2025-12-04T12:35:04.6795682Z | ^~~~ 2025-12-04T12:35:04.6796859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6796963Z 1952 | 0x80, 2025-12-04T12:35:04.6797059Z | ^~~~ 2025-12-04T12:35:04.6798242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6798337Z 1954 | 0x80, 2025-12-04T12:35:04.6798427Z | ^~~~ 2025-12-04T12:35:04.6799617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6799716Z 1956 | 0x80, 2025-12-04T12:35:04.6799808Z | ^~~~ 2025-12-04T12:35:04.6800996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6801091Z 1958 | 0x80, 2025-12-04T12:35:04.6801231Z | ^~~~ 2025-12-04T12:35:04.6802407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6802498Z 1960 | 0x80, 2025-12-04T12:35:04.6802605Z | ^~~~ 2025-12-04T12:35:04.6803817Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6803956Z 1962 | 0x80, 2025-12-04T12:35:04.6804046Z | ^~~~ 2025-12-04T12:35:04.6805272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6805385Z 1964 | 0x80, 2025-12-04T12:35:04.6805476Z | ^~~~ 2025-12-04T12:35:04.6806646Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6806751Z 1966 | 0x80, 2025-12-04T12:35:04.6806841Z | ^~~~ 2025-12-04T12:35:04.6808040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6808131Z 1968 | 0x80, 2025-12-04T12:35:04.6808220Z | ^~~~ 2025-12-04T12:35:04.6809425Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6809527Z 1970 | 0x80, 2025-12-04T12:35:04.6809630Z | ^~~~ 2025-12-04T12:35:04.6810808Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6810903Z 1972 | 0x80, 2025-12-04T12:35:04.6811004Z | ^~~~ 2025-12-04T12:35:04.6812194Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6812296Z 1974 | 0x80, 2025-12-04T12:35:04.6812385Z | ^~~~ 2025-12-04T12:35:04.6813563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6813670Z 1976 | 0x80, 2025-12-04T12:35:04.6813759Z | ^~~~ 2025-12-04T12:35:04.6814933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6815039Z 1978 | 0x80, 2025-12-04T12:35:04.6815129Z | ^~~~ 2025-12-04T12:35:04.6816393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6816490Z 1980 | 0x80, 2025-12-04T12:35:04.6816584Z | ^~~~ 2025-12-04T12:35:04.6817782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6817928Z 1982 | 0x80, 2025-12-04T12:35:04.6818019Z | ^~~~ 2025-12-04T12:35:04.6819209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6819301Z 1984 | 0x80, 2025-12-04T12:35:04.6819447Z | ^~~~ 2025-12-04T12:35:04.6820651Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6820748Z 1986 | 0x80, 2025-12-04T12:35:04.6820893Z | ^~~~ 2025-12-04T12:35:04.6822070Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6822185Z 1988 | 0x80, 2025-12-04T12:35:04.6822277Z | ^~~~ 2025-12-04T12:35:04.6823456Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6823565Z 1990 | 0x80, 2025-12-04T12:35:04.6823662Z | ^~~~ 2025-12-04T12:35:04.6824831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.6824936Z 1992 | 0x80, 2025-12-04T12:35:04.6825034Z | ^~~~ 2025-12-04T12:35:04.6826227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.6826392Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.6826510Z | ^~~~~~ 2025-12-04T12:35:04.6827030Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16, 2025-12-04T12:35:04.6827404Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.6827862Z from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.6828268Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.6828742Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.6829401Z from /tmp/GPm4bX/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656: 2025-12-04T12:35:04.6830914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’: 2025-12-04T12:35:04.6831511Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31: required from here 2025-12-04T12:35:04.6832746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6832885Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6832979Z | ^~~~ 2025-12-04T12:35:04.6834166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6834297Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6834398Z | ^~~~ 2025-12-04T12:35:04.6835621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6835753Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6835848Z | ^~~~ 
2025-12-04T12:35:04.6837082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6837192Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6837290Z | ^~~~ 2025-12-04T12:35:04.6838500Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6838618Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6838720Z | ^~~~ 2025-12-04T12:35:04.6839908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6840029Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6840140Z | ^~~~ 2025-12-04T12:35:04.6841331Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6841455Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6841563Z | ^~~~ 2025-12-04T12:35:04.6842746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6842873Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6842975Z | ^~~~ 2025-12-04T12:35:04.6844160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6844283Z 203 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6844376Z | ^~~~ 2025-12-04T12:35:04.6845569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6845721Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6845817Z | ^~~~ 2025-12-04T12:35:04.6847017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6847135Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6847283Z | ^~~~ 2025-12-04T12:35:04.6848470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6848587Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6848694Z | ^~~~ 2025-12-04T12:35:04.6849863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6849990Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6850082Z | ^~~~ 2025-12-04T12:35:04.6851296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6851430Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6851526Z | ^~~~ 2025-12-04T12:35:04.6852748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6852866Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6852968Z | ^~~~ 2025-12-04T12:35:04.6854160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6854280Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6854380Z | ^~~~ 2025-12-04T12:35:04.6855582Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6855700Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6855803Z | ^~~~ 2025-12-04T12:35:04.6857059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6857174Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6857295Z | ^~~~ 2025-12-04T12:35:04.6858481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6858604Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6858705Z | ^~~~ 2025-12-04T12:35:04.6859898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6860038Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6860146Z | ^~~~ 2025-12-04T12:35:04.6861340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6861503Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6861596Z | ^~~~ 2025-12-04T12:35:04.6862797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6862916Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6863071Z | ^~~~ 2025-12-04T12:35:04.6864261Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6864381Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6864498Z | ^~~~ 2025-12-04T12:35:04.6865682Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6865803Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6865921Z | ^~~~ 2025-12-04T12:35:04.6867162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6867298Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6867393Z | ^~~~ 2025-12-04T12:35:04.6868621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6868754Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6868855Z | ^~~~ 2025-12-04T12:35:04.6870055Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6870178Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6870280Z | ^~~~ 2025-12-04T12:35:04.6871669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6871792Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6871909Z | ^~~~ 2025-12-04T12:35:04.6873101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6873216Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6873331Z | ^~~~ 2025-12-04T12:35:04.6874513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6874628Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6874741Z | ^~~~ 2025-12-04T12:35:04.6875940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6876069Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6876179Z | ^~~~ 2025-12-04T12:35:04.6877360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6877582Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.6877684Z | ^~~~ 2025-12-04T12:35:04.6878882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6879049Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6879186Z | ^~~~ 2025-12-04T12:35:04.6880392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6880553Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6880665Z | ^~~~ 2025-12-04T12:35:04.6881848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6881963Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6882077Z | ^~~~ 2025-12-04T12:35:04.6883262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6883378Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6883493Z | ^~~~ 2025-12-04T12:35:04.6884673Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6884796Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6884888Z | ^~~~ 2025-12-04T12:35:04.6886078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.6886212Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6886307Z | ^~~~ 2025-12-04T12:35:04.6887502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6887620Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6887718Z | ^~~~ 2025-12-04T12:35:04.6888924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6889034Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6889142Z | ^~~~ 2025-12-04T12:35:04.6890331Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6890444Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6890557Z | ^~~~ 2025-12-04T12:35:04.6891751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6891864Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6891981Z | ^~~~ 2025-12-04T12:35:04.6893166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6893340Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6893441Z | ^~~~ 2025-12-04T12:35:04.6894623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6894789Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6894924Z | ^~~~ 2025-12-04T12:35:04.6896115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6896264Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6896422Z | ^~~~ 2025-12-04T12:35:04.6897630Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6897749Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6897845Z | ^~~~ 2025-12-04T12:35:04.6899047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6899166Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6899281Z | ^~~~ 2025-12-04T12:35:04.6900470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6900586Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6900703Z | ^~~~ 2025-12-04T12:35:04.6902176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’: 2025-12-04T12:35:04.6902770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31: required from here 
2025-12-04T12:35:04.6903965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6904080Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6904189Z | ^~~~ 2025-12-04T12:35:04.6905377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6905513Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6905611Z | ^~~~ 2025-12-04T12:35:04.6906799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6906935Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6907043Z | ^~~~ 2025-12-04T12:35:04.6908237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6908358Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6908461Z | ^~~~ 2025-12-04T12:35:04.6909652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6909823Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6909929Z | ^~~~ 2025-12-04T12:35:04.6911156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6911306Z 202 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6911421Z | ^~~~ 2025-12-04T12:35:04.6912640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6912755Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6912867Z | ^~~~ 2025-12-04T12:35:04.6914057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6914191Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6914291Z | ^~~~ 2025-12-04T12:35:04.6915476Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6915610Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6915707Z | ^~~~ 2025-12-04T12:35:04.6916915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6917027Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6917129Z | ^~~~ 2025-12-04T12:35:04.6918324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6918439Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6918559Z | ^~~~ 2025-12-04T12:35:04.6919760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6919872Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6919991Z | ^~~~ 2025-12-04T12:35:04.6921170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6921289Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6921396Z | ^~~~ 2025-12-04T12:35:04.6922580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6922717Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6922821Z | ^~~~ 2025-12-04T12:35:04.6923999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6924132Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6924232Z | ^~~~ 2025-12-04T12:35:04.6925422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6925660Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6925762Z | ^~~~ 2025-12-04T12:35:04.6927036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6927180Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6927272Z | ^~~~ 2025-12-04T12:35:04.6928512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6928627Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6928737Z | ^~~~ 2025-12-04T12:35:04.6929919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6930037Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6930152Z | ^~~~ 2025-12-04T12:35:04.6931339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6931469Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6931570Z | ^~~~ 2025-12-04T12:35:04.6932755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6932882Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6932982Z | ^~~~ 2025-12-04T12:35:04.6934176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6934288Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6934385Z | ^~~~ 2025-12-04T12:35:04.6935596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6935709Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6935815Z | ^~~~ 2025-12-04T12:35:04.6937100Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6937266Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6937382Z | ^~~~ 2025-12-04T12:35:04.6938576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6938696Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6938844Z | ^~~~ 2025-12-04T12:35:04.6940031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6940165Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6940263Z | ^~~~ 2025-12-04T12:35:04.6941445Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6941576Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6941676Z | ^~~~ 2025-12-04T12:35:04.6942910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6943028Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6943129Z | ^~~~ 2025-12-04T12:35:04.6944360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6944473Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.6944567Z | ^~~~ 2025-12-04T12:35:04.6945770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6945891Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6946003Z | ^~~~ 2025-12-04T12:35:04.6947191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6947309Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6947422Z | ^~~~ 2025-12-04T12:35:04.6948620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6948746Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6948852Z | ^~~~ 2025-12-04T12:35:04.6950029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6950157Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6950248Z | ^~~~ 2025-12-04T12:35:04.6951463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6951578Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6951678Z | ^~~~ 2025-12-04T12:35:04.6952873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.6953030Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6953129Z | ^~~~ 2025-12-04T12:35:04.6954329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6954451Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6954603Z | ^~~~ 2025-12-04T12:35:04.6955789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6955907Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6956010Z | ^~~~ 2025-12-04T12:35:04.6957193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6957324Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6957421Z | ^~~~ 2025-12-04T12:35:04.6958643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6958774Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6958873Z | ^~~~ 2025-12-04T12:35:04.6960105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6960220Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6960322Z | ^~~~ 2025-12-04T12:35:04.6961517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6961629Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6961723Z | ^~~~ 2025-12-04T12:35:04.6962922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6963039Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6963149Z | ^~~~ 2025-12-04T12:35:04.6964339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6964451Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6964573Z | ^~~~ 2025-12-04T12:35:04.6965753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6965881Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6965984Z | ^~~~ 2025-12-04T12:35:04.6967168Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6967293Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6967392Z | ^~~~ 2025-12-04T12:35:04.6968591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6968742Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6968838Z | ^~~~ 2025-12-04T12:35:04.6970033Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6970177Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6970310Z | ^~~~ 2025-12-04T12:35:04.6971674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.6971868Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.6971982Z | ^~~~ 2025-12-04T12:35:04.6972529Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12, 2025-12-04T12:35:04.6972972Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11, 2025-12-04T12:35:04.6973360Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.6973811Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.6974234Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.6974698Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.6975342Z from /tmp/wzlxAD/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751: 2025-12-04T12:35:04.6975957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic] 2025-12-04T12:35:04.6976059Z 192 | struct { 2025-12-04T12:35:04.6976166Z | ^ 2025-12-04T12:35:04.6976723Z In file included from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.6977096Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.6977555Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.6977961Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.6978445Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.6979086Z from /tmp/wzlxAD/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751: 2025-12-04T12:35:04.6981328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.6982524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.6982685Z 544 | auto msb_one = _mm512_set1_epi16(0xFFFF); 2025-12-04T12:35:04.6982821Z | ^~~~~~ 2025-12-04T12:35:04.6983324Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.6983783Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.6984243Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.6984687Z from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.6985207Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.6985875Z from /tmp/wzlxAD/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751: 2025-12-04T12:35:04.6987531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.6988706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.6988922Z 697 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.6989075Z | ^~~~~~ 2025-12-04T12:35:04.6990705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.6991883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.6992096Z 701 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.6992226Z | ^~~~~~ 2025-12-04T12:35:04.6993857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 
2025-12-04T12:35:04.6995022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.6995239Z 705 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.6995364Z | ^~~~~~ 2025-12-04T12:35:04.6996998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.6998162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.6998369Z 709 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.6998506Z | ^~~~~~ 2025-12-04T12:35:04.7000124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7001341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.7001543Z 713 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.7001702Z | ^~~~~~ 2025-12-04T12:35:04.7003390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 
2025-12-04T12:35:04.7004553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.7004773Z 717 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.7004896Z | ^~~~~~ 2025-12-04T12:35:04.7007169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.7008373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7008534Z 1153 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.7008658Z | ^~~~ 2025-12-04T12:35:04.7010327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7011540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7011749Z 1166 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7011892Z | ^~~~ 2025-12-04T12:35:04.7013542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized 
at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7014739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7014960Z 1170 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7015091Z | ^~~~ 2025-12-04T12:35:04.7016835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7018106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7018383Z 1174 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7018513Z | ^~~~ 2025-12-04T12:35:04.7020206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7021490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7021696Z 1178 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7021835Z | ^~~~ 2025-12-04T12:35:04.7024183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized 
at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.7025396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7025554Z 1207 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.7025672Z | ^~~~ 2025-12-04T12:35:04.7027392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7028842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7029069Z 1220 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7029193Z | ^~~~ 2025-12-04T12:35:04.7030925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7032122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7032332Z 1224 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7032470Z | ^~~~ 2025-12-04T12:35:04.7034185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In 
member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7035897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7036111Z 1228 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7036313Z | ^~~~ 2025-12-04T12:35:04.7038089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7039319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7039591Z 1232 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7039717Z | ^~~~ 2025-12-04T12:35:04.7042100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.7042696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27: required from here 2025-12-04T12:35:04.7043910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7044023Z 1866 | 0x80, 2025-12-04T12:35:04.7044121Z | ^~~~ 2025-12-04T12:35:04.7045329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7045431Z 1868 | 0x80, 2025-12-04T12:35:04.7045524Z | ^~~~ 2025-12-04T12:35:04.7046729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7046833Z 1870 | 0x80, 2025-12-04T12:35:04.7046938Z | ^~~~ 2025-12-04T12:35:04.7048132Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7048226Z 1872 | 0x80, 2025-12-04T12:35:04.7048331Z | ^~~~ 2025-12-04T12:35:04.7049508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7049624Z 1874 | 0x80, 2025-12-04T12:35:04.7049716Z | ^~~~ 2025-12-04T12:35:04.7050926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7051028Z 1876 | 0x80, 2025-12-04T12:35:04.7051135Z | ^~~~ 2025-12-04T12:35:04.7052315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.7052410Z 1878 | 0x80, 2025-12-04T12:35:04.7052521Z | ^~~~ 2025-12-04T12:35:04.7053704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7053855Z 1880 | 0x80, 2025-12-04T12:35:04.7053947Z | ^~~~ 2025-12-04T12:35:04.7055159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7055305Z 1882 | 0x80, 2025-12-04T12:35:04.7055399Z | ^~~~ 2025-12-04T12:35:04.7056708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7056808Z 1884 | 0x80, 2025-12-04T12:35:04.7056901Z | ^~~~ 2025-12-04T12:35:04.7058102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7058207Z 1886 | 0x80, 2025-12-04T12:35:04.7058302Z | ^~~~ 2025-12-04T12:35:04.7059503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7059606Z 1888 | 0x80, 2025-12-04T12:35:04.7059712Z | ^~~~ 2025-12-04T12:35:04.7060900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7060995Z 1890 | 0x80, 
2025-12-04T12:35:04.7061102Z | ^~~~ 2025-12-04T12:35:04.7062277Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7062395Z 1892 | 0x80, 2025-12-04T12:35:04.7062489Z | ^~~~ 2025-12-04T12:35:04.7063668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7063788Z 1894 | 0x80, 2025-12-04T12:35:04.7063881Z | ^~~~ 2025-12-04T12:35:04.7065063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7065174Z 1896 | 0x80, 2025-12-04T12:35:04.7065267Z | ^~~~ 2025-12-04T12:35:04.7066469Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7066604Z 1898 | 0x80, 2025-12-04T12:35:04.7066695Z | ^~~~ 2025-12-04T12:35:04.7067896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7068028Z 1900 | 0x80, 2025-12-04T12:35:04.7068120Z | ^~~~ 2025-12-04T12:35:04.7069316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7069413Z 1902 | 0x80, 2025-12-04T12:35:04.7069522Z | ^~~~ 
2025-12-04T12:35:04.7070703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7070806Z 1904 | 0x80, 2025-12-04T12:35:04.7070909Z | ^~~~ 2025-12-04T12:35:04.7072395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7072514Z 1906 | 0x80, 2025-12-04T12:35:04.7072606Z | ^~~~ 2025-12-04T12:35:04.7073832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7073949Z 1908 | 0x80, 2025-12-04T12:35:04.7074041Z | ^~~~ 2025-12-04T12:35:04.7075227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7075339Z 1910 | 0x80, 2025-12-04T12:35:04.7075429Z | ^~~~ 2025-12-04T12:35:04.7076619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7076719Z 1912 | 0x80, 2025-12-04T12:35:04.7076811Z | ^~~~ 2025-12-04T12:35:04.7078000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7078096Z 1914 | 0x80, 2025-12-04T12:35:04.7078204Z | ^~~~ 2025-12-04T12:35:04.7079383Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7079490Z 1916 | 0x80, 2025-12-04T12:35:04.7079597Z | ^~~~ 2025-12-04T12:35:04.7080789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7080892Z 1918 | 0x80, 2025-12-04T12:35:04.7080997Z | ^~~~ 2025-12-04T12:35:04.7082197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7082305Z 1920 | 0x80, 2025-12-04T12:35:04.7082399Z | ^~~~ 2025-12-04T12:35:04.7083579Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7083741Z 1922 | 0x80, 2025-12-04T12:35:04.7083832Z | ^~~~ 2025-12-04T12:35:04.7085036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7085180Z 1924 | 0x80, 2025-12-04T12:35:04.7085274Z | ^~~~ 2025-12-04T12:35:04.7086481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7086576Z 1926 | 0x80, 2025-12-04T12:35:04.7086668Z | ^~~~ 2025-12-04T12:35:04.7087866Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7087968Z 1928 | 0x80); 2025-12-04T12:35:04.7088074Z | ^~~~ 2025-12-04T12:35:04.7089291Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7089397Z 1930 | 0x80, 2025-12-04T12:35:04.7089504Z | ^~~~ 2025-12-04T12:35:04.7090716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7090827Z 1932 | 0x80, 2025-12-04T12:35:04.7090919Z | ^~~~ 2025-12-04T12:35:04.7092115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7092223Z 1934 | 0x80, 2025-12-04T12:35:04.7092315Z | ^~~~ 2025-12-04T12:35:04.7093493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7093606Z 1936 | 0x80, 2025-12-04T12:35:04.7093698Z | ^~~~ 2025-12-04T12:35:04.7094890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7094987Z 1938 | 0x80, 2025-12-04T12:35:04.7095080Z | ^~~~ 2025-12-04T12:35:04.7096273Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7096439Z 1940 | 0x80, 2025-12-04T12:35:04.7096535Z | ^~~~ 2025-12-04T12:35:04.7097737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7097838Z 1942 | 0x80, 2025-12-04T12:35:04.7097949Z | ^~~~ 2025-12-04T12:35:04.7099133Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7099228Z 1944 | 0x80, 2025-12-04T12:35:04.7099336Z | ^~~~ 2025-12-04T12:35:04.7100561Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7100669Z 1946 | 0x80, 2025-12-04T12:35:04.7100759Z | ^~~~ 2025-12-04T12:35:04.7101965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7102126Z 1948 | 0x80, 2025-12-04T12:35:04.7102216Z | ^~~~ 2025-12-04T12:35:04.7103432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7103542Z 1950 | 0x80, 2025-12-04T12:35:04.7103635Z | ^~~~ 2025-12-04T12:35:04.7104835Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7104932Z 1952 | 0x80, 2025-12-04T12:35:04.7105023Z | ^~~~ 2025-12-04T12:35:04.7106221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7106321Z 1954 | 0x80, 2025-12-04T12:35:04.7106428Z | ^~~~ 2025-12-04T12:35:04.7107603Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7107696Z 1956 | 0x80, 2025-12-04T12:35:04.7107807Z | ^~~~ 2025-12-04T12:35:04.7108980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7109088Z 1958 | 0x80, 2025-12-04T12:35:04.7109180Z | ^~~~ 2025-12-04T12:35:04.7110365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7110480Z 1960 | 0x80, 2025-12-04T12:35:04.7110570Z | ^~~~ 2025-12-04T12:35:04.7111756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7111862Z 1962 | 0x80, 2025-12-04T12:35:04.7111959Z | ^~~~ 2025-12-04T12:35:04.7113148Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7113243Z 1964 | 0x80, 2025-12-04T12:35:04.7113334Z | ^~~~ 2025-12-04T12:35:04.7114525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7114624Z 1966 | 0x80, 2025-12-04T12:35:04.7114714Z | ^~~~ 2025-12-04T12:35:04.7115917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7116011Z 1968 | 0x80, 2025-12-04T12:35:04.7116155Z | ^~~~ 2025-12-04T12:35:04.7117334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7117428Z 1970 | 0x80, 2025-12-04T12:35:04.7117532Z | ^~~~ 2025-12-04T12:35:04.7118745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7118886Z 1972 | 0x80, 2025-12-04T12:35:04.7118976Z | ^~~~ 2025-12-04T12:35:04.7120185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7120295Z 1974 | 0x80, 2025-12-04T12:35:04.7120396Z | ^~~~ 2025-12-04T12:35:04.7121569Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7121677Z 1976 | 0x80, 2025-12-04T12:35:04.7121769Z | ^~~~ 2025-12-04T12:35:04.7122963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7123062Z 1978 | 0x80, 2025-12-04T12:35:04.7123152Z | ^~~~ 2025-12-04T12:35:04.7124360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7124462Z 1980 | 0x80, 2025-12-04T12:35:04.7124576Z | ^~~~ 2025-12-04T12:35:04.7125746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7125844Z 1982 | 0x80, 2025-12-04T12:35:04.7125954Z | ^~~~ 2025-12-04T12:35:04.7127129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7127228Z 1984 | 0x80, 2025-12-04T12:35:04.7127338Z | ^~~~ 2025-12-04T12:35:04.7128519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7128633Z 1986 | 0x80, 2025-12-04T12:35:04.7128726Z | ^~~~ 2025-12-04T12:35:04.7129900Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7130010Z 1988 | 0x80, 2025-12-04T12:35:04.7130101Z | ^~~~ 2025-12-04T12:35:04.7131300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7131396Z 1990 | 0x80, 2025-12-04T12:35:04.7131492Z | ^~~~ 2025-12-04T12:35:04.7132693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7132830Z 1992 | 0x80, 2025-12-04T12:35:04.7132921Z | ^~~~ 2025-12-04T12:35:04.7134118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.7134276Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.7134480Z | ^~~~~~ 2025-12-04T12:35:04.7137020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.7137622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27: required from here 2025-12-04T12:35:04.7138823Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7138919Z 1866 | 0x80, 2025-12-04T12:35:04.7139035Z | ^~~~ 2025-12-04T12:35:04.7140215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7140328Z 1868 | 0x80, 2025-12-04T12:35:04.7140427Z | ^~~~ 2025-12-04T12:35:04.7141598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7141713Z 1870 | 0x80, 2025-12-04T12:35:04.7141807Z | ^~~~ 2025-12-04T12:35:04.7142980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7143096Z 1872 | 0x80, 2025-12-04T12:35:04.7143202Z | ^~~~ 2025-12-04T12:35:04.7144392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7144486Z 1874 | 0x80, 2025-12-04T12:35:04.7144585Z | ^~~~ 2025-12-04T12:35:04.7145779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7145881Z 1876 | 0x80, 2025-12-04T12:35:04.7145976Z | ^~~~ 2025-12-04T12:35:04.7147167Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7147265Z 1878 | 0x80, 2025-12-04T12:35:04.7147384Z | ^~~~ 2025-12-04T12:35:04.7148558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7148656Z 1880 | 0x80, 2025-12-04T12:35:04.7148772Z | ^~~~ 2025-12-04T12:35:04.7149947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7150097Z 1882 | 0x80, 2025-12-04T12:35:04.7150193Z | ^~~~ 2025-12-04T12:35:04.7151370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7151517Z 1884 | 0x80, 2025-12-04T12:35:04.7151645Z | ^~~~ 2025-12-04T12:35:04.7152835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7152991Z 1886 | 0x80, 2025-12-04T12:35:04.7153086Z | ^~~~ 2025-12-04T12:35:04.7154284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7154387Z 1888 | 0x80, 2025-12-04T12:35:04.7154481Z | ^~~~ 2025-12-04T12:35:04.7155674Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7155774Z 1890 | 0x80, 2025-12-04T12:35:04.7155886Z | ^~~~ 2025-12-04T12:35:04.7157060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7157166Z 1892 | 0x80, 2025-12-04T12:35:04.7157271Z | ^~~~ 2025-12-04T12:35:04.7158454Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7158560Z 1894 | 0x80, 2025-12-04T12:35:04.7158666Z | ^~~~ 2025-12-04T12:35:04.7159847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7159970Z 1896 | 0x80, 2025-12-04T12:35:04.7160064Z | ^~~~ 2025-12-04T12:35:04.7161241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7161354Z 1898 | 0x80, 2025-12-04T12:35:04.7161444Z | ^~~~ 2025-12-04T12:35:04.7162624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7162759Z 1900 | 0x80, 2025-12-04T12:35:04.7162850Z | ^~~~ 2025-12-04T12:35:04.7164039Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7164197Z 1902 | 0x80, 2025-12-04T12:35:04.7164287Z | ^~~~ 2025-12-04T12:35:04.7165488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7165589Z 1904 | 0x80, 2025-12-04T12:35:04.7165694Z | ^~~~ 2025-12-04T12:35:04.7166863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7166967Z 1906 | 0x80, 2025-12-04T12:35:04.7167078Z | ^~~~ 2025-12-04T12:35:04.7168246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7168398Z 1908 | 0x80, 2025-12-04T12:35:04.7168495Z | ^~~~ 2025-12-04T12:35:04.7169673Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7169893Z 1910 | 0x80, 2025-12-04T12:35:04.7169987Z | ^~~~ 2025-12-04T12:35:04.7171352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7171469Z 1912 | 0x80, 2025-12-04T12:35:04.7171563Z | ^~~~ 2025-12-04T12:35:04.7172773Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7172875Z 1914 | 0x80, 2025-12-04T12:35:04.7172969Z | ^~~~ 2025-12-04T12:35:04.7174164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7174260Z 1916 | 0x80, 2025-12-04T12:35:04.7174364Z | ^~~~ 2025-12-04T12:35:04.7175534Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7175634Z 1918 | 0x80, 2025-12-04T12:35:04.7175742Z | ^~~~ 2025-12-04T12:35:04.7176981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7177083Z 1920 | 0x80, 2025-12-04T12:35:04.7177190Z | ^~~~ 2025-12-04T12:35:04.7178374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7178483Z 1922 | 0x80, 2025-12-04T12:35:04.7178575Z | ^~~~ 2025-12-04T12:35:04.7179740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7179940Z 1924 | 0x80, 2025-12-04T12:35:04.7180033Z | ^~~~ 2025-12-04T12:35:04.7181233Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7181374Z 1926 | 0x80, 2025-12-04T12:35:04.7181466Z | ^~~~ 2025-12-04T12:35:04.7182668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7182767Z 1928 | 0x80); 2025-12-04T12:35:04.7182861Z | ^~~~ 2025-12-04T12:35:04.7184059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7184158Z 1930 | 0x80, 2025-12-04T12:35:04.7184268Z | ^~~~ 2025-12-04T12:35:04.7185486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7185588Z 1932 | 0x80, 2025-12-04T12:35:04.7185697Z | ^~~~ 2025-12-04T12:35:04.7186921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7187031Z 1934 | 0x80, 2025-12-04T12:35:04.7187125Z | ^~~~ 2025-12-04T12:35:04.7188302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7188420Z 1936 | 0x80, 2025-12-04T12:35:04.7188512Z | ^~~~ 2025-12-04T12:35:04.7189692Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7189804Z 1938 | 0x80, 2025-12-04T12:35:04.7189899Z | ^~~~ 2025-12-04T12:35:04.7191092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7191189Z 1940 | 0x80, 2025-12-04T12:35:04.7191280Z | ^~~~ 2025-12-04T12:35:04.7192471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7192572Z 1942 | 0x80, 2025-12-04T12:35:04.7192665Z | ^~~~ 2025-12-04T12:35:04.7193856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7193956Z 1944 | 0x80, 2025-12-04T12:35:04.7194064Z | ^~~~ 2025-12-04T12:35:04.7195375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7195470Z 1946 | 0x80, 2025-12-04T12:35:04.7195577Z | ^~~~ 2025-12-04T12:35:04.7196767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7196922Z 1948 | 0x80, 2025-12-04T12:35:04.7197014Z | ^~~~ 2025-12-04T12:35:04.7198229Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7198370Z 1950 | 0x80, 2025-12-04T12:35:04.7198463Z | ^~~~ 2025-12-04T12:35:04.7199681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7199791Z 1952 | 0x80, 2025-12-04T12:35:04.7199883Z | ^~~~ 2025-12-04T12:35:04.7201068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7201168Z 1954 | 0x80, 2025-12-04T12:35:04.7201258Z | ^~~~ 2025-12-04T12:35:04.7202459Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7202558Z 1956 | 0x80, 2025-12-04T12:35:04.7202664Z | ^~~~ 2025-12-04T12:35:04.7203846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7203942Z 1958 | 0x80, 2025-12-04T12:35:04.7204047Z | ^~~~ 2025-12-04T12:35:04.7205221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7205322Z 1960 | 0x80, 2025-12-04T12:35:04.7205430Z | ^~~~ 2025-12-04T12:35:04.7206609Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7206722Z 1962 | 0x80, 2025-12-04T12:35:04.7206814Z | ^~~~ 2025-12-04T12:35:04.7207990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7208097Z 1964 | 0x80, 2025-12-04T12:35:04.7208187Z | ^~~~ 2025-12-04T12:35:04.7209370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7209476Z 1966 | 0x80, 2025-12-04T12:35:04.7209567Z | ^~~~ 2025-12-04T12:35:04.7210767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7210867Z 1968 | 0x80, 2025-12-04T12:35:04.7210959Z | ^~~~ 2025-12-04T12:35:04.7212159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7212254Z 1970 | 0x80, 2025-12-04T12:35:04.7212358Z | ^~~~ 2025-12-04T12:35:04.7213532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7213666Z 1972 | 0x80, 2025-12-04T12:35:04.7213772Z | ^~~~ 2025-12-04T12:35:04.7214979Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7215120Z 1974 | 0x80, 2025-12-04T12:35:04.7215211Z | ^~~~ 2025-12-04T12:35:04.7216496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7216609Z 1976 | 0x80, 2025-12-04T12:35:04.7216701Z | ^~~~ 2025-12-04T12:35:04.7218227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7218340Z 1978 | 0x80, 2025-12-04T12:35:04.7218434Z | ^~~~ 2025-12-04T12:35:04.7219639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7219740Z 1980 | 0x80, 2025-12-04T12:35:04.7219833Z | ^~~~ 2025-12-04T12:35:04.7221035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7221129Z 1982 | 0x80, 2025-12-04T12:35:04.7221220Z | ^~~~ 2025-12-04T12:35:04.7222416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7222511Z 1984 | 0x80, 2025-12-04T12:35:04.7222617Z | ^~~~ 2025-12-04T12:35:04.7223791Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7223891Z 1986 | 0x80, 2025-12-04T12:35:04.7223994Z | ^~~~ 2025-12-04T12:35:04.7225170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7225277Z 1988 | 0x80, 2025-12-04T12:35:04.7225368Z | ^~~~ 2025-12-04T12:35:04.7226565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7226674Z 1990 | 0x80, 2025-12-04T12:35:04.7226765Z | ^~~~ 2025-12-04T12:35:04.7227963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7228066Z 1992 | 0x80, 2025-12-04T12:35:04.7228158Z | ^~~~ 2025-12-04T12:35:04.7229364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.7230656Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.7231163Z | ^~~~~~ 2025-12-04T12:35:04.7233876Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = signed char; 
typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.7236582Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28: required from here 2025-12-04T12:35:04.7238579Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7239785Z 1866 | 0x80, 2025-12-04T12:35:04.7240056Z | ^~~~ 2025-12-04T12:35:04.7241410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7242636Z 1868 | 0x80, 2025-12-04T12:35:04.7242885Z | ^~~~ 2025-12-04T12:35:04.7244235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7245460Z 1870 | 0x80, 2025-12-04T12:35:04.7245702Z | ^~~~ 2025-12-04T12:35:04.7247049Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7248259Z 1872 | 0x80, 2025-12-04T12:35:04.7248522Z | ^~~~ 2025-12-04T12:35:04.7249850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7251062Z 1874 | 0x80, 2025-12-04T12:35:04.7251316Z | ^~~~ 2025-12-04T12:35:04.7252662Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: 
warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7253870Z 1876 | 0x80, 2025-12-04T12:35:04.7254128Z | ^~~~ 2025-12-04T12:35:04.7255479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7256755Z 1878 | 0x80, 2025-12-04T12:35:04.7257020Z | ^~~~ 2025-12-04T12:35:04.7258373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7259718Z 1880 | 0x80, 2025-12-04T12:35:04.7259964Z | ^~~~ 2025-12-04T12:35:04.7261324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7262601Z 1882 | 0x80, 2025-12-04T12:35:04.7262861Z | ^~~~ 2025-12-04T12:35:04.7264216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7265477Z 1884 | 0x80, 2025-12-04T12:35:04.7265732Z | ^~~~ 2025-12-04T12:35:04.7267061Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7268277Z 1886 | 0x80, 2025-12-04T12:35:04.7268533Z | ^~~~ 2025-12-04T12:35:04.7269941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ 
to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7271311Z 1888 | 0x80, 2025-12-04T12:35:04.7271573Z | ^~~~ 2025-12-04T12:35:04.7273005Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7274243Z 1890 | 0x80, 2025-12-04T12:35:04.7274491Z | ^~~~ 2025-12-04T12:35:04.7275849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7277062Z 1892 | 0x80, 2025-12-04T12:35:04.7277306Z | ^~~~ 2025-12-04T12:35:04.7278665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7279888Z 1894 | 0x80, 2025-12-04T12:35:04.7280142Z | ^~~~ 2025-12-04T12:35:04.7281475Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7282699Z 1896 | 0x80, 2025-12-04T12:35:04.7282956Z | ^~~~ 2025-12-04T12:35:04.7284282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7285495Z 1898 | 0x80, 2025-12-04T12:35:04.7285760Z | ^~~~ 2025-12-04T12:35:04.7287113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7288313Z 1900 | 0x80, 2025-12-04T12:35:04.7288574Z | ^~~~ 2025-12-04T12:35:04.7289923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7291224Z 1902 | 0x80, 2025-12-04T12:35:04.7291469Z | ^~~~ 2025-12-04T12:35:04.7292825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7294034Z 1904 | 0x80, 2025-12-04T12:35:04.7294285Z | ^~~~ 2025-12-04T12:35:04.7295679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7296997Z 1906 | 0x80, 2025-12-04T12:35:04.7297265Z | ^~~~ 2025-12-04T12:35:04.7298600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7299814Z 1908 | 0x80, 2025-12-04T12:35:04.7300072Z | ^~~~ 2025-12-04T12:35:04.7301414Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7302629Z 1910 | 0x80, 2025-12-04T12:35:04.7302960Z | ^~~~ 2025-12-04T12:35:04.7304313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.7305522Z 1912 | 0x80, 2025-12-04T12:35:04.7305811Z | ^~~~ 2025-12-04T12:35:04.7307159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7308377Z 1914 | 0x80, 2025-12-04T12:35:04.7308619Z | ^~~~ 2025-12-04T12:35:04.7309961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7311165Z 1916 | 0x80, 2025-12-04T12:35:04.7311438Z | ^~~~ 2025-12-04T12:35:04.7312764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7313971Z 1918 | 0x80, 2025-12-04T12:35:04.7314236Z | ^~~~ 2025-12-04T12:35:04.7315557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7316777Z 1920 | 0x80, 2025-12-04T12:35:04.7317040Z | ^~~~ 2025-12-04T12:35:04.7318382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7319585Z 1922 | 0x80, 2025-12-04T12:35:04.7319855Z | ^~~~ 2025-12-04T12:35:04.7321598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7322828Z 1924 | 0x80, 
2025-12-04T12:35:04.7323078Z | ^~~~ 2025-12-04T12:35:04.7324434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7325719Z 1926 | 0x80, 2025-12-04T12:35:04.7325964Z | ^~~~ 2025-12-04T12:35:04.7327305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7328741Z 1928 | 0x80); 2025-12-04T12:35:04.7329241Z | ^~~~ 2025-12-04T12:35:04.7330849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7332596Z 1930 | 0x80, 2025-12-04T12:35:04.7333055Z | ^~~~ 2025-12-04T12:35:04.7334956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7336682Z 1932 | 0x80, 2025-12-04T12:35:04.7337119Z | ^~~~ 2025-12-04T12:35:04.7340967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7342369Z 1934 | 0x80, 2025-12-04T12:35:04.7342644Z | ^~~~ 2025-12-04T12:35:04.7345551Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7346908Z 1936 | 0x80, 2025-12-04T12:35:04.7347163Z | ^~~~ 
2025-12-04T12:35:04.7348566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7349812Z 1938 | 0x80, 2025-12-04T12:35:04.7350116Z | ^~~~ 2025-12-04T12:35:04.7351570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7352799Z 1940 | 0x80, 2025-12-04T12:35:04.7353054Z | ^~~~ 2025-12-04T12:35:04.7354380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7355599Z 1942 | 0x80, 2025-12-04T12:35:04.7355855Z | ^~~~ 2025-12-04T12:35:04.7357361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7358635Z 1944 | 0x80, 2025-12-04T12:35:04.7358893Z | ^~~~ 2025-12-04T12:35:04.7360240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7361982Z 1946 | 0x80, 2025-12-04T12:35:04.7362391Z | ^~~~ 2025-12-04T12:35:04.7364291Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7365572Z 1948 | 0x80, 2025-12-04T12:35:04.7365820Z | ^~~~ 2025-12-04T12:35:04.7367242Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7368602Z 1950 | 0x80, 2025-12-04T12:35:04.7368888Z | ^~~~ 2025-12-04T12:35:04.7370327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7372549Z 1952 | 0x80, 2025-12-04T12:35:04.7372972Z | ^~~~ 2025-12-04T12:35:04.7374576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7375799Z 1954 | 0x80, 2025-12-04T12:35:04.7376104Z | ^~~~ 2025-12-04T12:35:04.7377679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7379041Z 1956 | 0x80, 2025-12-04T12:35:04.7379357Z | ^~~~ 2025-12-04T12:35:04.7380722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7382568Z 1958 | 0x80, 2025-12-04T12:35:04.7382814Z | ^~~~ 2025-12-04T12:35:04.7384239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7386040Z 1960 | 0x80, 2025-12-04T12:35:04.7386284Z | ^~~~ 2025-12-04T12:35:04.7387980Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7389624Z 1962 | 0x80, 2025-12-04T12:35:04.7389887Z | ^~~~ 2025-12-04T12:35:04.7391527Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7393464Z 1964 | 0x80, 2025-12-04T12:35:04.7393722Z | ^~~~ 2025-12-04T12:35:04.7395621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7397192Z 1966 | 0x80, 2025-12-04T12:35:04.7397448Z | ^~~~ 2025-12-04T12:35:04.7398865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7400623Z 1968 | 0x80, 2025-12-04T12:35:04.7401054Z | ^~~~ 2025-12-04T12:35:04.7403155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7404376Z 1970 | 0x80, 2025-12-04T12:35:04.7404637Z | ^~~~ 2025-12-04T12:35:04.7406254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7407757Z 1972 | 0x80, 2025-12-04T12:35:04.7408006Z | ^~~~ 2025-12-04T12:35:04.7410111Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7412157Z 1974 | 0x80, 2025-12-04T12:35:04.7412460Z | ^~~~ 2025-12-04T12:35:04.7414334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7415674Z 1976 | 0x80, 2025-12-04T12:35:04.7415938Z | ^~~~ 2025-12-04T12:35:04.7417418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7418633Z 1978 | 0x80, 2025-12-04T12:35:04.7418894Z | ^~~~ 2025-12-04T12:35:04.7420229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7421456Z 1980 | 0x80, 2025-12-04T12:35:04.7421715Z | ^~~~ 2025-12-04T12:35:04.7423066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7424262Z 1982 | 0x80, 2025-12-04T12:35:04.7424527Z | ^~~~ 2025-12-04T12:35:04.7425875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7427097Z 1984 | 0x80, 2025-12-04T12:35:04.7427337Z | ^~~~ 2025-12-04T12:35:04.7428672Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7429888Z 1986 | 0x80, 2025-12-04T12:35:04.7430127Z | ^~~~ 2025-12-04T12:35:04.7431472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7432685Z 1988 | 0x80, 2025-12-04T12:35:04.7432941Z | ^~~~ 2025-12-04T12:35:04.7434265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7435489Z 1990 | 0x80, 2025-12-04T12:35:04.7435743Z | ^~~~ 2025-12-04T12:35:04.7437079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7438289Z 1992 | 0x80, 2025-12-04T12:35:04.7438544Z | ^~~~ 2025-12-04T12:35:04.7439898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.7441157Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.7441564Z | ^~~~~~ 2025-12-04T12:35:04.7444270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = unsigned 
char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.7446957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28: required from here 2025-12-04T12:35:04.7448908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7450147Z 1866 | 0x80, 2025-12-04T12:35:04.7450407Z | ^~~~ 2025-12-04T12:35:04.7451791Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7453014Z 1868 | 0x80, 2025-12-04T12:35:04.7453261Z | ^~~~ 2025-12-04T12:35:04.7454609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7455822Z 1870 | 0x80, 2025-12-04T12:35:04.7456063Z | ^~~~ 2025-12-04T12:35:04.7457490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7458707Z 1872 | 0x80, 2025-12-04T12:35:04.7458971Z | ^~~~ 2025-12-04T12:35:04.7460307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7461521Z 1874 | 0x80, 2025-12-04T12:35:04.7461785Z | ^~~~ 2025-12-04T12:35:04.7463126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: 
warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7464726Z 1876 | 0x80, 2025-12-04T12:35:04.7465135Z | ^~~~ 2025-12-04T12:35:04.7467492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7468838Z 1878 | 0x80, 2025-12-04T12:35:04.7469082Z | ^~~~ 2025-12-04T12:35:04.7470773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7472358Z 1880 | 0x80, 2025-12-04T12:35:04.7472770Z | ^~~~ 2025-12-04T12:35:04.7474526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7475921Z 1882 | 0x80, 2025-12-04T12:35:04.7476251Z | ^~~~ 2025-12-04T12:35:04.7477605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7478930Z 1884 | 0x80, 2025-12-04T12:35:04.7479188Z | ^~~~ 2025-12-04T12:35:04.7481116Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7483213Z 1886 | 0x80, 2025-12-04T12:35:04.7483628Z | ^~~~ 2025-12-04T12:35:04.7485142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ 
to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7486485Z 1888 | 0x80, 2025-12-04T12:35:04.7486764Z | ^~~~ 2025-12-04T12:35:04.7488674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7490271Z 1890 | 0x80, 2025-12-04T12:35:04.7490521Z | ^~~~ 2025-12-04T12:35:04.7492584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7494281Z 1892 | 0x80, 2025-12-04T12:35:04.7494688Z | ^~~~ 2025-12-04T12:35:04.7496482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7509109Z 1894 | 0x80, 2025-12-04T12:35:04.7509413Z | ^~~~ 2025-12-04T12:35:04.7510931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7512628Z 1896 | 0x80, 2025-12-04T12:35:04.7513056Z | ^~~~ 2025-12-04T12:35:04.7515120Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7516809Z 1898 | 0x80, 2025-12-04T12:35:04.7517084Z | ^~~~ 2025-12-04T12:35:04.7518521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7519926Z 1900 | 0x80, 2025-12-04T12:35:04.7520345Z | ^~~~ 2025-12-04T12:35:04.7522378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7524295Z 1902 | 0x80, 2025-12-04T12:35:04.7524630Z | ^~~~ 2025-12-04T12:35:04.7526133Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7527615Z 1904 | 0x80, 2025-12-04T12:35:04.7527982Z | ^~~~ 2025-12-04T12:35:04.7529865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7531606Z 1906 | 0x80, 2025-12-04T12:35:04.7532028Z | ^~~~ 2025-12-04T12:35:04.7533743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7535355Z 1908 | 0x80, 2025-12-04T12:35:04.7535772Z | ^~~~ 2025-12-04T12:35:04.7537496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7538711Z 1910 | 0x80, 2025-12-04T12:35:04.7539044Z | ^~~~ 2025-12-04T12:35:04.7541371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.7543417Z 1912 | 0x80, 2025-12-04T12:35:04.7543675Z | ^~~~ 2025-12-04T12:35:04.7545250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7546855Z 1914 | 0x80, 2025-12-04T12:35:04.7547098Z | ^~~~ 2025-12-04T12:35:04.7548895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7550126Z 1916 | 0x80, 2025-12-04T12:35:04.7550544Z | ^~~~ 2025-12-04T12:35:04.7552859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7554783Z 1918 | 0x80, 2025-12-04T12:35:04.7555061Z | ^~~~ 2025-12-04T12:35:04.7556554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7558170Z 1920 | 0x80, 2025-12-04T12:35:04.7558431Z | ^~~~ 2025-12-04T12:35:04.7560133Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7561441Z 1922 | 0x80, 2025-12-04T12:35:04.7561696Z | ^~~~ 2025-12-04T12:35:04.7563042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7564941Z 1924 | 0x80, 
2025-12-04T12:35:04.7565349Z | ^~~~ 2025-12-04T12:35:04.7567606Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7568899Z 1926 | 0x80, 2025-12-04T12:35:04.7569146Z | ^~~~ 2025-12-04T12:35:04.7570572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7572407Z 1928 | 0x80); 2025-12-04T12:35:04.7572724Z | ^~~~ 2025-12-04T12:35:04.7575093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7576839Z 1930 | 0x80, 2025-12-04T12:35:04.7577115Z | ^~~~ 2025-12-04T12:35:04.7578858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7580275Z 1932 | 0x80, 2025-12-04T12:35:04.7580536Z | ^~~~ 2025-12-04T12:35:04.7582189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7583483Z 1934 | 0x80, 2025-12-04T12:35:04.7583745Z | ^~~~ 2025-12-04T12:35:04.7585098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7586307Z 1936 | 0x80, 2025-12-04T12:35:04.7586550Z | ^~~~ 
2025-12-04T12:35:04.7587896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7589099Z 1938 | 0x80, 2025-12-04T12:35:04.7589342Z | ^~~~ 2025-12-04T12:35:04.7590687Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7591897Z 1940 | 0x80, 2025-12-04T12:35:04.7592149Z | ^~~~ 2025-12-04T12:35:04.7593474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7594684Z 1942 | 0x80, 2025-12-04T12:35:04.7594943Z | ^~~~ 2025-12-04T12:35:04.7596275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7597484Z 1944 | 0x80, 2025-12-04T12:35:04.7597747Z | ^~~~ 2025-12-04T12:35:04.7599091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7600290Z 1946 | 0x80, 2025-12-04T12:35:04.7600545Z | ^~~~ 2025-12-04T12:35:04.7601892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7603098Z 1948 | 0x80, 2025-12-04T12:35:04.7603348Z | ^~~~ 2025-12-04T12:35:04.7604698Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7605909Z 1950 | 0x80, 2025-12-04T12:35:04.7606160Z | ^~~~ 2025-12-04T12:35:04.7607504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7608796Z 1952 | 0x80, 2025-12-04T12:35:04.7609049Z | ^~~~ 2025-12-04T12:35:04.7610386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7611593Z 1954 | 0x80, 2025-12-04T12:35:04.7611930Z | ^~~~ 2025-12-04T12:35:04.7613271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7614472Z 1956 | 0x80, 2025-12-04T12:35:04.7614771Z | ^~~~ 2025-12-04T12:35:04.7616117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7617400Z 1958 | 0x80, 2025-12-04T12:35:04.7617661Z | ^~~~ 2025-12-04T12:35:04.7619016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7620238Z 1960 | 0x80, 2025-12-04T12:35:04.7620499Z | ^~~~ 2025-12-04T12:35:04.7621851Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7623059Z 1962 | 0x80, 2025-12-04T12:35:04.7623311Z | ^~~~ 2025-12-04T12:35:04.7624655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7625880Z 1964 | 0x80, 2025-12-04T12:35:04.7626142Z | ^~~~ 2025-12-04T12:35:04.7627472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7628692Z 1966 | 0x80, 2025-12-04T12:35:04.7628956Z | ^~~~ 2025-12-04T12:35:04.7630305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7631509Z 1968 | 0x80, 2025-12-04T12:35:04.7631769Z | ^~~~ 2025-12-04T12:35:04.7633119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7634345Z 1970 | 0x80, 2025-12-04T12:35:04.7634591Z | ^~~~ 2025-12-04T12:35:04.7635934Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7637154Z 1972 | 0x80, 2025-12-04T12:35:04.7637398Z | ^~~~ 2025-12-04T12:35:04.7638737Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7639948Z 1974 | 0x80, 2025-12-04T12:35:04.7640200Z | ^~~~ 2025-12-04T12:35:04.7641528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7642828Z 1976 | 0x80, 2025-12-04T12:35:04.7643085Z | ^~~~ 2025-12-04T12:35:04.7644403Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7645681Z 1978 | 0x80, 2025-12-04T12:35:04.7645944Z | ^~~~ 2025-12-04T12:35:04.7647297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7648529Z 1980 | 0x80, 2025-12-04T12:35:04.7648788Z | ^~~~ 2025-12-04T12:35:04.7650134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7651344Z 1982 | 0x80, 2025-12-04T12:35:04.7651583Z | ^~~~ 2025-12-04T12:35:04.7652921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7654139Z 1984 | 0x80, 2025-12-04T12:35:04.7654382Z | ^~~~ 2025-12-04T12:35:04.7655720Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7657004Z 1986 | 0x80, 2025-12-04T12:35:04.7657265Z | ^~~~ 2025-12-04T12:35:04.7658602Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7659826Z 1988 | 0x80, 2025-12-04T12:35:04.7660082Z | ^~~~ 2025-12-04T12:35:04.7661427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7662633Z 1990 | 0x80, 2025-12-04T12:35:04.7662891Z | ^~~~ 2025-12-04T12:35:04.7664235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7665433Z 1992 | 0x80, 2025-12-04T12:35:04.7665689Z | ^~~~ 2025-12-04T12:35:04.7667030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.7668306Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.7668701Z | ^~~~~~ 2025-12-04T12:35:04.7669462Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16, 2025-12-04T12:35:04.7670489Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.7671657Z from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.7672646Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.7673664Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.7675017Z from /tmp/wzlxAD/tmp2gno_q_y/data/aotinductor/model1/cji6fcfpjxr5ad3oypbruxr5r26niflgwwkmd5rthzuhxclq6uis.wrapper.cpp:751: 2025-12-04T12:35:04.7677313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’: 2025-12-04T12:35:04.7679278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31: required from here 2025-12-04T12:35:04.7681251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7682500Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7682839Z | ^~~~ 2025-12-04T12:35:04.7684187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7685425Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7685765Z | ^~~~ 2025-12-04T12:35:04.7687144Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7688370Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7688712Z | ^~~~ 
2025-12-04T12:35:04.7690113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7691343Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7691672Z | ^~~~ 2025-12-04T12:35:04.7693085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7694316Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7694637Z | ^~~~ 2025-12-04T12:35:04.7695997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7697305Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7697643Z | ^~~~ 2025-12-04T12:35:04.7699014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7700299Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7700636Z | ^~~~ 2025-12-04T12:35:04.7702026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7703245Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7703637Z | ^~~~ 2025-12-04T12:35:04.7705045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7706289Z 203 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7706621Z | ^~~~ 2025-12-04T12:35:04.7707972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7709222Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7709546Z | ^~~~ 2025-12-04T12:35:04.7710969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7712220Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7712563Z | ^~~~ 2025-12-04T12:35:04.7713985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7715225Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7715574Z | ^~~~ 2025-12-04T12:35:04.7716990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7718215Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7718552Z | ^~~~ 2025-12-04T12:35:04.7719916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7721153Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7721479Z | ^~~~ 2025-12-04T12:35:04.7722869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7724111Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7724442Z | ^~~~ 2025-12-04T12:35:04.7725830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7727060Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7727403Z | ^~~~ 2025-12-04T12:35:04.7728816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7730048Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7730374Z | ^~~~ 2025-12-04T12:35:04.7731738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7733008Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7733343Z | ^~~~ 2025-12-04T12:35:04.7734726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7735959Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7736402Z | ^~~~ 2025-12-04T12:35:04.7737817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7739057Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7739383Z | ^~~~ 2025-12-04T12:35:04.7740795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7742046Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7742381Z | ^~~~ 2025-12-04T12:35:04.7743773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7745011Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7745342Z | ^~~~ 2025-12-04T12:35:04.7746778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7748007Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7748347Z | ^~~~ 2025-12-04T12:35:04.7749736Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7750978Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7751306Z | ^~~~ 2025-12-04T12:35:04.7752730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7753965Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7754281Z | ^~~~ 2025-12-04T12:35:04.7755648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7756880Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7757221Z | ^~~~ 2025-12-04T12:35:04.7758579Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7759810Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7760152Z | ^~~~ 2025-12-04T12:35:04.7761550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7762776Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7763117Z | ^~~~ 2025-12-04T12:35:04.7764522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7765804Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7766122Z | ^~~~ 2025-12-04T12:35:04.7767468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7768741Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7769097Z | ^~~~ 2025-12-04T12:35:04.7770476Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7772010Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7772355Z | ^~~~ 2025-12-04T12:35:04.7773741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7774979Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.7775319Z | ^~~~ 2025-12-04T12:35:04.7776785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7776905Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7776998Z | ^~~~ 2025-12-04T12:35:04.7778208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7778321Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7778419Z | ^~~~ 2025-12-04T12:35:04.7779618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7779737Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7779853Z | ^~~~ 2025-12-04T12:35:04.7781033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7781151Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7781266Z | ^~~~ 2025-12-04T12:35:04.7782450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7782576Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7782677Z | ^~~~ 2025-12-04T12:35:04.7783856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.7783981Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7784079Z | ^~~~ 2025-12-04T12:35:04.7785281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7785399Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7785510Z | ^~~~ 2025-12-04T12:35:04.7786707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7786888Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7786990Z | ^~~~ 2025-12-04T12:35:04.7788194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7788362Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7788519Z | ^~~~ 2025-12-04T12:35:04.7789702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7789852Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7789966Z | ^~~~ 2025-12-04T12:35:04.7791151Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7791281Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7791384Z | ^~~~ 2025-12-04T12:35:04.7792565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7792696Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7792798Z | ^~~~ 2025-12-04T12:35:04.7793998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7794112Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7794205Z | ^~~~ 2025-12-04T12:35:04.7795398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7795516Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7795614Z | ^~~~ 2025-12-04T12:35:04.7796815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7796932Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7797046Z | ^~~~ 2025-12-04T12:35:04.7798235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7798347Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7798469Z | ^~~~ 2025-12-04T12:35:04.7799957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’: 2025-12-04T12:35:04.7800553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31: required from here 
2025-12-04T12:35:04.7801743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7801866Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7801977Z | ^~~~ 2025-12-04T12:35:04.7803163Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7803341Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7803441Z | ^~~~ 2025-12-04T12:35:04.7804661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7804822Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7804924Z | ^~~~ 2025-12-04T12:35:04.7806156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7806270Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7806373Z | ^~~~ 2025-12-04T12:35:04.7807566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7807684Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7807779Z | ^~~~ 2025-12-04T12:35:04.7808980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7809097Z 202 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7809207Z | ^~~~ 2025-12-04T12:35:04.7810394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7810506Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7810623Z | ^~~~ 2025-12-04T12:35:04.7811817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7811941Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7812040Z | ^~~~ 2025-12-04T12:35:04.7813231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7813368Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7813461Z | ^~~~ 2025-12-04T12:35:04.7814656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7814776Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7814871Z | ^~~~ 2025-12-04T12:35:04.7816059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7816172Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7816343Z | ^~~~ 2025-12-04T12:35:04.7817552Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7817673Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7817790Z | ^~~~ 2025-12-04T12:35:04.7818962Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7819123Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7819231Z | ^~~~ 2025-12-04T12:35:04.7820449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7820628Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7820725Z | ^~~~ 2025-12-04T12:35:04.7821942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7822073Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7822174Z | ^~~~ 2025-12-04T12:35:04.7823371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7823490Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7823590Z | ^~~~ 2025-12-04T12:35:04.7824786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7824905Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7824998Z | ^~~~ 2025-12-04T12:35:04.7826203Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7826316Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7826433Z | ^~~~ 2025-12-04T12:35:04.7827609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7827721Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7827836Z | ^~~~ 2025-12-04T12:35:04.7829023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7829158Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7829258Z | ^~~~ 2025-12-04T12:35:04.7830442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7830573Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7830666Z | ^~~~ 2025-12-04T12:35:04.7831854Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7831966Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7832110Z | ^~~~ 2025-12-04T12:35:04.7833316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7833437Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7833538Z | ^~~~ 2025-12-04T12:35:04.7834742Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7834894Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7835009Z | ^~~~ 2025-12-04T12:35:04.7836219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7836340Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7836447Z | ^~~~ 2025-12-04T12:35:04.7837670Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7837795Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7837891Z | ^~~~ 2025-12-04T12:35:04.7839073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7839208Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7839309Z | ^~~~ 2025-12-04T12:35:04.7840490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7840622Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7840727Z | ^~~~ 2025-12-04T12:35:04.7841925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7842039Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.7842139Z | ^~~~ 2025-12-04T12:35:04.7843330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7843443Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7843554Z | ^~~~ 2025-12-04T12:35:04.7844738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7844855Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7844969Z | ^~~~ 2025-12-04T12:35:04.7846155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7846324Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7846426Z | ^~~~ 2025-12-04T12:35:04.7847605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7847738Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7847866Z | ^~~~ 2025-12-04T12:35:04.7849050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7849185Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7849281Z | ^~~~ 2025-12-04T12:35:04.7850473Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.7850592Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7850691Z | ^~~~ 2025-12-04T12:35:04.7851925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7852045Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7852159Z | ^~~~ 2025-12-04T12:35:04.7853457Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7853572Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7853682Z | ^~~~ 2025-12-04T12:35:04.7854865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7854999Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7855100Z | ^~~~ 2025-12-04T12:35:04.7856349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7856489Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7856588Z | ^~~~ 2025-12-04T12:35:04.7857784Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7857913Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7858020Z | ^~~~ 2025-12-04T12:35:04.7859216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7859328Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7859421Z | ^~~~ 2025-12-04T12:35:04.7860625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7860744Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7860853Z | ^~~~ 2025-12-04T12:35:04.7862036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7862201Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7862314Z | ^~~~ 2025-12-04T12:35:04.7863512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7863643Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7863780Z | ^~~~ 2025-12-04T12:35:04.7864953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7865086Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7865179Z | ^~~~ 2025-12-04T12:35:04.7866358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7866491Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7866587Z | ^~~~ 2025-12-04T12:35:04.7867824Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7867944Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7868044Z | ^~~~ 2025-12-04T12:35:04.7869273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7869387Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.7869501Z | ^~~~ 2025-12-04T12:35:04.7870045Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12, 2025-12-04T12:35:04.7870486Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11, 2025-12-04T12:35:04.7870867Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.7871554Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.7871979Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.7872448Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.7873091Z from /tmp/U7W6v5/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656: 2025-12-04T12:35:04.7873714Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic] 2025-12-04T12:35:04.7873812Z 192 | struct { 2025-12-04T12:35:04.7873910Z | ^ 2025-12-04T12:35:04.7874424Z In file included from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.7874797Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.7875255Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.7875663Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.7876124Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.7876859Z from /tmp/U7W6v5/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656: 2025-12-04T12:35:04.7879194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.7880481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.7880638Z 544 | auto msb_one = _mm512_set1_epi16(0xFFFF); 2025-12-04T12:35:04.7880772Z | ^~~~~~ 2025-12-04T12:35:04.7881281Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.7881649Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.7882104Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.7882518Z from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.7882993Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.7883639Z from /tmp/U7W6v5/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656: 2025-12-04T12:35:04.7885280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7886468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.7886689Z 697 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.7886828Z | ^~~~~~ 2025-12-04T12:35:04.7888447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7889621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.7889831Z 701 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.7889970Z | ^~~~~~ 2025-12-04T12:35:04.7891604Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 
2025-12-04T12:35:04.7892775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.7892994Z 705 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.7893163Z | ^~~~~~ 2025-12-04T12:35:04.7894793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7895987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.7896227Z 709 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.7896479Z | ^~~~~~ 2025-12-04T12:35:04.7898105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7899285Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.7899497Z 713 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.7899628Z | ^~~~~~ 2025-12-04T12:35:04.7901266Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 
2025-12-04T12:35:04.7902431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.7902660Z 717 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.7902787Z | ^~~~~~ 2025-12-04T12:35:04.7905056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.7906266Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7906422Z 1153 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.7906554Z | ^~~~ 2025-12-04T12:35:04.7908221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7909432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7909648Z 1166 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7909790Z | ^~~~ 2025-12-04T12:35:04.7911440Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized 
at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7912712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7912964Z 1170 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7913090Z | ^~~~ 2025-12-04T12:35:04.7914796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7915990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7916209Z 1174 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7916329Z | ^~~~ 2025-12-04T12:35:04.7917988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7919206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7919409Z 1178 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7919552Z | ^~~~ 2025-12-04T12:35:04.7921888Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized 
at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.7923105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7923257Z 1207 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.7923371Z | ^~~~ 2025-12-04T12:35:04.7925087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7926289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7926512Z 1220 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7926637Z | ^~~~ 2025-12-04T12:35:04.7928332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7929570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7929774Z 1224 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7929953Z | ^~~~ 2025-12-04T12:35:04.7931718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In 
member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7932925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7933130Z 1228 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7933250Z | ^~~~ 2025-12-04T12:35:04.7934952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.7936150Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.7936433Z 1232 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.7936563Z | ^~~~ 2025-12-04T12:35:04.7938971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.7939570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27: required from here 2025-12-04T12:35:04.7940761Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7940873Z 1866 | 0x80, 2025-12-04T12:35:04.7940970Z | ^~~~ 2025-12-04T12:35:04.7942160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7942262Z 1868 | 0x80, 2025-12-04T12:35:04.7942355Z | ^~~~ 2025-12-04T12:35:04.7943556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7943657Z 1870 | 0x80, 2025-12-04T12:35:04.7943753Z | ^~~~ 2025-12-04T12:35:04.7944953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7945048Z 1872 | 0x80, 2025-12-04T12:35:04.7945157Z | ^~~~ 2025-12-04T12:35:04.7946335Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7946499Z 1874 | 0x80, 2025-12-04T12:35:04.7946606Z | ^~~~ 2025-12-04T12:35:04.7947825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7947969Z 1876 | 0x80, 2025-12-04T12:35:04.7948062Z | ^~~~ 2025-12-04T12:35:04.7949272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.7949387Z 1878 | 0x80, 2025-12-04T12:35:04.7949480Z | ^~~~ 2025-12-04T12:35:04.7950650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7950765Z 1880 | 0x80, 2025-12-04T12:35:04.7950857Z | ^~~~ 2025-12-04T12:35:04.7952048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7952149Z 1882 | 0x80, 2025-12-04T12:35:04.7952242Z | ^~~~ 2025-12-04T12:35:04.7953431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7953525Z 1884 | 0x80, 2025-12-04T12:35:04.7953629Z | ^~~~ 2025-12-04T12:35:04.7954803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7954903Z 1886 | 0x80, 2025-12-04T12:35:04.7955010Z | ^~~~ 2025-12-04T12:35:04.7956187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7956284Z 1888 | 0x80, 2025-12-04T12:35:04.7956394Z | ^~~~ 2025-12-04T12:35:04.7957581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7957690Z 1890 | 0x80, 
2025-12-04T12:35:04.7957784Z | ^~~~ 2025-12-04T12:35:04.7958963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7959072Z 1892 | 0x80, 2025-12-04T12:35:04.7959163Z | ^~~~ 2025-12-04T12:35:04.7960353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7960451Z 1894 | 0x80, 2025-12-04T12:35:04.7960542Z | ^~~~ 2025-12-04T12:35:04.7961738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7961832Z 1896 | 0x80, 2025-12-04T12:35:04.7961922Z | ^~~~ 2025-12-04T12:35:04.7963152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7963246Z 1898 | 0x80, 2025-12-04T12:35:04.7963356Z | ^~~~ 2025-12-04T12:35:04.7964567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7964692Z 1900 | 0x80, 2025-12-04T12:35:04.7964795Z | ^~~~ 2025-12-04T12:35:04.7966005Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7966113Z 1902 | 0x80, 2025-12-04T12:35:04.7966205Z | ^~~~ 
2025-12-04T12:35:04.7967385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7967493Z 1904 | 0x80, 2025-12-04T12:35:04.7967584Z | ^~~~ 2025-12-04T12:35:04.7968760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7968871Z 1906 | 0x80, 2025-12-04T12:35:04.7968962Z | ^~~~ 2025-12-04T12:35:04.7970152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7970246Z 1908 | 0x80, 2025-12-04T12:35:04.7970337Z | ^~~~ 2025-12-04T12:35:04.7971721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7971816Z 1910 | 0x80, 2025-12-04T12:35:04.7971908Z | ^~~~ 2025-12-04T12:35:04.7973107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7973207Z 1912 | 0x80, 2025-12-04T12:35:04.7973313Z | ^~~~ 2025-12-04T12:35:04.7974494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7974591Z 1914 | 0x80, 2025-12-04T12:35:04.7974773Z | ^~~~ 2025-12-04T12:35:04.7975953Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7976061Z 1916 | 0x80, 2025-12-04T12:35:04.7976153Z | ^~~~ 2025-12-04T12:35:04.7977394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7977561Z 1918 | 0x80, 2025-12-04T12:35:04.7977652Z | ^~~~ 2025-12-04T12:35:04.7978855Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7978948Z 1920 | 0x80, 2025-12-04T12:35:04.7979046Z | ^~~~ 2025-12-04T12:35:04.7980231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7980326Z 1922 | 0x80, 2025-12-04T12:35:04.7980417Z | ^~~~ 2025-12-04T12:35:04.7981652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7981754Z 1924 | 0x80, 2025-12-04T12:35:04.7981859Z | ^~~~ 2025-12-04T12:35:04.7983087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7983182Z 1926 | 0x80, 2025-12-04T12:35:04.7983294Z | ^~~~ 2025-12-04T12:35:04.7984472Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7984582Z 1928 | 0x80); 2025-12-04T12:35:04.7984677Z | ^~~~ 2025-12-04T12:35:04.7985854Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7985970Z 1930 | 0x80, 2025-12-04T12:35:04.7986062Z | ^~~~ 2025-12-04T12:35:04.7987241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7987352Z 1932 | 0x80, 2025-12-04T12:35:04.7987450Z | ^~~~ 2025-12-04T12:35:04.7988640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7988734Z 1934 | 0x80, 2025-12-04T12:35:04.7988824Z | ^~~~ 2025-12-04T12:35:04.7990018Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7990118Z 1936 | 0x80, 2025-12-04T12:35:04.7990213Z | ^~~~ 2025-12-04T12:35:04.7991404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7991497Z 1938 | 0x80, 2025-12-04T12:35:04.7991643Z | ^~~~ 2025-12-04T12:35:04.7992822Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7992917Z 1940 | 0x80, 2025-12-04T12:35:04.7993021Z | ^~~~ 2025-12-04T12:35:04.7994201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7994350Z 1942 | 0x80, 2025-12-04T12:35:04.7994443Z | ^~~~ 2025-12-04T12:35:04.7995623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7995736Z 1944 | 0x80, 2025-12-04T12:35:04.7995829Z | ^~~~ 2025-12-04T12:35:04.7997000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7997107Z 1946 | 0x80, 2025-12-04T12:35:04.7997198Z | ^~~~ 2025-12-04T12:35:04.7998430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7998527Z 1948 | 0x80, 2025-12-04T12:35:04.7998618Z | ^~~~ 2025-12-04T12:35:04.7999859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.7999960Z 1950 | 0x80, 2025-12-04T12:35:04.8000066Z | ^~~~ 2025-12-04T12:35:04.8001241Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8001335Z 1952 | 0x80, 2025-12-04T12:35:04.8001443Z | ^~~~ 2025-12-04T12:35:04.8002628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8002723Z 1954 | 0x80, 2025-12-04T12:35:04.8002830Z | ^~~~ 2025-12-04T12:35:04.8004010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8004126Z 1956 | 0x80, 2025-12-04T12:35:04.8004219Z | ^~~~ 2025-12-04T12:35:04.8005392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8005504Z 1958 | 0x80, 2025-12-04T12:35:04.8005599Z | ^~~~ 2025-12-04T12:35:04.8006799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8006897Z 1960 | 0x80, 2025-12-04T12:35:04.8006992Z | ^~~~ 2025-12-04T12:35:04.8008183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8008319Z 1962 | 0x80, 2025-12-04T12:35:04.8008411Z | ^~~~ 2025-12-04T12:35:04.8009605Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8009705Z 1964 | 0x80, 2025-12-04T12:35:04.8009852Z | ^~~~ 2025-12-04T12:35:04.8011063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8011160Z 1966 | 0x80, 2025-12-04T12:35:04.8011306Z | ^~~~ 2025-12-04T12:35:04.8012480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8012594Z 1968 | 0x80, 2025-12-04T12:35:04.8012686Z | ^~~~ 2025-12-04T12:35:04.8013863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8013974Z 1970 | 0x80, 2025-12-04T12:35:04.8014081Z | ^~~~ 2025-12-04T12:35:04.8015253Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8015360Z 1972 | 0x80, 2025-12-04T12:35:04.8015458Z | ^~~~ 2025-12-04T12:35:04.8016709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8016815Z 1974 | 0x80, 2025-12-04T12:35:04.8016908Z | ^~~~ 2025-12-04T12:35:04.8018103Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8018198Z 1976 | 0x80, 2025-12-04T12:35:04.8018305Z | ^~~~ 2025-12-04T12:35:04.8019493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8019585Z 1978 | 0x80, 2025-12-04T12:35:04.8019697Z | ^~~~ 2025-12-04T12:35:04.8020869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8020968Z 1980 | 0x80, 2025-12-04T12:35:04.8021074Z | ^~~~ 2025-12-04T12:35:04.8022252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8022363Z 1982 | 0x80, 2025-12-04T12:35:04.8022464Z | ^~~~ 2025-12-04T12:35:04.8023635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8023742Z 1984 | 0x80, 2025-12-04T12:35:04.8023841Z | ^~~~ 2025-12-04T12:35:04.8025016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8025176Z 1986 | 0x80, 2025-12-04T12:35:04.8025267Z | ^~~~ 2025-12-04T12:35:04.8026458Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8026588Z 1988 | 0x80, 2025-12-04T12:35:04.8026716Z | ^~~~ 2025-12-04T12:35:04.8027906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8028032Z 1990 | 0x80, 2025-12-04T12:35:04.8028139Z | ^~~~ 2025-12-04T12:35:04.8029312Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8029411Z 1992 | 0x80, 2025-12-04T12:35:04.8029516Z | ^~~~ 2025-12-04T12:35:04.8030702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.8030881Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.8031006Z | ^~~~~~ 2025-12-04T12:35:04.8033413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.8034012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27: required from here 2025-12-04T12:35:04.8035201Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8035321Z 1866 | 0x80, 2025-12-04T12:35:04.8035417Z | ^~~~ 2025-12-04T12:35:04.8036593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8036709Z 1868 | 0x80, 2025-12-04T12:35:04.8036803Z | ^~~~ 2025-12-04T12:35:04.8037989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8038088Z 1870 | 0x80, 2025-12-04T12:35:04.8038181Z | ^~~~ 2025-12-04T12:35:04.8039371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8039471Z 1872 | 0x80, 2025-12-04T12:35:04.8039561Z | ^~~~ 2025-12-04T12:35:04.8040753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8040847Z 1874 | 0x80, 2025-12-04T12:35:04.8040953Z | ^~~~ 2025-12-04T12:35:04.8042126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8042264Z 1876 | 0x80, 2025-12-04T12:35:04.8042368Z | ^~~~ 2025-12-04T12:35:04.8043574Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8043714Z 1878 | 0x80, 2025-12-04T12:35:04.8043806Z | ^~~~ 2025-12-04T12:35:04.8045028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8045142Z 1880 | 0x80, 2025-12-04T12:35:04.8045235Z | ^~~~ 2025-12-04T12:35:04.8046410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8046524Z 1882 | 0x80, 2025-12-04T12:35:04.8046617Z | ^~~~ 2025-12-04T12:35:04.8047810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8047910Z 1884 | 0x80, 2025-12-04T12:35:04.8048002Z | ^~~~ 2025-12-04T12:35:04.8049190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8049285Z 1886 | 0x80, 2025-12-04T12:35:04.8049388Z | ^~~~ 2025-12-04T12:35:04.8050559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8050664Z 1888 | 0x80, 2025-12-04T12:35:04.8050772Z | ^~~~ 2025-12-04T12:35:04.8051949Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8052050Z 1890 | 0x80, 2025-12-04T12:35:04.8052155Z | ^~~~ 2025-12-04T12:35:04.8053331Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8053438Z 1892 | 0x80, 2025-12-04T12:35:04.8053530Z | ^~~~ 2025-12-04T12:35:04.8054706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8054858Z 1894 | 0x80, 2025-12-04T12:35:04.8054949Z | ^~~~ 2025-12-04T12:35:04.8056140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8056273Z 1896 | 0x80, 2025-12-04T12:35:04.8056436Z | ^~~~ 2025-12-04T12:35:04.8057643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8057737Z 1898 | 0x80, 2025-12-04T12:35:04.8057831Z | ^~~~ 2025-12-04T12:35:04.8059019Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8059119Z 1900 | 0x80, 2025-12-04T12:35:04.8059226Z | ^~~~ 2025-12-04T12:35:04.8060460Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8060562Z 1902 | 0x80, 2025-12-04T12:35:04.8060669Z | ^~~~ 2025-12-04T12:35:04.8061878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8061987Z 1904 | 0x80, 2025-12-04T12:35:04.8062079Z | ^~~~ 2025-12-04T12:35:04.8063254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8063367Z 1906 | 0x80, 2025-12-04T12:35:04.8063458Z | ^~~~ 2025-12-04T12:35:04.8064634Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8064746Z 1908 | 0x80, 2025-12-04T12:35:04.8064836Z | ^~~~ 2025-12-04T12:35:04.8066028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8066122Z 1910 | 0x80, 2025-12-04T12:35:04.8066213Z | ^~~~ 2025-12-04T12:35:04.8067400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8067500Z 1912 | 0x80, 2025-12-04T12:35:04.8067592Z | ^~~~ 2025-12-04T12:35:04.8068783Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8068882Z 1914 | 0x80, 2025-12-04T12:35:04.8068986Z | ^~~~ 2025-12-04T12:35:04.8070162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8070256Z 1916 | 0x80, 2025-12-04T12:35:04.8070362Z | ^~~~ 2025-12-04T12:35:04.8071761Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8071953Z 1918 | 0x80, 2025-12-04T12:35:04.8072045Z | ^~~~ 2025-12-04T12:35:04.8073227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8073389Z 1920 | 0x80, 2025-12-04T12:35:04.8073480Z | ^~~~ 2025-12-04T12:35:04.8074663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8074776Z 1922 | 0x80, 2025-12-04T12:35:04.8074868Z | ^~~~ 2025-12-04T12:35:04.8076062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8076167Z 1924 | 0x80, 2025-12-04T12:35:04.8076258Z | ^~~~ 2025-12-04T12:35:04.8077502Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8077607Z 1926 | 0x80, 2025-12-04T12:35:04.8077713Z | ^~~~ 2025-12-04T12:35:04.8078941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8079042Z 1928 | 0x80); 2025-12-04T12:35:04.8079153Z | ^~~~ 2025-12-04T12:35:04.8080330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8080429Z 1930 | 0x80, 2025-12-04T12:35:04.8080535Z | ^~~~ 2025-12-04T12:35:04.8081715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8081829Z 1932 | 0x80, 2025-12-04T12:35:04.8081921Z | ^~~~ 2025-12-04T12:35:04.8083098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8083203Z 1934 | 0x80, 2025-12-04T12:35:04.8083295Z | ^~~~ 2025-12-04T12:35:04.8084488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8084581Z 1936 | 0x80, 2025-12-04T12:35:04.8084673Z | ^~~~ 2025-12-04T12:35:04.8085862Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8085961Z 1938 | 0x80, 2025-12-04T12:35:04.8086053Z | ^~~~ 2025-12-04T12:35:04.8087265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8087359Z 1940 | 0x80, 2025-12-04T12:35:04.8087465Z | ^~~~ 2025-12-04T12:35:04.8088711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8088807Z 1942 | 0x80, 2025-12-04T12:35:04.8088915Z | ^~~~ 2025-12-04T12:35:04.8090133Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8090287Z 1944 | 0x80, 2025-12-04T12:35:04.8090381Z | ^~~~ 2025-12-04T12:35:04.8091598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8091711Z 1946 | 0x80, 2025-12-04T12:35:04.8091804Z | ^~~~ 2025-12-04T12:35:04.8092981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8093094Z 1948 | 0x80, 2025-12-04T12:35:04.8093187Z | ^~~~ 2025-12-04T12:35:04.8094381Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8094483Z 1950 | 0x80, 2025-12-04T12:35:04.8094579Z | ^~~~ 2025-12-04T12:35:04.8095775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8095871Z 1952 | 0x80, 2025-12-04T12:35:04.8095979Z | ^~~~ 2025-12-04T12:35:04.8097224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8097322Z 1954 | 0x80, 2025-12-04T12:35:04.8097429Z | ^~~~ 2025-12-04T12:35:04.8098618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8098723Z 1956 | 0x80, 2025-12-04T12:35:04.8098832Z | ^~~~ 2025-12-04T12:35:04.8100013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8100124Z 1958 | 0x80, 2025-12-04T12:35:04.8100225Z | ^~~~ 2025-12-04T12:35:04.8101398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8101507Z 1960 | 0x80, 2025-12-04T12:35:04.8101602Z | ^~~~ 2025-12-04T12:35:04.8102791Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8102894Z 1962 | 0x80, 2025-12-04T12:35:04.8102986Z | ^~~~ 2025-12-04T12:35:04.8104177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8104275Z 1964 | 0x80, 2025-12-04T12:35:04.8104413Z | ^~~~ 2025-12-04T12:35:04.8105609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8105706Z 1966 | 0x80, 2025-12-04T12:35:04.8105817Z | ^~~~ 2025-12-04T12:35:04.8107034Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8107247Z 1968 | 0x80, 2025-12-04T12:35:04.8107357Z | ^~~~ 2025-12-04T12:35:04.8108572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8108680Z 1970 | 0x80, 2025-12-04T12:35:04.8108777Z | ^~~~ 2025-12-04T12:35:04.8109952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8110060Z 1972 | 0x80, 2025-12-04T12:35:04.8110150Z | ^~~~ 2025-12-04T12:35:04.8111328Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8111441Z 1974 | 0x80, 2025-12-04T12:35:04.8111532Z | ^~~~ 2025-12-04T12:35:04.8112721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8112814Z 1976 | 0x80, 2025-12-04T12:35:04.8112910Z | ^~~~ 2025-12-04T12:35:04.8114092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8114185Z 1978 | 0x80, 2025-12-04T12:35:04.8114277Z | ^~~~ 2025-12-04T12:35:04.8115468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8115566Z 1980 | 0x80, 2025-12-04T12:35:04.8115669Z | ^~~~ 2025-12-04T12:35:04.8116844Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8116942Z 1982 | 0x80, 2025-12-04T12:35:04.8117048Z | ^~~~ 2025-12-04T12:35:04.8118219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8118328Z 1984 | 0x80, 2025-12-04T12:35:04.8118420Z | ^~~~ 2025-12-04T12:35:04.8119596Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8119707Z 1986 | 0x80, 2025-12-04T12:35:04.8119800Z | ^~~~ 2025-12-04T12:35:04.8120985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8121137Z 1988 | 0x80, 2025-12-04T12:35:04.8121228Z | ^~~~ 2025-12-04T12:35:04.8122415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8122509Z 1990 | 0x80, 2025-12-04T12:35:04.8122599Z | ^~~~ 2025-12-04T12:35:04.8123881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8123976Z 1992 | 0x80, 2025-12-04T12:35:04.8124081Z | ^~~~ 2025-12-04T12:35:04.8125297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.8125464Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.8125595Z | ^~~~~~ 2025-12-04T12:35:04.8128000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = signed char; 
typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.8128607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28: required from here 2025-12-04T12:35:04.8129789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8129892Z 1866 | 0x80, 2025-12-04T12:35:04.8130001Z | ^~~~ 2025-12-04T12:35:04.8131174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8131285Z 1868 | 0x80, 2025-12-04T12:35:04.8131393Z | ^~~~ 2025-12-04T12:35:04.8132566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8132678Z 1870 | 0x80, 2025-12-04T12:35:04.8132779Z | ^~~~ 2025-12-04T12:35:04.8133965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8134066Z 1872 | 0x80, 2025-12-04T12:35:04.8134159Z | ^~~~ 2025-12-04T12:35:04.8135343Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8135437Z 1874 | 0x80, 2025-12-04T12:35:04.8135540Z | ^~~~ 2025-12-04T12:35:04.8136819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: 
warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8136917Z 1876 | 0x80, 2025-12-04T12:35:04.8137031Z | ^~~~ 2025-12-04T12:35:04.8138213Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8138359Z 1878 | 0x80, 2025-12-04T12:35:04.8138472Z | ^~~~ 2025-12-04T12:35:04.8139643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8139753Z 1880 | 0x80, 2025-12-04T12:35:04.8139916Z | ^~~~ 2025-12-04T12:35:04.8141093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8141202Z 1882 | 0x80, 2025-12-04T12:35:04.8141330Z | ^~~~ 2025-12-04T12:35:04.8142510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8142625Z 1884 | 0x80, 2025-12-04T12:35:04.8142716Z | ^~~~ 2025-12-04T12:35:04.8143896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8143996Z 1886 | 0x80, 2025-12-04T12:35:04.8144092Z | ^~~~ 2025-12-04T12:35:04.8145275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ 
to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8145375Z 1888 | 0x80, 2025-12-04T12:35:04.8145479Z | ^~~~ 2025-12-04T12:35:04.8146655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8146756Z 1890 | 0x80, 2025-12-04T12:35:04.8146859Z | ^~~~ 2025-12-04T12:35:04.8148025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8148124Z 1892 | 0x80, 2025-12-04T12:35:04.8148234Z | ^~~~ 2025-12-04T12:35:04.8149403Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8149514Z 1894 | 0x80, 2025-12-04T12:35:04.8149607Z | ^~~~ 2025-12-04T12:35:04.8150780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8151529Z 1896 | 0x80, 2025-12-04T12:35:04.8151625Z | ^~~~ 2025-12-04T12:35:04.8152823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8152966Z 1898 | 0x80, 2025-12-04T12:35:04.8153058Z | ^~~~ 2025-12-04T12:35:04.8154245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8154344Z 1900 | 0x80, 2025-12-04T12:35:04.8154436Z | ^~~~ 2025-12-04T12:35:04.8155622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8155721Z 1902 | 0x80, 2025-12-04T12:35:04.8155828Z | ^~~~ 2025-12-04T12:35:04.8157002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8157138Z 1904 | 0x80, 2025-12-04T12:35:04.8157246Z | ^~~~ 2025-12-04T12:35:04.8158421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8158567Z 1906 | 0x80, 2025-12-04T12:35:04.8158661Z | ^~~~ 2025-12-04T12:35:04.8159838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8159950Z 1908 | 0x80, 2025-12-04T12:35:04.8160042Z | ^~~~ 2025-12-04T12:35:04.8161216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8161328Z 1910 | 0x80, 2025-12-04T12:35:04.8161420Z | ^~~~ 2025-12-04T12:35:04.8162612Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.8162706Z 1912 | 0x80, 2025-12-04T12:35:04.8162798Z | ^~~~ 2025-12-04T12:35:04.8163989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8164091Z 1914 | 0x80, 2025-12-04T12:35:04.8164184Z | ^~~~ 2025-12-04T12:35:04.8165380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8165481Z 1916 | 0x80, 2025-12-04T12:35:04.8165585Z | ^~~~ 2025-12-04T12:35:04.8166765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8166858Z 1918 | 0x80, 2025-12-04T12:35:04.8166965Z | ^~~~ 2025-12-04T12:35:04.8168141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8168287Z 1920 | 0x80, 2025-12-04T12:35:04.8168380Z | ^~~~ 2025-12-04T12:35:04.8169563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8169705Z 1922 | 0x80, 2025-12-04T12:35:04.8169797Z | ^~~~ 2025-12-04T12:35:04.8171191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8171315Z 1924 | 0x80, 
2025-12-04T12:35:04.8171409Z | ^~~~ 2025-12-04T12:35:04.8172601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8172703Z 1926 | 0x80, 2025-12-04T12:35:04.8172796Z | ^~~~ 2025-12-04T12:35:04.8174064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8174169Z 1928 | 0x80); 2025-12-04T12:35:04.8174278Z | ^~~~ 2025-12-04T12:35:04.8175508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8175607Z 1930 | 0x80, 2025-12-04T12:35:04.8175719Z | ^~~~ 2025-12-04T12:35:04.8176959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8177083Z 1932 | 0x80, 2025-12-04T12:35:04.8177192Z | ^~~~ 2025-12-04T12:35:04.8178389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8178507Z 1934 | 0x80, 2025-12-04T12:35:04.8178601Z | ^~~~ 2025-12-04T12:35:04.8179782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8179897Z 1936 | 0x80, 2025-12-04T12:35:04.8179991Z | ^~~~ 
2025-12-04T12:35:04.8181184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8181288Z 1938 | 0x80, 2025-12-04T12:35:04.8181382Z | ^~~~ 2025-12-04T12:35:04.8182572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8182674Z 1940 | 0x80, 2025-12-04T12:35:04.8182770Z | ^~~~ 2025-12-04T12:35:04.8183966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8184064Z 1942 | 0x80, 2025-12-04T12:35:04.8184173Z | ^~~~ 2025-12-04T12:35:04.8185346Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8185518Z 1944 | 0x80, 2025-12-04T12:35:04.8185626Z | ^~~~ 2025-12-04T12:35:04.8186845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8187001Z 1946 | 0x80, 2025-12-04T12:35:04.8187096Z | ^~~~ 2025-12-04T12:35:04.8188318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8188431Z 1948 | 0x80, 2025-12-04T12:35:04.8188527Z | ^~~~ 2025-12-04T12:35:04.8189715Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8189831Z 1950 | 0x80, 2025-12-04T12:35:04.8189926Z | ^~~~ 2025-12-04T12:35:04.8191129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8191232Z 1952 | 0x80, 2025-12-04T12:35:04.8191324Z | ^~~~ 2025-12-04T12:35:04.8192519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8192613Z 1954 | 0x80, 2025-12-04T12:35:04.8192707Z | ^~~~ 2025-12-04T12:35:04.8193901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8194004Z 1956 | 0x80, 2025-12-04T12:35:04.8194109Z | ^~~~ 2025-12-04T12:35:04.8195287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8195389Z 1958 | 0x80, 2025-12-04T12:35:04.8195501Z | ^~~~ 2025-12-04T12:35:04.8196679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8196788Z 1960 | 0x80, 2025-12-04T12:35:04.8196880Z | ^~~~ 2025-12-04T12:35:04.8198066Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8198185Z 1962 | 0x80, 2025-12-04T12:35:04.8198275Z | ^~~~ 2025-12-04T12:35:04.8199465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8199564Z 1964 | 0x80, 2025-12-04T12:35:04.8199658Z | ^~~~ 2025-12-04T12:35:04.8200866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8200959Z 1966 | 0x80, 2025-12-04T12:35:04.8201053Z | ^~~~ 2025-12-04T12:35:04.8202243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8202387Z 1968 | 0x80, 2025-12-04T12:35:04.8202490Z | ^~~~ 2025-12-04T12:35:04.8203703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8203830Z 1970 | 0x80, 2025-12-04T12:35:04.8203942Z | ^~~~ 2025-12-04T12:35:04.8205156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8205267Z 1972 | 0x80, 2025-12-04T12:35:04.8205359Z | ^~~~ 2025-12-04T12:35:04.8206540Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8206646Z 1974 | 0x80, 2025-12-04T12:35:04.8206740Z | ^~~~ 2025-12-04T12:35:04.8207918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8208033Z 1976 | 0x80, 2025-12-04T12:35:04.8208127Z | ^~~~ 2025-12-04T12:35:04.8209324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8209422Z 1978 | 0x80, 2025-12-04T12:35:04.8209516Z | ^~~~ 2025-12-04T12:35:04.8210711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8210805Z 1980 | 0x80, 2025-12-04T12:35:04.8210900Z | ^~~~ 2025-12-04T12:35:04.8212095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8212194Z 1982 | 0x80, 2025-12-04T12:35:04.8212299Z | ^~~~ 2025-12-04T12:35:04.8213486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8213582Z 1984 | 0x80, 2025-12-04T12:35:04.8213686Z | ^~~~ 2025-12-04T12:35:04.8214875Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8214980Z 1986 | 0x80, 2025-12-04T12:35:04.8215071Z | ^~~~ 2025-12-04T12:35:04.8216254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8216440Z 1988 | 0x80, 2025-12-04T12:35:04.8216534Z | ^~~~ 2025-12-04T12:35:04.8217734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8217845Z 1990 | 0x80, 2025-12-04T12:35:04.8217939Z | ^~~~ 2025-12-04T12:35:04.8219197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8219294Z 1992 | 0x80, 2025-12-04T12:35:04.8219387Z | ^~~~ 2025-12-04T12:35:04.8220777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.8220977Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.8221114Z | ^~~~~~ 2025-12-04T12:35:04.8223586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = unsigned 
char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.8224177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28: required from here 2025-12-04T12:35:04.8225387Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8225490Z 1866 | 0x80, 2025-12-04T12:35:04.8225599Z | ^~~~ 2025-12-04T12:35:04.8226785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8226879Z 1868 | 0x80, 2025-12-04T12:35:04.8226988Z | ^~~~ 2025-12-04T12:35:04.8228162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8228272Z 1870 | 0x80, 2025-12-04T12:35:04.8228370Z | ^~~~ 2025-12-04T12:35:04.8229557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8229673Z 1872 | 0x80, 2025-12-04T12:35:04.8229767Z | ^~~~ 2025-12-04T12:35:04.8230966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8231062Z 1874 | 0x80, 2025-12-04T12:35:04.8231160Z | ^~~~ 2025-12-04T12:35:04.8232348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: 
warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8232445Z 1876 | 0x80, 2025-12-04T12:35:04.8232541Z | ^~~~ 2025-12-04T12:35:04.8233736Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8233834Z 1878 | 0x80, 2025-12-04T12:35:04.8233940Z | ^~~~ 2025-12-04T12:35:04.8235117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8235265Z 1880 | 0x80, 2025-12-04T12:35:04.8235370Z | ^~~~ 2025-12-04T12:35:04.8236546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8236654Z 1882 | 0x80, 2025-12-04T12:35:04.8236748Z | ^~~~ 2025-12-04T12:35:04.8237959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8238102Z 1884 | 0x80, 2025-12-04T12:35:04.8238195Z | ^~~~ 2025-12-04T12:35:04.8239432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8239551Z 1886 | 0x80, 2025-12-04T12:35:04.8239645Z | ^~~~ 2025-12-04T12:35:04.8240838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ 
to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8240933Z 1888 | 0x80, 2025-12-04T12:35:04.8241026Z | ^~~~ 2025-12-04T12:35:04.8242234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8242330Z 1890 | 0x80, 2025-12-04T12:35:04.8242420Z | ^~~~ 2025-12-04T12:35:04.8243614Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8243715Z 1892 | 0x80, 2025-12-04T12:35:04.8243821Z | ^~~~ 2025-12-04T12:35:04.8244992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8245085Z 1894 | 0x80, 2025-12-04T12:35:04.8245189Z | ^~~~ 2025-12-04T12:35:04.8246375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8246484Z 1896 | 0x80, 2025-12-04T12:35:04.8246578Z | ^~~~ 2025-12-04T12:35:04.8247767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8247921Z 1898 | 0x80, 2025-12-04T12:35:04.8248012Z | ^~~~ 2025-12-04T12:35:04.8249188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to 
‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8249296Z 1900 | 0x80, 2025-12-04T12:35:04.8249394Z | ^~~~ 2025-12-04T12:35:04.8250618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8250713Z 1902 | 0x80, 2025-12-04T12:35:04.8250816Z | ^~~~ 2025-12-04T12:35:04.8252022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8252127Z 1904 | 0x80, 2025-12-04T12:35:04.8252235Z | ^~~~ 2025-12-04T12:35:04.8253422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8253517Z 1906 | 0x80, 2025-12-04T12:35:04.8253689Z | ^~~~ 2025-12-04T12:35:04.8254883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8254977Z 1908 | 0x80, 2025-12-04T12:35:04.8255126Z | ^~~~ 2025-12-04T12:35:04.8256374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8256493Z 1910 | 0x80, 2025-12-04T12:35:04.8256586Z | ^~~~ 2025-12-04T12:35:04.8257768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 
2025-12-04T12:35:04.8257883Z 1912 | 0x80, 2025-12-04T12:35:04.8257991Z | ^~~~ 2025-12-04T12:35:04.8259182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8259275Z 1914 | 0x80, 2025-12-04T12:35:04.8259372Z | ^~~~ 2025-12-04T12:35:04.8260566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8260668Z 1916 | 0x80, 2025-12-04T12:35:04.8260759Z | ^~~~ 2025-12-04T12:35:04.8261954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8262048Z 1918 | 0x80, 2025-12-04T12:35:04.8262170Z | ^~~~ 2025-12-04T12:35:04.8263355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8263449Z 1920 | 0x80, 2025-12-04T12:35:04.8263565Z | ^~~~ 2025-12-04T12:35:04.8264751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8264916Z 1922 | 0x80, 2025-12-04T12:35:04.8265009Z | ^~~~ 2025-12-04T12:35:04.8266188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8266302Z 1924 | 0x80, 
2025-12-04T12:35:04.8266441Z | ^~~~ 2025-12-04T12:35:04.8267622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8267734Z 1926 | 0x80, 2025-12-04T12:35:04.8267834Z | ^~~~ 2025-12-04T12:35:04.8269019Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8269125Z 1928 | 0x80); 2025-12-04T12:35:04.8269218Z | ^~~~ 2025-12-04T12:35:04.8270410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8270542Z 1930 | 0x80, 2025-12-04T12:35:04.8270656Z | ^~~~ 2025-12-04T12:35:04.8272036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8272206Z 1932 | 0x80, 2025-12-04T12:35:04.8272315Z | ^~~~ 2025-12-04T12:35:04.8273502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8273603Z 1934 | 0x80, 2025-12-04T12:35:04.8273715Z | ^~~~ 2025-12-04T12:35:04.8274894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8275016Z 1936 | 0x80, 2025-12-04T12:35:04.8275110Z | ^~~~ 
2025-12-04T12:35:04.8276288Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8276406Z 1938 | 0x80, 2025-12-04T12:35:04.8276500Z | ^~~~ 2025-12-04T12:35:04.8277692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8277796Z 1940 | 0x80, 2025-12-04T12:35:04.8277892Z | ^~~~ 2025-12-04T12:35:04.8279089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8279201Z 1942 | 0x80, 2025-12-04T12:35:04.8279295Z | ^~~~ 2025-12-04T12:35:04.8280480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8280580Z 1944 | 0x80, 2025-12-04T12:35:04.8280689Z | ^~~~ 2025-12-04T12:35:04.8281858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8282013Z 1946 | 0x80, 2025-12-04T12:35:04.8282124Z | ^~~~ 2025-12-04T12:35:04.8283438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8283713Z 1948 | 0x80, 2025-12-04T12:35:04.8283810Z | ^~~~ 2025-12-04T12:35:04.8284997Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8285141Z 1950 | 0x80, 2025-12-04T12:35:04.8285237Z | ^~~~ 2025-12-04T12:35:04.8286412Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8286527Z 1952 | 0x80, 2025-12-04T12:35:04.8286620Z | ^~~~ 2025-12-04T12:35:04.8287821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8287921Z 1954 | 0x80, 2025-12-04T12:35:04.8288013Z | ^~~~ 2025-12-04T12:35:04.8289227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8289322Z 1956 | 0x80, 2025-12-04T12:35:04.8289414Z | ^~~~ 2025-12-04T12:35:04.8290607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8290707Z 1958 | 0x80, 2025-12-04T12:35:04.8290814Z | ^~~~ 2025-12-04T12:35:04.8291995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8292095Z 1960 | 0x80, 2025-12-04T12:35:04.8292203Z | ^~~~ 2025-12-04T12:35:04.8293395Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8293502Z 1962 | 0x80, 2025-12-04T12:35:04.8293593Z | ^~~~ 2025-12-04T12:35:04.8294772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8294886Z 1964 | 0x80, 2025-12-04T12:35:04.8294977Z | ^~~~ 2025-12-04T12:35:04.8296159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8296272Z 1966 | 0x80, 2025-12-04T12:35:04.8296433Z | ^~~~ 2025-12-04T12:35:04.8297638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8297738Z 1968 | 0x80, 2025-12-04T12:35:04.8297829Z | ^~~~ 2025-12-04T12:35:04.8299015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8299169Z 1970 | 0x80, 2025-12-04T12:35:04.8299274Z | ^~~~ 2025-12-04T12:35:04.8300493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8300652Z 1972 | 0x80, 2025-12-04T12:35:04.8300757Z | ^~~~ 2025-12-04T12:35:04.8301962Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8302058Z 1974 | 0x80, 2025-12-04T12:35:04.8302164Z | ^~~~ 2025-12-04T12:35:04.8303341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8303453Z 1976 | 0x80, 2025-12-04T12:35:04.8303545Z | ^~~~ 2025-12-04T12:35:04.8304727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8304840Z 1978 | 0x80, 2025-12-04T12:35:04.8304933Z | ^~~~ 2025-12-04T12:35:04.8306126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8306221Z 1980 | 0x80, 2025-12-04T12:35:04.8306312Z | ^~~~ 2025-12-04T12:35:04.8307496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8307597Z 1982 | 0x80, 2025-12-04T12:35:04.8307690Z | ^~~~ 2025-12-04T12:35:04.8308889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8308989Z 1984 | 0x80, 2025-12-04T12:35:04.8309096Z | ^~~~ 2025-12-04T12:35:04.8310274Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8310368Z 1986 | 0x80, 2025-12-04T12:35:04.8310476Z | ^~~~ 2025-12-04T12:35:04.8311644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8311759Z 1988 | 0x80, 2025-12-04T12:35:04.8311852Z | ^~~~ 2025-12-04T12:35:04.8313030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8313143Z 1990 | 0x80, 2025-12-04T12:35:04.8313234Z | ^~~~ 2025-12-04T12:35:04.8314418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8314525Z 1992 | 0x80, 2025-12-04T12:35:04.8314618Z | ^~~~ 2025-12-04T12:35:04.8315801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.8316006Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.8316122Z | ^~~~~~ 2025-12-04T12:35:04.8316676Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16, 2025-12-04T12:35:04.8317081Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.8317541Z from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.8317974Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.8318437Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.8319099Z from /tmp/U7W6v5/tmp2gno_q_y/data/aotinductor/model2/cyss5jazqjsvp5s2t3ihlofugodyzirark5aiimqjwirn4hylxbp.wrapper.cpp:656: 2025-12-04T12:35:04.8320589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’: 2025-12-04T12:35:04.8321180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31: required from here 2025-12-04T12:35:04.8322366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8322485Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8322595Z | ^~~~ 2025-12-04T12:35:04.8323790Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8323923Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8324027Z | ^~~~ 2025-12-04T12:35:04.8325221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8325356Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8325461Z | ^~~~ 
2025-12-04T12:35:04.8326666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8326786Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8326888Z | ^~~~ 2025-12-04T12:35:04.8328084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8328198Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8328306Z | ^~~~ 2025-12-04T12:35:04.8329510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8329629Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8329738Z | ^~~~ 2025-12-04T12:35:04.8330918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8331081Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8331194Z | ^~~~ 2025-12-04T12:35:04.8332413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8332575Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8332679Z | ^~~~ 2025-12-04T12:35:04.8333892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8334020Z 203 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8334112Z | ^~~~ 2025-12-04T12:35:04.8335308Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8335430Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8335527Z | ^~~~ 2025-12-04T12:35:04.8336803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8336924Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8337026Z | ^~~~ 2025-12-04T12:35:04.8338237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8338351Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8338470Z | ^~~~ 2025-12-04T12:35:04.8339655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8339767Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8339877Z | ^~~~ 2025-12-04T12:35:04.8341069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8341204Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8341301Z | ^~~~ 2025-12-04T12:35:04.8342485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8342668Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8342769Z | ^~~~ 2025-12-04T12:35:04.8343967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8344082Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8344228Z | ^~~~ 2025-12-04T12:35:04.8345422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8345540Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8345632Z | ^~~~ 2025-12-04T12:35:04.8346828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8346945Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8353568Z | ^~~~ 2025-12-04T12:35:04.8355092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8355227Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8355350Z | ^~~~ 2025-12-04T12:35:04.8356597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8356718Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8356838Z | ^~~~ 2025-12-04T12:35:04.8358038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8358178Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8358274Z | ^~~~ 2025-12-04T12:35:04.8359467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8359605Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8359704Z | ^~~~ 2025-12-04T12:35:04.8360910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8361024Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8361128Z | ^~~~ 2025-12-04T12:35:04.8362338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8362455Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8362571Z | ^~~~ 2025-12-04T12:35:04.8363760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8363881Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8363989Z | ^~~~ 2025-12-04T12:35:04.8365176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8365340Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8365455Z | ^~~~ 2025-12-04T12:35:04.8366639Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8366770Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8367016Z | ^~~~ 2025-12-04T12:35:04.8368201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8368336Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8368440Z | ^~~~ 2025-12-04T12:35:04.8369630Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8369749Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8369843Z | ^~~~ 2025-12-04T12:35:04.8371336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8371458Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8371572Z | ^~~~ 2025-12-04T12:35:04.8372823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8372938Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8373055Z | ^~~~ 2025-12-04T12:35:04.8374247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8374367Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.8374482Z | ^~~~ 2025-12-04T12:35:04.8375667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8375802Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8375899Z | ^~~~ 2025-12-04T12:35:04.8377211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8377336Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8377433Z | ^~~~ 2025-12-04T12:35:04.8378650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8378761Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8378859Z | ^~~~ 2025-12-04T12:35:04.8380064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8380183Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8380284Z | ^~~~ 2025-12-04T12:35:04.8381478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8381655Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8381763Z | ^~~~ 2025-12-04T12:35:04.8382945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.8383055Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8383258Z | ^~~~ 2025-12-04T12:35:04.8384448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8384611Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8384715Z | ^~~~ 2025-12-04T12:35:04.8385890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8386020Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8386121Z | ^~~~ 2025-12-04T12:35:04.8387324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8387440Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8387533Z | ^~~~ 2025-12-04T12:35:04.8388736Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8388848Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8388945Z | ^~~~ 2025-12-04T12:35:04.8390137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8390255Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8390371Z | ^~~~ 2025-12-04T12:35:04.8391558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8391676Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8391788Z | ^~~~ 2025-12-04T12:35:04.8392973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8393095Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8393194Z | ^~~~ 2025-12-04T12:35:04.8394370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8394495Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8394593Z | ^~~~ 2025-12-04T12:35:04.8395794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8395910Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8396010Z | ^~~~ 2025-12-04T12:35:04.8397211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8397360Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8397459Z | ^~~~ 2025-12-04T12:35:04.8398958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’: 2025-12-04T12:35:04.8399612Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31: required from here 
2025-12-04T12:35:04.8400860Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8400980Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8401090Z | ^~~~ 2025-12-04T12:35:04.8402282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8402402Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8402513Z | ^~~~ 2025-12-04T12:35:04.8403701Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8403820Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8403933Z | ^~~~ 2025-12-04T12:35:04.8405117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8405243Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8405344Z | ^~~~ 2025-12-04T12:35:04.8406528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8406653Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8406746Z | ^~~~ 2025-12-04T12:35:04.8407948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8408066Z 202 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8408161Z | ^~~~ 2025-12-04T12:35:04.8409357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8409475Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8409573Z | ^~~~ 2025-12-04T12:35:04.8410772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8410884Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8411011Z | ^~~~ 2025-12-04T12:35:04.8412184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8412302Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8412408Z | ^~~~ 2025-12-04T12:35:04.8413589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8413752Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8413848Z | ^~~~ 2025-12-04T12:35:04.8415068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8415226Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8415325Z | ^~~~ 2025-12-04T12:35:04.8416677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8416793Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8416894Z | ^~~~ 2025-12-04T12:35:04.8418105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8418224Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8418317Z | ^~~~ 2025-12-04T12:35:04.8419515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8419633Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8419745Z | ^~~~ 2025-12-04T12:35:04.8420930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8421044Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8421156Z | ^~~~ 2025-12-04T12:35:04.8422343Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8422466Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8422567Z | ^~~~ 2025-12-04T12:35:04.8423752Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8423883Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8423976Z | ^~~~ 2025-12-04T12:35:04.8425169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8425286Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8425383Z | ^~~~ 2025-12-04T12:35:04.8426578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8426687Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8426800Z | ^~~~ 2025-12-04T12:35:04.8427998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8428120Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8428234Z | ^~~~ 2025-12-04T12:35:04.8429409Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8429578Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8429684Z | ^~~~ 2025-12-04T12:35:04.8430905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8431066Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8431165Z | ^~~~ 2025-12-04T12:35:04.8432388Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8432516Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8432617Z | ^~~~ 2025-12-04T12:35:04.8433809Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8433933Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8434031Z | ^~~~ 2025-12-04T12:35:04.8435234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8435352Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8435444Z | ^~~~ 2025-12-04T12:35:04.8436649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8436760Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8436869Z | ^~~~ 2025-12-04T12:35:04.8438055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8438167Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8438278Z | ^~~~ 2025-12-04T12:35:04.8439460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8439592Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8439694Z | ^~~~ 2025-12-04T12:35:04.8440870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8441032Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.8441125Z | ^~~~ 2025-12-04T12:35:04.8442317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8442429Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8442569Z | ^~~~ 2025-12-04T12:35:04.8443762Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8443883Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8443983Z | ^~~~ 2025-12-04T12:35:04.8445170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8445291Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8445405Z | ^~~~ 2025-12-04T12:35:04.8446617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8446737Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8446845Z | ^~~~ 2025-12-04T12:35:04.8448068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8448197Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8448296Z | ^~~~ 2025-12-04T12:35:04.8449485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.8449621Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8449722Z | ^~~~ 2025-12-04T12:35:04.8450924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8451047Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8451154Z | ^~~~ 2025-12-04T12:35:04.8452354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8452469Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8452570Z | ^~~~ 2025-12-04T12:35:04.8453769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8453882Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8453995Z | ^~~~ 2025-12-04T12:35:04.8455185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8455305Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8455423Z | ^~~~ 2025-12-04T12:35:04.8456690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8456866Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8456969Z | ^~~~ 2025-12-04T12:35:04.8458166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8458292Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8458436Z | ^~~~ 2025-12-04T12:35:04.8459633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8459755Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8459854Z | ^~~~ 2025-12-04T12:35:04.8461059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8461175Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8461276Z | ^~~~ 2025-12-04T12:35:04.8462509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8462631Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8462748Z | ^~~~ 2025-12-04T12:35:04.8463965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8464081Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8464191Z | ^~~~ 2025-12-04T12:35:04.8465553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8465692Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8465791Z | ^~~~ 2025-12-04T12:35:04.8466991Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8467124Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8467222Z | ^~~~ 2025-12-04T12:35:04.8468405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8468529Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8468630Z | ^~~~ 2025-12-04T12:35:04.8468753Z PASSED [36.3331s] [ 40%] 2025-12-04T12:35:04.8469422Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_shared_weights SKIPPED [0.0036s] (No support for cpp only) [ 42%] 2025-12-04T12:35:04.8470099Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_user_managed_weight SKIPPED [0.0031s] (No support for cpp only) [ 43%] 2025-12-04T12:35:04.8470830Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_weights_on_disk_nested_module SKIPPED [0.0029s] (No support for cpp only) [ 44%] 2025-12-04T12:35:04.8471686Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_package_without_weight SKIPPED [0.0029s] (No support for cpp only) [ 45%] 2025-12-04T12:35:04.8472225Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_remove_intermediate_files PASSED [5.2513s] [ 46%] 2025-12-04T12:35:04.8472689Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_save_buffer PASSED [5.2709s] [ 47%] 2025-12-04T12:35:04.8473815Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_specified_output_dir In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_float.h:12, 
2025-12-04T12:35:04.8474268Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:11, 2025-12-04T12:35:04.8474693Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.8475148Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.8475555Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.8476033Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.8476704Z from /tmp/uDOzLN/tmpn56tbzg5/data/aotinductor/model1/cwulnadwx3jyqkgl526d3bpo7ziav2n33dginvvv4zbkqn5jle4v.wrapper.cpp:729: 2025-12-04T12:35:04.8477306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/sleef.h:192:10: warning: ISO C++ prohibits anonymous structs [-Wpedantic] 2025-12-04T12:35:04.8477477Z 192 | struct { 2025-12-04T12:35:04.8477577Z | ^ 2025-12-04T12:35:04.8478075Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.8478516Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.8478958Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.8479374Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.8479839Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.8480506Z from /tmp/uDOzLN/tmpn56tbzg5/data/aotinductor/model1/cwulnadwx3jyqkgl526d3bpo7ziav2n33dginvvv4zbkqn5jle4v.wrapper.cpp:729: 2025-12-04T12:35:04.8482768Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.8483950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:544:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.8484124Z 544 | auto msb_one = _mm512_set1_epi16(0xFFFF); 2025-12-04T12:35:04.8484240Z | ^~~~~~ 2025-12-04T12:35:04.8484755Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:15, 2025-12-04T12:35:04.8485128Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.8485572Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.8485993Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.8486454Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.8487134Z from /tmp/uDOzLN/tmpn56tbzg5/data/aotinductor/model1/cwulnadwx3jyqkgl526d3bpo7ziav2n33dginvvv4zbkqn5jle4v.wrapper.cpp:729: 2025-12-04T12:35:04.8488802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.8490015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:697:54: warning: overflow in conversion from 
‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.8490240Z 697 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.8490370Z | ^~~~~~ 2025-12-04T12:35:04.8492003Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.8493176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:701:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.8493434Z 701 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.8493566Z | ^~~~~~ 2025-12-04T12:35:04.8495223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.8496477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:705:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.8496694Z 705 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.8496831Z | ^~~~~~ 2025-12-04T12:35:04.8498465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.8499647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:709:54: warning: overflow in conversion from ‘int’ to ‘short int’ 
changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.8499852Z 709 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.8499977Z | ^~~~~~ 2025-12-04T12:35:04.8501617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.8502797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:713:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.8503021Z 713 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.8503151Z | ^~~~~~ 2025-12-04T12:35:04.8504759Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator>=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.8505981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:717:54: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65535’ to ‘-1’ [-Woverflow] 2025-12-04T12:35:04.8506220Z 717 | return _mm512_mask_set1_epi16(zero_vector, mask, 0xFFFF); 2025-12-04T12:35:04.8506389Z | ^~~~~~ 2025-12-04T12:35:04.8508685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.8509894Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1153:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8510048Z 1153 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.8510165Z | ^~~~ 2025-12-04T12:35:04.8511838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.8513036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1166:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8513254Z 1166 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.8513386Z | ^~~~ 2025-12-04T12:35:04.8515047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.8516245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1170:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8516448Z 1170 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.8516589Z | ^~~~ 2025-12-04T12:35:04.8518232Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.8519436Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1174:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8519650Z 1174 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.8519773Z | ^~~~ 2025-12-04T12:35:04.8521439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.8522632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1178:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8522889Z 1178 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.8523012Z | ^~~~ 2025-12-04T12:35:04.8525420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In static member function ‘static at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::blendv(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&)’: 2025-12-04T12:35:04.8526648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1207:37: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8526815Z 1207 | auto msb_one = _mm512_set1_epi8(0xFF); 2025-12-04T12:35:04.8526931Z | ^~~~ 2025-12-04T12:35:04.8528634Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator==(const 
at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.8529839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1220:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8530050Z 1220 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.8530186Z | ^~~~ 2025-12-04T12:35:04.8531869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator!=(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.8533067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1224:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8533282Z 1224 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.8533404Z | ^~~~ 2025-12-04T12:35:04.8535101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<(const at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.8536348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1228:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8536567Z 1228 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.8536695Z | ^~~~ 2025-12-04T12:35:04.8538402Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In member function ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::Vectorized::operator<=(const 
at::vec::CPU_CAPABILITY::Vectorized&) const’: 2025-12-04T12:35:04.8539607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1232:53: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8539857Z 1232 | return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); 2025-12-04T12:35:04.8539992Z | ^~~~ 2025-12-04T12:35:04.8542392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = signed char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.8543069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2074:27: required from here 2025-12-04T12:35:04.8544257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8544359Z 1866 | 0x80, 2025-12-04T12:35:04.8544466Z | ^~~~ 2025-12-04T12:35:04.8545643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8545762Z 1868 | 0x80, 2025-12-04T12:35:04.8545859Z | ^~~~ 2025-12-04T12:35:04.8547033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8547150Z 1870 | 0x80, 2025-12-04T12:35:04.8547244Z | ^~~~ 
2025-12-04T12:35:04.8548421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8548546Z 1872 | 0x80, 2025-12-04T12:35:04.8548639Z | ^~~~ 2025-12-04T12:35:04.8549825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8549931Z 1874 | 0x80, 2025-12-04T12:35:04.8550022Z | ^~~~ 2025-12-04T12:35:04.8551206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8551306Z 1876 | 0x80, 2025-12-04T12:35:04.8551396Z | ^~~~ 2025-12-04T12:35:04.8552589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8552690Z 1878 | 0x80, 2025-12-04T12:35:04.8552792Z | ^~~~ 2025-12-04T12:35:04.8553960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8554066Z 1880 | 0x80, 2025-12-04T12:35:04.8554174Z | ^~~~ 2025-12-04T12:35:04.8555349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8555460Z 1882 | 0x80, 2025-12-04T12:35:04.8555552Z | ^~~~ 2025-12-04T12:35:04.8556726Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8556875Z 1884 | 0x80, 2025-12-04T12:35:04.8556965Z | ^~~~ 2025-12-04T12:35:04.8558168Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8558336Z 1886 | 0x80, 2025-12-04T12:35:04.8558432Z | ^~~~ 2025-12-04T12:35:04.8559625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8559761Z 1888 | 0x80, 2025-12-04T12:35:04.8559855Z | ^~~~ 2025-12-04T12:35:04.8561051Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8561154Z 1890 | 0x80, 2025-12-04T12:35:04.8561262Z | ^~~~ 2025-12-04T12:35:04.8562440Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8562543Z 1892 | 0x80, 2025-12-04T12:35:04.8562654Z | ^~~~ 2025-12-04T12:35:04.8563833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8563948Z 1894 | 0x80, 2025-12-04T12:35:04.8564047Z | ^~~~ 2025-12-04T12:35:04.8565218Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8565340Z 1896 | 0x80, 2025-12-04T12:35:04.8565434Z | ^~~~ 2025-12-04T12:35:04.8566613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8566730Z 1898 | 0x80, 2025-12-04T12:35:04.8566823Z | ^~~~ 2025-12-04T12:35:04.8568022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8568118Z 1900 | 0x80, 2025-12-04T12:35:04.8568212Z | ^~~~ 2025-12-04T12:35:04.8569402Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8569505Z 1902 | 0x80, 2025-12-04T12:35:04.8569599Z | ^~~~ 2025-12-04T12:35:04.8570806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8570912Z 1904 | 0x80, 2025-12-04T12:35:04.8571197Z | ^~~~ 2025-12-04T12:35:04.8572386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8572483Z 1906 | 0x80, 2025-12-04T12:35:04.8572594Z | ^~~~ 2025-12-04T12:35:04.8573942Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8574165Z 1908 | 0x80, 2025-12-04T12:35:04.8574259Z | ^~~~ 2025-12-04T12:35:04.8575502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8575660Z 1910 | 0x80, 2025-12-04T12:35:04.8575756Z | ^~~~ 2025-12-04T12:35:04.8577055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8577168Z 1912 | 0x80, 2025-12-04T12:35:04.8577259Z | ^~~~ 2025-12-04T12:35:04.8578465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8578566Z 1914 | 0x80, 2025-12-04T12:35:04.8578659Z | ^~~~ 2025-12-04T12:35:04.8579858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8579956Z 1916 | 0x80, 2025-12-04T12:35:04.8580061Z | ^~~~ 2025-12-04T12:35:04.8581430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8581527Z 1918 | 0x80, 2025-12-04T12:35:04.8581637Z | ^~~~ 2025-12-04T12:35:04.8582875Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8582977Z 1920 | 0x80, 2025-12-04T12:35:04.8583082Z | ^~~~ 2025-12-04T12:35:04.8584348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8584501Z 1922 | 0x80, 2025-12-04T12:35:04.8584615Z | ^~~~ 2025-12-04T12:35:04.8585936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8586050Z 1924 | 0x80, 2025-12-04T12:35:04.8586141Z | ^~~~ 2025-12-04T12:35:04.8587331Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8587484Z 1926 | 0x80, 2025-12-04T12:35:04.8587588Z | ^~~~ 2025-12-04T12:35:04.8588906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8589053Z 1928 | 0x80); 2025-12-04T12:35:04.8589144Z | ^~~~ 2025-12-04T12:35:04.8590415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8590512Z 1930 | 0x80, 2025-12-04T12:35:04.8590615Z | ^~~~ 2025-12-04T12:35:04.8591862Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8591990Z 1932 | 0x80, 2025-12-04T12:35:04.8592119Z | ^~~~ 2025-12-04T12:35:04.8593348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8593463Z 1934 | 0x80, 2025-12-04T12:35:04.8593557Z | ^~~~ 2025-12-04T12:35:04.8595200Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8595362Z 1936 | 0x80, 2025-12-04T12:35:04.8595522Z | ^~~~ 2025-12-04T12:35:04.8597290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8597473Z 1938 | 0x80, 2025-12-04T12:35:04.8597629Z | ^~~~ 2025-12-04T12:35:04.8599279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8599444Z 1940 | 0x80, 2025-12-04T12:35:04.8599604Z | ^~~~ 2025-12-04T12:35:04.8601069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8601167Z 1942 | 0x80, 2025-12-04T12:35:04.8601260Z | ^~~~ 2025-12-04T12:35:04.8602762Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8602930Z 1944 | 0x80, 2025-12-04T12:35:04.8603058Z | ^~~~ 2025-12-04T12:35:04.8604612Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8604715Z 1946 | 0x80, 2025-12-04T12:35:04.8604823Z | ^~~~ 2025-12-04T12:35:04.8606037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8606208Z 1948 | 0x80, 2025-12-04T12:35:04.8606361Z | ^~~~ 2025-12-04T12:35:04.8608458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8608572Z 1950 | 0x80, 2025-12-04T12:35:04.8608670Z | ^~~~ 2025-12-04T12:35:04.8609863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8610057Z 1952 | 0x80, 2025-12-04T12:35:04.8610151Z | ^~~~ 2025-12-04T12:35:04.8611806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8611962Z 1954 | 0x80, 2025-12-04T12:35:04.8612121Z | ^~~~ 2025-12-04T12:35:04.8614025Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8614189Z 1956 | 0x80, 2025-12-04T12:35:04.8614360Z | ^~~~ 2025-12-04T12:35:04.8616125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8616233Z 1958 | 0x80, 2025-12-04T12:35:04.8616420Z | ^~~~ 2025-12-04T12:35:04.8617945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8618104Z 1960 | 0x80, 2025-12-04T12:35:04.8618277Z | ^~~~ 2025-12-04T12:35:04.8620061Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8620235Z 1962 | 0x80, 2025-12-04T12:35:04.8620386Z | ^~~~ 2025-12-04T12:35:04.8621775Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8621893Z 1964 | 0x80, 2025-12-04T12:35:04.8621985Z | ^~~~ 2025-12-04T12:35:04.8623963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8624062Z 1966 | 0x80, 2025-12-04T12:35:04.8624156Z | ^~~~ 2025-12-04T12:35:04.8625474Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8625633Z 1968 | 0x80, 2025-12-04T12:35:04.8625789Z | ^~~~ 2025-12-04T12:35:04.8627491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8627595Z 1970 | 0x80, 2025-12-04T12:35:04.8627700Z | ^~~~ 2025-12-04T12:35:04.8629666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8629814Z 1972 | 0x80, 2025-12-04T12:35:04.8629949Z | ^~~~ 2025-12-04T12:35:04.8631334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8631442Z 1974 | 0x80, 2025-12-04T12:35:04.8631535Z | ^~~~ 2025-12-04T12:35:04.8633067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8633289Z 1976 | 0x80, 2025-12-04T12:35:04.8633440Z | ^~~~ 2025-12-04T12:35:04.8635006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8635115Z 1978 | 0x80, 2025-12-04T12:35:04.8635214Z | ^~~~ 2025-12-04T12:35:04.8636561Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8636658Z 1980 | 0x80, 2025-12-04T12:35:04.8636749Z | ^~~~ 2025-12-04T12:35:04.8637950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8638050Z 1982 | 0x80, 2025-12-04T12:35:04.8638155Z | ^~~~ 2025-12-04T12:35:04.8639652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8639821Z 1984 | 0x80, 2025-12-04T12:35:04.8639997Z | ^~~~ 2025-12-04T12:35:04.8642098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8642253Z 1986 | 0x80, 2025-12-04T12:35:04.8642422Z | ^~~~ 2025-12-04T12:35:04.8643803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8643920Z 1988 | 0x80, 2025-12-04T12:35:04.8644014Z | ^~~~ 2025-12-04T12:35:04.8645505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8645676Z 1990 | 0x80, 2025-12-04T12:35:04.8645826Z | ^~~~ 2025-12-04T12:35:04.8647089Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8647184Z 1992 | 0x80, 2025-12-04T12:35:04.8647278Z | ^~~~ 2025-12-04T12:35:04.8648498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.8648663Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.8648791Z | ^~~~~~ 2025-12-04T12:35:04.8651665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = true; T = unsigned char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.8652334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2081:27: required from here 2025-12-04T12:35:04.8653640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8653737Z 1866 | 0x80, 2025-12-04T12:35:04.8653844Z | ^~~~ 2025-12-04T12:35:04.8655873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8656050Z 1868 | 0x80, 2025-12-04T12:35:04.8656220Z | ^~~~ 2025-12-04T12:35:04.8658328Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8658443Z 1870 | 0x80, 2025-12-04T12:35:04.8658536Z | ^~~~ 2025-12-04T12:35:04.8659746Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8659858Z 1872 | 0x80, 2025-12-04T12:35:04.8660008Z | ^~~~ 2025-12-04T12:35:04.8661342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8661460Z 1874 | 0x80, 2025-12-04T12:35:04.8661553Z | ^~~~ 2025-12-04T12:35:04.8662976Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8663110Z 1876 | 0x80, 2025-12-04T12:35:04.8663212Z | ^~~~ 2025-12-04T12:35:04.8664490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8664587Z 1878 | 0x80, 2025-12-04T12:35:04.8664687Z | ^~~~ 2025-12-04T12:35:04.8665885Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8665988Z 1880 | 0x80, 2025-12-04T12:35:04.8666096Z | ^~~~ 2025-12-04T12:35:04.8667271Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8667366Z 1882 | 0x80, 2025-12-04T12:35:04.8667481Z | ^~~~ 2025-12-04T12:35:04.8668670Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8668782Z 1884 | 0x80, 2025-12-04T12:35:04.8668883Z | ^~~~ 2025-12-04T12:35:04.8670060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8670253Z 1886 | 0x80, 2025-12-04T12:35:04.8670348Z | ^~~~ 2025-12-04T12:35:04.8671750Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8671863Z 1888 | 0x80, 2025-12-04T12:35:04.8672109Z | ^~~~ 2025-12-04T12:35:04.8673924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8674088Z 1890 | 0x80, 2025-12-04T12:35:04.8674331Z | ^~~~ 2025-12-04T12:35:04.8676427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8676545Z 1892 | 0x80, 2025-12-04T12:35:04.8676657Z | ^~~~ 2025-12-04T12:35:04.8677842Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8677938Z 1894 | 0x80, 2025-12-04T12:35:04.8678060Z | ^~~~ 2025-12-04T12:35:04.8679310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8679407Z 1896 | 0x80, 2025-12-04T12:35:04.8679526Z | ^~~~ 2025-12-04T12:35:04.8680712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8680823Z 1898 | 0x80, 2025-12-04T12:35:04.8680917Z | ^~~~ 2025-12-04T12:35:04.8682091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8682203Z 1900 | 0x80, 2025-12-04T12:35:04.8682310Z | ^~~~ 2025-12-04T12:35:04.8684257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8684418Z 1902 | 0x80, 2025-12-04T12:35:04.8684555Z | ^~~~ 2025-12-04T12:35:04.8686501Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8686669Z 1904 | 0x80, 2025-12-04T12:35:04.8686828Z | ^~~~ 2025-12-04T12:35:04.8688084Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8688233Z 1906 | 0x80, 2025-12-04T12:35:04.8688536Z | ^~~~ 2025-12-04T12:35:04.8689879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8689982Z 1908 | 0x80, 2025-12-04T12:35:04.8690090Z | ^~~~ 2025-12-04T12:35:04.8691762Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8691964Z 1910 | 0x80, 2025-12-04T12:35:04.8692055Z | ^~~~ 2025-12-04T12:35:04.8693355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8693598Z 1912 | 0x80, 2025-12-04T12:35:04.8693765Z | ^~~~ 2025-12-04T12:35:04.8695828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8696063Z 1914 | 0x80, 2025-12-04T12:35:04.8696209Z | ^~~~ 2025-12-04T12:35:04.8698119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8698227Z 1916 | 0x80, 2025-12-04T12:35:04.8698318Z | ^~~~ 2025-12-04T12:35:04.8699688Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8699823Z 1918 | 0x80, 2025-12-04T12:35:04.8699930Z | ^~~~ 2025-12-04T12:35:04.8701517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8701620Z 1920 | 0x80, 2025-12-04T12:35:04.8701727Z | ^~~~ 2025-12-04T12:35:04.8703280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8703391Z 1922 | 0x80, 2025-12-04T12:35:04.8703499Z | ^~~~ 2025-12-04T12:35:04.8705029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8705223Z 1924 | 0x80, 2025-12-04T12:35:04.8705371Z | ^~~~ 2025-12-04T12:35:04.8707471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8707607Z 1926 | 0x80, 2025-12-04T12:35:04.8707700Z | ^~~~ 2025-12-04T12:35:04.8709254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8709514Z 1928 | 0x80); 2025-12-04T12:35:04.8709612Z | ^~~~ 2025-12-04T12:35:04.8710890Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8711036Z 1930 | 0x80, 2025-12-04T12:35:04.8711127Z | ^~~~ 2025-12-04T12:35:04.8712685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8712839Z 1932 | 0x80, 2025-12-04T12:35:04.8712948Z | ^~~~ 2025-12-04T12:35:04.8714134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8714234Z 1934 | 0x80, 2025-12-04T12:35:04.8714341Z | ^~~~ 2025-12-04T12:35:04.8716189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8716370Z 1936 | 0x80, 2025-12-04T12:35:04.8716521Z | ^~~~ 2025-12-04T12:35:04.8718054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8718170Z 1938 | 0x80, 2025-12-04T12:35:04.8718265Z | ^~~~ 2025-12-04T12:35:04.8719945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8720066Z 1940 | 0x80, 2025-12-04T12:35:04.8720160Z | ^~~~ 2025-12-04T12:35:04.8721511Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8721673Z 1942 | 0x80, 2025-12-04T12:35:04.8721831Z | ^~~~ 2025-12-04T12:35:04.8723289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8723386Z 1944 | 0x80, 2025-12-04T12:35:04.8723478Z | ^~~~ 2025-12-04T12:35:04.8724912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8725085Z 1946 | 0x80, 2025-12-04T12:35:04.8725226Z | ^~~~ 2025-12-04T12:35:04.8726412Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8726513Z 1948 | 0x80, 2025-12-04T12:35:04.8726618Z | ^~~~ 2025-12-04T12:35:04.8728332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8728509Z 1950 | 0x80, 2025-12-04T12:35:04.8728665Z | ^~~~ 2025-12-04T12:35:04.8730623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8730819Z 1952 | 0x80, 2025-12-04T12:35:04.8730914Z | ^~~~ 2025-12-04T12:35:04.8732401Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8732619Z 1954 | 0x80, 2025-12-04T12:35:04.8732714Z | ^~~~ 2025-12-04T12:35:04.8734393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8734517Z 1956 | 0x80, 2025-12-04T12:35:04.8734608Z | ^~~~ 2025-12-04T12:35:04.8735936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8736099Z 1958 | 0x80, 2025-12-04T12:35:04.8736266Z | ^~~~ 2025-12-04T12:35:04.8737696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8737798Z 1960 | 0x80, 2025-12-04T12:35:04.8737905Z | ^~~~ 2025-12-04T12:35:04.8739594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8739756Z 1962 | 0x80, 2025-12-04T12:35:04.8739932Z | ^~~~ 2025-12-04T12:35:04.8741906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8742027Z 1964 | 0x80, 2025-12-04T12:35:04.8742120Z | ^~~~ 2025-12-04T12:35:04.8743723Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8743843Z 1966 | 0x80, 2025-12-04T12:35:04.8743934Z | ^~~~ 2025-12-04T12:35:04.8745424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8745588Z 1968 | 0x80, 2025-12-04T12:35:04.8745722Z | ^~~~ 2025-12-04T12:35:04.8747067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8747234Z 1970 | 0x80, 2025-12-04T12:35:04.8747387Z | ^~~~ 2025-12-04T12:35:04.8749060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8749162Z 1972 | 0x80, 2025-12-04T12:35:04.8749270Z | ^~~~ 2025-12-04T12:35:04.8750931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8751094Z 1974 | 0x80, 2025-12-04T12:35:04.8751259Z | ^~~~ 2025-12-04T12:35:04.8753002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8753207Z 1976 | 0x80, 2025-12-04T12:35:04.8753301Z | ^~~~ 2025-12-04T12:35:04.8755072Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8755224Z 1978 | 0x80, 2025-12-04T12:35:04.8755316Z | ^~~~ 2025-12-04T12:35:04.8756979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8757162Z 1980 | 0x80, 2025-12-04T12:35:04.8757305Z | ^~~~ 2025-12-04T12:35:04.8758672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8758779Z 1982 | 0x80, 2025-12-04T12:35:04.8758928Z | ^~~~ 2025-12-04T12:35:04.8760812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8760916Z 1984 | 0x80, 2025-12-04T12:35:04.8761063Z | ^~~~ 2025-12-04T12:35:04.8762666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8762763Z 1986 | 0x80, 2025-12-04T12:35:04.8762869Z | ^~~~ 2025-12-04T12:35:04.8764562Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8764734Z 1988 | 0x80, 2025-12-04T12:35:04.8764899Z | ^~~~ 2025-12-04T12:35:04.8766657Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8766776Z 1990 | 0x80, 2025-12-04T12:35:04.8766867Z | ^~~~ 2025-12-04T12:35:04.8768449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8768629Z 1992 | 0x80, 2025-12-04T12:35:04.8768785Z | ^~~~ 2025-12-04T12:35:04.8770357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.8770630Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.8770824Z | ^~~~~~ 2025-12-04T12:35:04.8774355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = signed char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.8774949Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2109:28: required from here 2025-12-04T12:35:04.8777008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8777178Z 1866 | 0x80, 2025-12-04T12:35:04.8777306Z | ^~~~ 2025-12-04T12:35:04.8779138Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8779301Z 1868 | 0x80, 2025-12-04T12:35:04.8779411Z | ^~~~ 2025-12-04T12:35:04.8780654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8780781Z 1870 | 0x80, 2025-12-04T12:35:04.8780949Z | ^~~~ 2025-12-04T12:35:04.8783026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8783205Z 1872 | 0x80, 2025-12-04T12:35:04.8783359Z | ^~~~ 2025-12-04T12:35:04.8784891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8785013Z 1874 | 0x80, 2025-12-04T12:35:04.8785108Z | ^~~~ 2025-12-04T12:35:04.8786380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8786494Z 1876 | 0x80, 2025-12-04T12:35:04.8786599Z | ^~~~ 2025-12-04T12:35:04.8787858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8787956Z 1878 | 0x80, 2025-12-04T12:35:04.8788052Z | ^~~~ 2025-12-04T12:35:04.8789386Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8789490Z 1880 | 0x80, 2025-12-04T12:35:04.8789584Z | ^~~~ 2025-12-04T12:35:04.8790911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8791009Z 1882 | 0x80, 2025-12-04T12:35:04.8791129Z | ^~~~ 2025-12-04T12:35:04.8792369Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8792467Z 1884 | 0x80, 2025-12-04T12:35:04.8792579Z | ^~~~ 2025-12-04T12:35:04.8793760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8793878Z 1886 | 0x80, 2025-12-04T12:35:04.8793973Z | ^~~~ 2025-12-04T12:35:04.8795152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8795264Z 1888 | 0x80, 2025-12-04T12:35:04.8795433Z | ^~~~ 2025-12-04T12:35:04.8796609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8796718Z 1890 | 0x80, 2025-12-04T12:35:04.8796811Z | ^~~~ 2025-12-04T12:35:04.8798042Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8798169Z 1892 | 0x80, 2025-12-04T12:35:04.8798262Z | ^~~~ 2025-12-04T12:35:04.8799492Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8799590Z 1894 | 0x80, 2025-12-04T12:35:04.8799704Z | ^~~~ 2025-12-04T12:35:04.8800881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8800978Z 1896 | 0x80, 2025-12-04T12:35:04.8801081Z | ^~~~ 2025-12-04T12:35:04.8802261Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8802364Z 1898 | 0x80, 2025-12-04T12:35:04.8802468Z | ^~~~ 2025-12-04T12:35:04.8803642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8803755Z 1900 | 0x80, 2025-12-04T12:35:04.8803846Z | ^~~~ 2025-12-04T12:35:04.8805014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8805124Z 1902 | 0x80, 2025-12-04T12:35:04.8805219Z | ^~~~ 2025-12-04T12:35:04.8806406Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8806505Z 1904 | 0x80, 2025-12-04T12:35:04.8806597Z | ^~~~ 2025-12-04T12:35:04.8807782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8807915Z 1906 | 0x80, 2025-12-04T12:35:04.8808006Z | ^~~~ 2025-12-04T12:35:04.8809196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8809293Z 1908 | 0x80, 2025-12-04T12:35:04.8809399Z | ^~~~ 2025-12-04T12:35:04.8810614Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8810707Z 1910 | 0x80, 2025-12-04T12:35:04.8810811Z | ^~~~ 2025-12-04T12:35:04.8811981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8812097Z 1912 | 0x80, 2025-12-04T12:35:04.8812188Z | ^~~~ 2025-12-04T12:35:04.8813441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8813552Z 1914 | 0x80, 2025-12-04T12:35:04.8813645Z | ^~~~ 2025-12-04T12:35:04.8814874Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8814984Z 1916 | 0x80, 2025-12-04T12:35:04.8815076Z | ^~~~ 2025-12-04T12:35:04.8816373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8816479Z 1918 | 0x80, 2025-12-04T12:35:04.8816571Z | ^~~~ 2025-12-04T12:35:04.8817770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8817864Z 1920 | 0x80, 2025-12-04T12:35:04.8817962Z | ^~~~ 2025-12-04T12:35:04.8819155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8819250Z 1922 | 0x80, 2025-12-04T12:35:04.8819361Z | ^~~~ 2025-12-04T12:35:04.8820532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8820631Z 1924 | 0x80, 2025-12-04T12:35:04.8820739Z | ^~~~ 2025-12-04T12:35:04.8821904Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8822012Z 1926 | 0x80, 2025-12-04T12:35:04.8822111Z | ^~~~ 2025-12-04T12:35:04.8823288Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8823398Z 1928 | 0x80); 2025-12-04T12:35:04.8823496Z | ^~~~ 2025-12-04T12:35:04.8824674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8824825Z 1930 | 0x80, 2025-12-04T12:35:04.8824919Z | ^~~~ 2025-12-04T12:35:04.8826110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8826204Z 1932 | 0x80, 2025-12-04T12:35:04.8826342Z | ^~~~ 2025-12-04T12:35:04.8827535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8827629Z 1934 | 0x80, 2025-12-04T12:35:04.8827741Z | ^~~~ 2025-12-04T12:35:04.8828912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8829011Z 1936 | 0x80, 2025-12-04T12:35:04.8829117Z | ^~~~ 2025-12-04T12:35:04.8830284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8830377Z 1938 | 0x80, 2025-12-04T12:35:04.8830550Z | ^~~~ 2025-12-04T12:35:04.8831724Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8831830Z 1940 | 0x80, 2025-12-04T12:35:04.8831957Z | ^~~~ 2025-12-04T12:35:04.8833131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8833244Z 1942 | 0x80, 2025-12-04T12:35:04.8833337Z | ^~~~ 2025-12-04T12:35:04.8834514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8834614Z 1944 | 0x80, 2025-12-04T12:35:04.8834714Z | ^~~~ 2025-12-04T12:35:04.8835903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8836002Z 1946 | 0x80, 2025-12-04T12:35:04.8836094Z | ^~~~ 2025-12-04T12:35:04.8837285Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8837384Z 1948 | 0x80, 2025-12-04T12:35:04.8837492Z | ^~~~ 2025-12-04T12:35:04.8838661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8838760Z 1950 | 0x80, 2025-12-04T12:35:04.8838870Z | ^~~~ 2025-12-04T12:35:04.8840043Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8840156Z 1952 | 0x80, 2025-12-04T12:35:04.8840247Z | ^~~~ 2025-12-04T12:35:04.8841421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8841565Z 1954 | 0x80, 2025-12-04T12:35:04.8841657Z | ^~~~ 2025-12-04T12:35:04.8842834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8843013Z 1956 | 0x80, 2025-12-04T12:35:04.8843105Z | ^~~~ 2025-12-04T12:35:04.8844289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8844417Z 1958 | 0x80, 2025-12-04T12:35:04.8844509Z | ^~~~ 2025-12-04T12:35:04.8845695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8845797Z 1960 | 0x80, 2025-12-04T12:35:04.8845901Z | ^~~~ 2025-12-04T12:35:04.8847071Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8847177Z 1962 | 0x80, 2025-12-04T12:35:04.8847287Z | ^~~~ 2025-12-04T12:35:04.8848453Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8848552Z 1964 | 0x80, 2025-12-04T12:35:04.8848656Z | ^~~~ 2025-12-04T12:35:04.8849823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8849936Z 1966 | 0x80, 2025-12-04T12:35:04.8850028Z | ^~~~ 2025-12-04T12:35:04.8851196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8851314Z 1968 | 0x80, 2025-12-04T12:35:04.8851408Z | ^~~~ 2025-12-04T12:35:04.8852591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8852691Z 1970 | 0x80, 2025-12-04T12:35:04.8852788Z | ^~~~ 2025-12-04T12:35:04.8853970Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8854071Z 1972 | 0x80, 2025-12-04T12:35:04.8854163Z | ^~~~ 2025-12-04T12:35:04.8855350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8855450Z 1974 | 0x80, 2025-12-04T12:35:04.8855555Z | ^~~~ 2025-12-04T12:35:04.8856801Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8856899Z 1976 | 0x80, 2025-12-04T12:35:04.8857010Z | ^~~~ 2025-12-04T12:35:04.8858192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8858350Z 1978 | 0x80, 2025-12-04T12:35:04.8858443Z | ^~~~ 2025-12-04T12:35:04.8859654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8859797Z 1980 | 0x80, 2025-12-04T12:35:04.8859888Z | ^~~~ 2025-12-04T12:35:04.8861098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8861207Z 1982 | 0x80, 2025-12-04T12:35:04.8861298Z | ^~~~ 2025-12-04T12:35:04.8862485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8862585Z 1984 | 0x80, 2025-12-04T12:35:04.8862677Z | ^~~~ 2025-12-04T12:35:04.8863872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8863973Z 1986 | 0x80, 2025-12-04T12:35:04.8864068Z | ^~~~ 2025-12-04T12:35:04.8865260Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8865354Z 1988 | 0x80, 2025-12-04T12:35:04.8865459Z | ^~~~ 2025-12-04T12:35:04.8866622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8866722Z 1990 | 0x80, 2025-12-04T12:35:04.8866830Z | ^~~~ 2025-12-04T12:35:04.8868008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8868126Z 1992 | 0x80, 2025-12-04T12:35:04.8868219Z | ^~~~ 2025-12-04T12:35:04.8869396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.8869571Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.8869691Z | ^~~~~~ 2025-12-04T12:35:04.8872324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h: In instantiation of ‘at::vec::CPU_CAPABILITY::Vectorized at::vec::CPU_CAPABILITY::shift_512_8(const at::vec::CPU_CAPABILITY::Vectorized&, const at::vec::CPU_CAPABILITY::Vectorized&) [with bool left_shift = false; T = unsigned char; typename std::enable_if<(is_same_v || is_same_v), int>::type = 0]’: 2025-12-04T12:35:04.8872922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2116:28: required from here 2025-12-04T12:35:04.8874112Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8874225Z 1866 | 0x80, 2025-12-04T12:35:04.8874319Z | ^~~~ 2025-12-04T12:35:04.8875514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8875698Z 1868 | 0x80, 2025-12-04T12:35:04.8875792Z | ^~~~ 2025-12-04T12:35:04.8877041Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8877179Z 1870 | 0x80, 2025-12-04T12:35:04.8877290Z | ^~~~ 2025-12-04T12:35:04.8878515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1872:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8878612Z 1872 | 0x80, 2025-12-04T12:35:04.8878721Z | ^~~~ 2025-12-04T12:35:04.8879902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1874:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8880003Z 1874 | 0x80, 2025-12-04T12:35:04.8880112Z | ^~~~ 2025-12-04T12:35:04.8881287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1876:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8881403Z 1876 | 0x80, 2025-12-04T12:35:04.8881498Z | ^~~~ 2025-12-04T12:35:04.8882678Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1878:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8882788Z 1878 | 0x80, 2025-12-04T12:35:04.8882881Z | ^~~~ 2025-12-04T12:35:04.8884072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1880:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8884176Z 1880 | 0x80, 2025-12-04T12:35:04.8884271Z | ^~~~ 2025-12-04T12:35:04.8885458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1882:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8885562Z 1882 | 0x80, 2025-12-04T12:35:04.8885655Z | ^~~~ 2025-12-04T12:35:04.8886846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1884:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8886940Z 1884 | 0x80, 2025-12-04T12:35:04.8887048Z | ^~~~ 2025-12-04T12:35:04.8888220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1886:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8888320Z 1886 | 0x80, 2025-12-04T12:35:04.8888428Z | ^~~~ 2025-12-04T12:35:04.8889607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1888:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8889724Z 1888 | 0x80, 2025-12-04T12:35:04.8889819Z | ^~~~ 2025-12-04T12:35:04.8890993Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1890:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8891103Z 1890 | 0x80, 2025-12-04T12:35:04.8891196Z | ^~~~ 2025-12-04T12:35:04.8892407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1892:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8892512Z 1892 | 0x80, 2025-12-04T12:35:04.8892602Z | ^~~~ 2025-12-04T12:35:04.8893838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1894:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8893972Z 1894 | 0x80, 2025-12-04T12:35:04.8894064Z | ^~~~ 2025-12-04T12:35:04.8895286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1896:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8895381Z 1896 | 0x80, 2025-12-04T12:35:04.8895486Z | ^~~~ 2025-12-04T12:35:04.8896730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1898:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8896824Z 1898 | 0x80, 2025-12-04T12:35:04.8896933Z | ^~~~ 2025-12-04T12:35:04.8898116Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1900:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8898221Z 1900 | 0x80, 2025-12-04T12:35:04.8898332Z | ^~~~ 2025-12-04T12:35:04.8899507Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1902:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8899616Z 1902 | 0x80, 2025-12-04T12:35:04.8899709Z | ^~~~ 2025-12-04T12:35:04.8900883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1904:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8900992Z 1904 | 0x80, 2025-12-04T12:35:04.8901088Z | ^~~~ 2025-12-04T12:35:04.8902282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1906:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8902383Z 1906 | 0x80, 2025-12-04T12:35:04.8902474Z | ^~~~ 2025-12-04T12:35:04.8903664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1908:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8903757Z 1908 | 0x80, 2025-12-04T12:35:04.8903849Z | ^~~~ 2025-12-04T12:35:04.8905084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1910:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8905179Z 1910 | 0x80, 2025-12-04T12:35:04.8905284Z | ^~~~ 2025-12-04T12:35:04.8906464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1912:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8906595Z 1912 | 0x80, 2025-12-04T12:35:04.8906700Z | ^~~~ 2025-12-04T12:35:04.8907879Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1914:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8907985Z 1914 | 0x80, 2025-12-04T12:35:04.8908084Z | ^~~~ 2025-12-04T12:35:04.8909256Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1916:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8909364Z 1916 | 0x80, 2025-12-04T12:35:04.8909456Z | ^~~~ 2025-12-04T12:35:04.8910674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1918:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8910790Z 1918 | 0x80, 2025-12-04T12:35:04.8910882Z | ^~~~ 2025-12-04T12:35:04.8912183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1920:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8912280Z 1920 | 0x80, 2025-12-04T12:35:04.8912380Z | ^~~~ 2025-12-04T12:35:04.8913571Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1922:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8913666Z 1922 | 0x80, 2025-12-04T12:35:04.8913760Z | ^~~~ 2025-12-04T12:35:04.8914952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1924:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8915054Z 1924 | 0x80, 2025-12-04T12:35:04.8915163Z | ^~~~ 2025-12-04T12:35:04.8916347Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1926:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8916443Z 1926 | 0x80, 2025-12-04T12:35:04.8916560Z | ^~~~ 2025-12-04T12:35:04.8917737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1928:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8917846Z 1928 | 0x80); 2025-12-04T12:35:04.8917938Z | ^~~~ 2025-12-04T12:35:04.8919111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1930:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8919224Z 1930 | 0x80, 2025-12-04T12:35:04.8919316Z | ^~~~ 2025-12-04T12:35:04.8920495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1932:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8920601Z 1932 | 0x80, 2025-12-04T12:35:04.8920739Z | ^~~~ 2025-12-04T12:35:04.8921926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1934:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8922020Z 1934 | 0x80, 2025-12-04T12:35:04.8922111Z | ^~~~ 2025-12-04T12:35:04.8923302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1936:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8923430Z 1936 | 0x80, 2025-12-04T12:35:04.8923533Z | ^~~~ 2025-12-04T12:35:04.8924711Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1938:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8924805Z 1938 | 0x80, 2025-12-04T12:35:04.8924917Z | ^~~~ 2025-12-04T12:35:04.8926086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1940:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8926179Z 1940 | 0x80, 2025-12-04T12:35:04.8926283Z | ^~~~ 2025-12-04T12:35:04.8927488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1942:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8927603Z 1942 | 0x80, 2025-12-04T12:35:04.8927693Z | ^~~~ 2025-12-04T12:35:04.8928896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1944:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8929010Z 1944 | 0x80, 2025-12-04T12:35:04.8929102Z | ^~~~ 2025-12-04T12:35:04.8930287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1946:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8930380Z 1946 | 0x80, 2025-12-04T12:35:04.8930471Z | ^~~~ 2025-12-04T12:35:04.8931669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1948:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8931763Z 1948 | 0x80, 2025-12-04T12:35:04.8931854Z | ^~~~ 2025-12-04T12:35:04.8933047Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1950:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8933145Z 1950 | 0x80, 2025-12-04T12:35:04.8933249Z | ^~~~ 2025-12-04T12:35:04.8934419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1952:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8934511Z 1952 | 0x80, 2025-12-04T12:35:04.8934616Z | ^~~~ 2025-12-04T12:35:04.8935795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1954:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8935901Z 1954 | 0x80, 2025-12-04T12:35:04.8935991Z | ^~~~ 2025-12-04T12:35:04.8937240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1956:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8937392Z 1956 | 0x80, 2025-12-04T12:35:04.8937484Z | ^~~~ 2025-12-04T12:35:04.8938659Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1958:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8938769Z 1958 | 0x80, 2025-12-04T12:35:04.8938860Z | ^~~~ 2025-12-04T12:35:04.8940117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1960:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8940213Z 1960 | 0x80, 2025-12-04T12:35:04.8940305Z | ^~~~ 2025-12-04T12:35:04.8941526Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1962:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8941628Z 1962 | 0x80, 2025-12-04T12:35:04.8941719Z | ^~~~ 2025-12-04T12:35:04.8942910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1964:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8943004Z 1964 | 0x80, 2025-12-04T12:35:04.8943118Z | ^~~~ 2025-12-04T12:35:04.8944297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1966:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8944394Z 1966 | 0x80, 2025-12-04T12:35:04.8944508Z | ^~~~ 2025-12-04T12:35:04.8945681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1968:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8945797Z 1968 | 0x80, 2025-12-04T12:35:04.8945889Z | ^~~~ 2025-12-04T12:35:04.8947054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1970:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8947162Z 1970 | 0x80, 2025-12-04T12:35:04.8947266Z | ^~~~ 2025-12-04T12:35:04.8948438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1972:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8948545Z 1972 | 0x80, 2025-12-04T12:35:04.8948644Z | ^~~~ 2025-12-04T12:35:04.8949833Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1974:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8949932Z 1974 | 0x80, 2025-12-04T12:35:04.8950027Z | ^~~~ 2025-12-04T12:35:04.8951215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1976:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8951310Z 1976 | 0x80, 2025-12-04T12:35:04.8951427Z | ^~~~ 2025-12-04T12:35:04.8952597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1978:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8952692Z 1978 | 0x80, 2025-12-04T12:35:04.8952804Z | ^~~~ 2025-12-04T12:35:04.8953973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1980:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8954140Z 1980 | 0x80, 2025-12-04T12:35:04.8954232Z | ^~~~ 2025-12-04T12:35:04.8955406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1982:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8955528Z 1982 | 0x80, 2025-12-04T12:35:04.8955693Z | ^~~~ 2025-12-04T12:35:04.8956865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1984:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8956976Z 1984 | 0x80, 2025-12-04T12:35:04.8957104Z | ^~~~ 2025-12-04T12:35:04.8958298Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1986:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8958399Z 1986 | 0x80, 2025-12-04T12:35:04.8958493Z | ^~~~ 2025-12-04T12:35:04.8959682Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1988:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8959784Z 1988 | 0x80, 2025-12-04T12:35:04.8959883Z | ^~~~ 2025-12-04T12:35:04.8961067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1990:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8961168Z 1990 | 0x80, 2025-12-04T12:35:04.8961276Z | ^~~~ 2025-12-04T12:35:04.8962450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1992:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘'\37777777600'’ [-Woverflow] 2025-12-04T12:35:04.8962551Z 1992 | 0x80, 2025-12-04T12:35:04.8962662Z | ^~~~ 2025-12-04T12:35:04.8963835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:2002:38: warning: overflow in conversion from ‘int’ to ‘short int’ changes value from ‘65280’ to ‘-256’ [-Woverflow] 2025-12-04T12:35:04.8964015Z 2002 | __m512i keep_1 = _mm512_set1_epi16(0xFF00); 2025-12-04T12:35:04.8964142Z | ^~~~~~ 2025-12-04T12:35:04.8964645Z In file included from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:16, 2025-12-04T12:35:04.8965035Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:5, 2025-12-04T12:35:04.8965481Z from 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7, 2025-12-04T12:35:04.8965903Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4, 2025-12-04T12:35:04.8966374Z from /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45, 2025-12-04T12:35:04.8967054Z from /tmp/uDOzLN/tmpn56tbzg5/data/aotinductor/model1/cwulnadwx3jyqkgl526d3bpo7ziav2n33dginvvv4zbkqn5jle4v.wrapper.cpp:729: 2025-12-04T12:35:04.8968533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = signed char; int64_t = long int]’: 2025-12-04T12:35:04.8969115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:696:31: required from here 2025-12-04T12:35:04.8970316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8970477Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8970572Z | ^~~~ 2025-12-04T12:35:04.8972082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8972248Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8972364Z | ^~~~ 2025-12-04T12:35:04.8973615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8973732Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8973851Z | ^~~~ 
2025-12-04T12:35:04.8975042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8975179Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8975284Z | ^~~~ 2025-12-04T12:35:04.8976535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8976672Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8976768Z | ^~~~ 2025-12-04T12:35:04.8977964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8978091Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8978194Z | ^~~~ 2025-12-04T12:35:04.8979390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8979502Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8979601Z | ^~~~ 2025-12-04T12:35:04.8980814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8980928Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8981044Z | ^~~~ 2025-12-04T12:35:04.8982219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8982335Z 203 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8982445Z | ^~~~ 2025-12-04T12:35:04.8983618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8983745Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8983848Z | ^~~~ 2025-12-04T12:35:04.8985032Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8985166Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8985265Z | ^~~~ 2025-12-04T12:35:04.8986446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8986624Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8986728Z | ^~~~ 2025-12-04T12:35:04.8987964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8988108Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8988201Z | ^~~~ 2025-12-04T12:35:04.8989426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8989540Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8989653Z | ^~~~ 2025-12-04T12:35:04.8990840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8990960Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8991077Z | ^~~~ 2025-12-04T12:35:04.8992264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8992395Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8992498Z | ^~~~ 2025-12-04T12:35:04.8993677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8993801Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8993901Z | ^~~~ 2025-12-04T12:35:04.8995078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8995202Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8995299Z | ^~~~ 2025-12-04T12:35:04.8996504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8996616Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8996722Z | ^~~~ 2025-12-04T12:35:04.8997916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8998068Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8998183Z | ^~~~ 2025-12-04T12:35:04.8999363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.8999482Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.8999624Z | ^~~~ 2025-12-04T12:35:04.9000800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9000919Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9001031Z | ^~~~ 2025-12-04T12:35:04.9002211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9002342Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9002442Z | ^~~~ 2025-12-04T12:35:04.9003659Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9003795Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9003894Z | ^~~~ 2025-12-04T12:35:04.9005143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9005259Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9005350Z | ^~~~ 2025-12-04T12:35:04.9006544Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9006656Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9006766Z | ^~~~ 2025-12-04T12:35:04.9007951Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9008069Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9008182Z | ^~~~ 2025-12-04T12:35:04.9009367Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9009480Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9009604Z | ^~~~ 2025-12-04T12:35:04.9010788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9010916Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9011007Z | ^~~~ 2025-12-04T12:35:04.9012209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9012333Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9012435Z | ^~~~ 2025-12-04T12:35:04.9013624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9013776Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9013874Z | ^~~~ 2025-12-04T12:35:04.9015072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9015189Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.9015338Z | ^~~~ 2025-12-04T12:35:04.9016598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9016711Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9016818Z | ^~~~ 2025-12-04T12:35:04.9018000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9018122Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9018238Z | ^~~~ 2025-12-04T12:35:04.9019464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9019598Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9019699Z | ^~~~ 2025-12-04T12:35:04.9020913Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9021041Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9021143Z | ^~~~ 2025-12-04T12:35:04.9022338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9022448Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9022542Z | ^~~~ 2025-12-04T12:35:04.9023743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.9023860Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9023957Z | ^~~~ 2025-12-04T12:35:04.9025159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9025270Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9025388Z | ^~~~ 2025-12-04T12:35:04.9026568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9026679Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9026798Z | ^~~~ 2025-12-04T12:35:04.9027983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9028108Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9028208Z | ^~~~ 2025-12-04T12:35:04.9029391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9029558Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9029653Z | ^~~~ 2025-12-04T12:35:04.9030853Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9031007Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9031154Z | ^~~~ 2025-12-04T12:35:04.9032356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9032507Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9032610Z | ^~~~ 2025-12-04T12:35:04.9033803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9033926Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9034035Z | ^~~~ 2025-12-04T12:35:04.9035220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9035339Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9035456Z | ^~~~ 2025-12-04T12:35:04.9036643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9036771Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9036872Z | ^~~~ 2025-12-04T12:35:04.9038054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9038179Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9038280Z | ^~~~ 2025-12-04T12:35:04.9039767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h: In instantiation of ‘void at::vec::CPU_CAPABILITY::QuantizeAvx512(const float*, T*, int, float, int64_t) [with T = unsigned char; int64_t = long int]’: 2025-12-04T12:35:04.9040362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:933:31: required from here 
2025-12-04T12:35:04.9041538Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9041671Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9041764Z | ^~~~ 2025-12-04T12:35:04.9042959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9043078Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9043182Z | ^~~~ 2025-12-04T12:35:04.9044375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9044494Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9044605Z | ^~~~ 2025-12-04T12:35:04.9045782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:201:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9045934Z 201 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9046050Z | ^~~~ 2025-12-04T12:35:04.9047262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9047407Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9047515Z | ^~~~ 2025-12-04T12:35:04.9048737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9048866Z 202 
| 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9048965Z | ^~~~ 2025-12-04T12:35:04.9050147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9050286Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9050389Z | ^~~~ 2025-12-04T12:35:04.9051590Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:202:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9051710Z 202 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9051813Z | ^~~~ 2025-12-04T12:35:04.9053008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9053123Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9053224Z | ^~~~ 2025-12-04T12:35:04.9054419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9054533Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9054647Z | ^~~~ 2025-12-04T12:35:04.9055841Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9055956Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9056081Z | ^~~~ 2025-12-04T12:35:04.9057326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:203:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to 
‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9057463Z 203 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9057567Z | ^~~~ 2025-12-04T12:35:04.9058751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9058890Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9058991Z | ^~~~ 2025-12-04T12:35:04.9060186Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9060307Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9060405Z | ^~~~ 2025-12-04T12:35:04.9061603Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9061763Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9061866Z | ^~~~ 2025-12-04T12:35:04.9063098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:205:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9063261Z 205 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9063375Z | ^~~~ 2025-12-04T12:35:04.9064585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9064701Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9064810Z | ^~~~ 2025-12-04T12:35:04.9065992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:13: warning: overflow 
in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9066129Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9066229Z | ^~~~ 2025-12-04T12:35:04.9067417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9067546Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9067648Z | ^~~~ 2025-12-04T12:35:04.9068844Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:206:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9068956Z 206 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9069064Z | ^~~~ 2025-12-04T12:35:04.9070258Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9070371Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9070463Z | ^~~~ 2025-12-04T12:35:04.9071836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9071947Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9072064Z | ^~~~ 2025-12-04T12:35:04.9073241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9073359Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9073474Z | ^~~~ 2025-12-04T12:35:04.9074658Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:207:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9074792Z 207 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9074897Z | ^~~~ 2025-12-04T12:35:04.9076069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9076199Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9076292Z | ^~~~ 2025-12-04T12:35:04.9077491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9077695Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9077791Z | ^~~~ 2025-12-04T12:35:04.9079043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9079199Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9079302Z | ^~~~ 2025-12-04T12:35:04.9080544Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:209:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9080662Z 209 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9080784Z | ^~~~ 2025-12-04T12:35:04.9081973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9082087Z 210 | 0xff, 0xff, 0xff, 0xff, 
2025-12-04T12:35:04.9082197Z | ^~~~ 2025-12-04T12:35:04.9083384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9083515Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9083610Z | ^~~~ 2025-12-04T12:35:04.9084795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9084921Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9085028Z | ^~~~ 2025-12-04T12:35:04.9086221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:210:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9086336Z 210 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9086445Z | ^~~~ 2025-12-04T12:35:04.9087640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9087754Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9087854Z | ^~~~ 2025-12-04T12:35:04.9089043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9089198Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9089307Z | ^~~~ 2025-12-04T12:35:04.9090484Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ 
[-Woverflow] 2025-12-04T12:35:04.9090603Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9090753Z | ^~~~ 2025-12-04T12:35:04.9091934Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:211:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9092072Z 211 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9092174Z | ^~~~ 2025-12-04T12:35:04.9093352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9093485Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9093578Z | ^~~~ 2025-12-04T12:35:04.9094810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9094928Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9095026Z | ^~~~ 2025-12-04T12:35:04.9096258Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9096447Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9096550Z | ^~~~ 2025-12-04T12:35:04.9097769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:213:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9097883Z 213 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9098001Z | ^~~~ 2025-12-04T12:35:04.9099189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:7: warning: overflow in conversion 
from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9099310Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9099422Z | ^~~~ 2025-12-04T12:35:04.9100607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9100737Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9100842Z | ^~~~ 2025-12-04T12:35:04.9102020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9102147Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9102249Z | ^~~~ 2025-12-04T12:35:04.9103452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:214:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9103563Z 214 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9103672Z | ^~~~ 2025-12-04T12:35:04.9104857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9105020Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9105113Z | ^~~~ 2025-12-04T12:35:04.9106311Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:13: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9106430Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9106579Z | ^~~~ 2025-12-04T12:35:04.9107765Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:19: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9107883Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9107996Z | ^~~~ 2025-12-04T12:35:04.9109172Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_qint.h:215:25: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘255’ to ‘'\37777777777'’ [-Woverflow] 2025-12-04T12:35:04.9109304Z 215 | 0xff, 0xff, 0xff, 0xff, 2025-12-04T12:35:04.9109405Z | ^~~~ 2025-12-04T12:35:04.9109511Z PASSED [9.5383s] [ 48%] 2025-12-04T12:35:04.9110191Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cpu::test_update_weights SKIPPED [0.0033s] (No support for cpp only) [ 50%] 2025-12-04T12:35:04.9110611Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_add PASSED [6.0938s] [ 51%] 2025-12-04T12:35:04.9111058Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_bool_input PASSED [6.0266s] [ 52%] 2025-12-04T12:35:04.9111785Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_after_package SKIPPED [0.0003s] (Test is only supported on CUDA 12.6+) [ 53%] 2025-12-04T12:35:04.9112508Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_after_package_multi_arch SKIPPED [0.0002s] (Test is only supported on CUDA 12.8+) [ 54%] 2025-12-04T12:35:04.9113234Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_after_package_static SKIPPED [0.0002s] (Test is only supported on CUDA 12.6+) [ 55%] 2025-12-04T12:35:04.9113899Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_standalone_cos SKIPPED [0.0034s] (Only meant to test cpp package) [ 56%] 2025-12-04T12:35:04.9114587Z 
inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_with_exporter SKIPPED [0.0002s] (Test is only supported on CUDA 12.6+) [ 57%] 2025-12-04T12:35:04.9115307Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_compile_with_exporter_weights SKIPPED [0.0002s] (Test is only supported on CUDA 12.6+) [ 59%] 2025-12-04T12:35:04.9116506Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_deepcopy_compiled_model W1204 12:30:18.171000 140836 site-packages/torch/export/pt2_archive/_package.py:763] AOTICompiledModel deepcopy warning: AOTICompiledModel.loader is not deepcopied. 2025-12-04T12:35:04.9116630Z PASSED [6.0298s] [ 60%] 2025-12-04T12:35:04.9117104Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_duplicate_calls PASSED [17.3854s] [ 61%] 2025-12-04T12:35:04.9117976Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_linear W1204 12:30:36.578000 140836 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:35:04.9118084Z PASSED [7.0526s] [ 62%] 2025-12-04T12:35:04.9119128Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_loading_wrong_model W1204 12:30:48.678000 140836 site-packages/torch/_inductor/package/package.py:120] Loading outdated pt2 file. Please regenerate your package. 
2025-12-04T12:35:04.9119247Z PASSED [6.0630s] [ 63%] 2025-12-04T12:35:04.9119681Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_metadata PASSED [6.1607s] [ 64%] 2025-12-04T12:35:04.9120209Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_multiple_methods PASSED [11.9880s] [ 65%] 2025-12-04T12:35:04.9120709Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_shared_weights PASSED [2.1197s] [ 67%] 2025-12-04T12:35:04.9121229Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_user_managed_weight PASSED [6.4443s] [ 68%] 2025-12-04T12:35:04.9121867Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_weights_on_disk_nested_module PASSED [5.4121s] [ 69%] 2025-12-04T12:35:04.9122368Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_package_without_weight PASSED [5.3768s] [ 70%] 2025-12-04T12:35:04.9122890Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_remove_intermediate_files PASSED [6.0808s] [ 71%] 2025-12-04T12:35:04.9123341Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_save_buffer PASSED [6.1294s] [ 72%] 2025-12-04T12:35:04.9123833Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_specified_output_dir PASSED [6.1094s] [ 73%] 2025-12-04T12:35:04.9124308Z inductor/test_aot_inductor_package.py::TestAOTInductorPackage_cuda::test_update_weights PASSED [5.7031s] [ 75%] 2025-12-04T12:35:04.9124771Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_add PASSED [9.4664s] [ 76%] 2025-12-04T12:35:04.9125252Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_bool_input PASSED [9.3557s] [ 77%] 2025-12-04T12:35:04.9125979Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_after_package SKIPPED [0.0003s] (Test is only supported on CUDA 12.6+) [ 78%] 
2025-12-04T12:35:04.9126724Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_after_package_multi_arch SKIPPED [0.0002s] (Test is only supported on CUDA 12.8+) [ 79%] 2025-12-04T12:35:04.9127467Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_after_package_static SKIPPED [0.0002s] (Test is only supported on CUDA 12.6+) [ 80%] 2025-12-04T12:35:04.9128689Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:09.075000 140836 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True. 2025-12-04T12:35:04.9128838Z ('RERUN', {'yellow': True}) [0.6115s] [ 81%] 2025-12-04T12:35:04.9130065Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:09.687000 140836 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True. 2025-12-04T12:35:04.9130212Z ('RERUN', {'yellow': True}) [0.5757s] [ 81%] 2025-12-04T12:35:04.9131421Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:10.265000 140836 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True. 
2025-12-04T12:35:04.9131527Z FAILED [0.5841s] [ 81%] 2025-12-04T12:35:04.9131537Z 2025-12-04T12:35:04.9131697Z ==================================== RERUNS ==================================== 2025-12-04T12:35:04.9132000Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________ 2025-12-04T12:35:04.9132138Z Traceback (most recent call last): 2025-12-04T12:35:04.9132673Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos 2025-12-04T12:35:04.9132804Z build_path, _ = self.cmake_compile( 2025-12-04T12:35:04.9133273Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile 2025-12-04T12:35:04.9133478Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T12:35:04.9134049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T12:35:04.9134198Z return aot_inductor_minifier_wrapper( 2025-12-04T12:35:04.9134741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T12:35:04.9134850Z raise e 2025-12-04T12:35:04.9135439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T12:35:04.9135539Z return func( 2025-12-04T12:35:04.9136103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T12:35:04.9136405Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T12:35:04.9136884Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T12:35:04.9137003Z return compile_fx_aot( 2025-12-04T12:35:04.9137499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 
2007, in compile_fx_aot 2025-12-04T12:35:04.9137641Z compiled_artifacts = compile_fx( 2025-12-04T12:35:04.9138153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T12:35:04.9138264Z return compile_fx( 2025-12-04T12:35:04.9138748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T12:35:04.9138886Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T12:35:04.9139502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T12:35:04.9139620Z return _compile_fx_main( 2025-12-04T12:35:04.9140121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T12:35:04.9140342Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T12:35:04.9140861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T12:35:04.9141015Z return self.compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9141537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T12:35:04.9141658Z return compile_fx_forward( 2025-12-04T12:35:04.9142192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T12:35:04.9142309Z return inner_compile( 2025-12-04T12:35:04.9142592Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T12:35:04.9142723Z return func(*args, **kwds) 2025-12-04T12:35:04.9143225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T12:35:04.9143492Z return wrap_compiler_debug(_compile_fx_inner, 
compiler_name="inductor")( 2025-12-04T12:35:04.9144001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T12:35:04.9144180Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9144701Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:35:04.9144897Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:35:04.9145400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:35:04.9145566Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:35:04.9146102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:35:04.9146501Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:35:04.9147019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile 2025-12-04T12:35:04.9147165Z compiled_fn = AotCodeCompiler.compile( 2025-12-04T12:35:04.9147671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile 2025-12-04T12:35:04.9147779Z subprocess.run( 2025-12-04T12:35:04.9148057Z File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run 2025-12-04T12:35:04.9148245Z raise CalledProcessError(retcode, process.args, 2025-12-04T12:35:04.9150320Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' 
returned non-zero exit status 255. 2025-12-04T12:35:04.9150330Z 2025-12-04T12:35:04.9150570Z To execute this test, run the following from the base repo dir: 2025-12-04T12:35:04.9151324Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos 2025-12-04T12:35:04.9151334Z 2025-12-04T12:35:04.9151621Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:35:04.9151850Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9152685Z inductor [('async_compile_cache_miss', 2), ('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_hit', 1)] 2025-12-04T12:35:04.9152807Z graph_break [] 2025-12-04T12:35:04.9153027Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9153856Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9153973Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9155576Z nvcc -fatbin /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9155687Z stdout: 2025-12-04T12:35:04.9155695Z 2025-12-04T12:35:04.9155786Z stderr: 2025-12-04T12:35:04.9156644Z ptxas /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9156815Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9156821Z 2025-12-04T12:35:04.9157123Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________ 2025-12-04T12:35:04.9157257Z Traceback (most recent call last): 2025-12-04T12:35:04.9157796Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos 2025-12-04T12:35:04.9157938Z build_path, _ = self.cmake_compile( 2025-12-04T12:35:04.9158394Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile 2025-12-04T12:35:04.9158599Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T12:35:04.9159142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T12:35:04.9159273Z return aot_inductor_minifier_wrapper( 2025-12-04T12:35:04.9159815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T12:35:04.9159962Z raise e 2025-12-04T12:35:04.9160497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 
2025-12-04T12:35:04.9160609Z return func( 2025-12-04T12:35:04.9161158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T12:35:04.9161422Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T12:35:04.9161893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T12:35:04.9162007Z return compile_fx_aot( 2025-12-04T12:35:04.9162511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T12:35:04.9162638Z compiled_artifacts = compile_fx( 2025-12-04T12:35:04.9163110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T12:35:04.9163231Z return compile_fx( 2025-12-04T12:35:04.9163700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T12:35:04.9163867Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T12:35:04.9164456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T12:35:04.9164570Z return _compile_fx_main( 2025-12-04T12:35:04.9165110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T12:35:04.9165312Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T12:35:04.9165832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T12:35:04.9165996Z return self.compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9166497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T12:35:04.9166613Z return 
compile_fx_forward( 2025-12-04T12:35:04.9167144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T12:35:04.9167258Z return inner_compile( 2025-12-04T12:35:04.9167550Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T12:35:04.9167663Z return func(*args, **kwds) 2025-12-04T12:35:04.9168156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T12:35:04.9168437Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T12:35:04.9168928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T12:35:04.9169119Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9169619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:35:04.9169813Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:35:04.9170330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:35:04.9170476Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:35:04.9171178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:35:04.9171518Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:35:04.9172039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile 2025-12-04T12:35:04.9172307Z compiled_fn = AotCodeCompiler.compile( 2025-12-04T12:35:04.9172761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile 2025-12-04T12:35:04.9172871Z 
subprocess.run( 2025-12-04T12:35:04.9173167Z File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run 2025-12-04T12:35:04.9173385Z raise CalledProcessError(retcode, process.args, 2025-12-04T12:35:04.9175468Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255. 2025-12-04T12:35:04.9175476Z 2025-12-04T12:35:04.9175698Z To execute this test, run the following from the base repo dir: 2025-12-04T12:35:04.9176386Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos 2025-12-04T12:35:04.9176412Z 2025-12-04T12:35:04.9176684Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:35:04.9176964Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9177779Z inductor [('async_compile_cache_miss', 2), ('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_hit', 1)] 2025-12-04T12:35:04.9177884Z graph_break [] 2025-12-04T12:35:04.9178153Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9178986Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9179106Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9180727Z nvcc -fatbin /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9180826Z stdout: 2025-12-04T12:35:04.9180831Z 2025-12-04T12:35:04.9180926Z stderr: 2025-12-04T12:35:04.9181792Z ptxas /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9181963Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9185072Z 2025-12-04T12:35:04.9185316Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9186150Z inductor [('async_compile_cache_miss', 2), ('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_hit', 1)] 2025-12-04T12:35:04.9186256Z graph_break [] 2025-12-04T12:35:04.9186494Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9187311Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9187430Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9189043Z nvcc -fatbin /tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9189174Z stdout: 2025-12-04T12:35:04.9189180Z 2025-12-04T12:35:04.9189342Z stderr: 2025-12-04T12:35:04.9190183Z ptxas /tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9190351Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9190356Z 2025-12-04T12:35:04.9190523Z =================================== FAILURES =================================== 2025-12-04T12:35:04.9190824Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________ 2025-12-04T12:35:04.9190951Z Traceback (most recent call last): 2025-12-04T12:35:04.9191498Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos 2025-12-04T12:35:04.9191627Z build_path, _ = self.cmake_compile( 2025-12-04T12:35:04.9192097Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile 2025-12-04T12:35:04.9192303Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T12:35:04.9192840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T12:35:04.9192987Z return aot_inductor_minifier_wrapper( 2025-12-04T12:35:04.9193565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T12:35:04.9193674Z raise e 2025-12-04T12:35:04.9194210Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T12:35:04.9194310Z return func( 2025-12-04T12:35:04.9194904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T12:35:04.9195142Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T12:35:04.9195597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T12:35:04.9195729Z return compile_fx_aot( 2025-12-04T12:35:04.9196220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T12:35:04.9196359Z compiled_artifacts = compile_fx( 2025-12-04T12:35:04.9196831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T12:35:04.9196938Z return compile_fx( 2025-12-04T12:35:04.9197416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T12:35:04.9197555Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T12:35:04.9198128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T12:35:04.9198334Z return _compile_fx_main( 2025-12-04T12:35:04.9198844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T12:35:04.9199060Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T12:35:04.9199579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T12:35:04.9199734Z return self.compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9200251Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T12:35:04.9200372Z return compile_fx_forward( 2025-12-04T12:35:04.9200902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T12:35:04.9201017Z return inner_compile( 2025-12-04T12:35:04.9201301Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T12:35:04.9201434Z return func(*args, **kwds) 2025-12-04T12:35:04.9201964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T12:35:04.9202229Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T12:35:04.9202735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T12:35:04.9202910Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9203415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:35:04.9208841Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:35:04.9209385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:35:04.9209552Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:35:04.9210090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:35:04.9210419Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:35:04.9210958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile 2025-12-04T12:35:04.9211191Z compiled_fn = AotCodeCompiler.compile( 
2025-12-04T12:35:04.9211663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile 2025-12-04T12:35:04.9211772Z subprocess.run( 2025-12-04T12:35:04.9212054Z File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run 2025-12-04T12:35:04.9212270Z raise CalledProcessError(retcode, process.args, 2025-12-04T12:35:04.9214359Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmp7iehhac5/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmp7iehhac5/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255. 2025-12-04T12:35:04.9214373Z 2025-12-04T12:35:04.9214606Z To execute this test, run the following from the base repo dir: 2025-12-04T12:35:04.9215239Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos 2025-12-04T12:35:04.9215246Z 2025-12-04T12:35:04.9215519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:35:04.9215767Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9216677Z inductor [('async_compile_cache_miss', 2), ('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_hit', 1)] 2025-12-04T12:35:04.9216871Z graph_break [] 2025-12-04T12:35:04.9217096Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9217919Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9218054Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9219659Z nvcc -fatbin /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9219772Z stdout: 2025-12-04T12:35:04.9219778Z 2025-12-04T12:35:04.9219873Z stderr: 2025-12-04T12:35:04.9220720Z ptxas /tmp/tmpn4zamgxp/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9220945Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9220952Z 2025-12-04T12:35:04.9221169Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9221984Z inductor [('async_compile_cache_miss', 2), ('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_hit', 1)] 2025-12-04T12:35:04.9222086Z graph_break [] 2025-12-04T12:35:04.9222307Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9223140Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9223262Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9224865Z nvcc -fatbin /tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9224962Z stdout: 2025-12-04T12:35:04.9224967Z 2025-12-04T12:35:04.9225060Z stderr: 2025-12-04T12:35:04.9225943Z ptxas /tmp/tmpb0ttpte1/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9226112Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9226117Z 2025-12-04T12:35:04.9226351Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9227203Z inductor [('async_compile_cache_miss', 2), ('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_hit', 1)] 2025-12-04T12:35:04.9227310Z graph_break [] 2025-12-04T12:35:04.9227548Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9228361Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9228490Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9230086Z nvcc -fatbin /tmp/tmp7iehhac5/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmp7iehhac5/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9230181Z stdout: 2025-12-04T12:35:04.9230203Z 2025-12-04T12:35:04.9230294Z stderr: 2025-12-04T12:35:04.9231133Z ptxas /tmp/tmp7iehhac5/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9231362Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9231367Z 2025-12-04T12:35:04.9232200Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-b1ca468dab29d0d8.xml - 2025-12-04T12:35:04.9232377Z =========================== short test summary info ============================ 2025-12-04T12:35:04.9235036Z FAILED [0.5841s] inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos - torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmp7iehhac5/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmp7iehhac5/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255. 
2025-12-04T12:35:04.9235078Z 2025-12-04T12:35:04.9235310Z To execute this test, run the following from the base repo dir: 2025-12-04T12:35:04.9235939Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos 2025-12-04T12:35:04.9235945Z 2025-12-04T12:35:04.9236211Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:35:04.9236408Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:35:04.9236637Z ======== 1 failed, 46 passed, 25 skipped, 2 rerun in 370.49s (0:06:10) ========= 2025-12-04T12:35:04.9236738Z Got exit code 1 2025-12-04T12:35:04.9236862Z Retrying single test... 2025-12-04T12:35:04.9237515Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-69f64b5320fd797d.xml 2025-12-04T12:35:04.9237700Z ============================= test session starts ============================== 2025-12-04T12:35:04.9238059Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:35:04.9238170Z cachedir: .pytest_cache 2025-12-04T12:35:04.9238709Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:35:04.9238839Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:35:04.9238979Z configfile: pytest.ini 2025-12-04T12:35:04.9239585Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:35:04.9239799Z collecting ... collected 88 items / 87 deselected / 1 selected 2025-12-04T12:35:04.9240557Z stepcurrent: skipping 71 already run items. 
Running only test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos 2025-12-04T12:35:04.9240678Z Running 1 items in this shard 2025-12-04T12:35:04.9240683Z 2025-12-04T12:35:04.9241909Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:24.269000 147773 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True. 2025-12-04T12:35:04.9242061Z ('RERUN', {'yellow': True}) [5.5651s] [100%] 2025-12-04T12:35:04.9243271Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:28.137000 147773 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True. 2025-12-04T12:35:04.9243417Z ('RERUN', {'yellow': True}) [0.5711s] [100%] 2025-12-04T12:35:04.9244637Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:28.710000 147773 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True. 
2025-12-04T12:35:04.9244795Z FAILED [0.5715s] [100%] 2025-12-04T12:35:04.9244801Z 2025-12-04T12:35:04.9244947Z ==================================== RERUNS ==================================== 2025-12-04T12:35:04.9245245Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________ 2025-12-04T12:35:04.9245380Z Traceback (most recent call last): 2025-12-04T12:35:04.9245916Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos 2025-12-04T12:35:04.9246046Z build_path, _ = self.cmake_compile( 2025-12-04T12:35:04.9246519Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile 2025-12-04T12:35:04.9246722Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T12:35:04.9247259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T12:35:04.9247393Z return aot_inductor_minifier_wrapper( 2025-12-04T12:35:04.9247967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T12:35:04.9248077Z raise e 2025-12-04T12:35:04.9248611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T12:35:04.9248723Z return func( 2025-12-04T12:35:04.9249273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T12:35:04.9249502Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T12:35:04.9249977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T12:35:04.9250092Z return compile_fx_aot( 2025-12-04T12:35:04.9250584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 
2007, in compile_fx_aot 2025-12-04T12:35:04.9250724Z compiled_artifacts = compile_fx( 2025-12-04T12:35:04.9251192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T12:35:04.9251313Z return compile_fx( 2025-12-04T12:35:04.9251808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T12:35:04.9251946Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T12:35:04.9252532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T12:35:04.9252647Z return _compile_fx_main( 2025-12-04T12:35:04.9253180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T12:35:04.9253398Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T12:35:04.9253918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T12:35:04.9254079Z return self.compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9254580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T12:35:04.9254696Z return compile_fx_forward( 2025-12-04T12:35:04.9255225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T12:35:04.9255336Z return inner_compile( 2025-12-04T12:35:04.9255628Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T12:35:04.9255744Z return func(*args, **kwds) 2025-12-04T12:35:04.9256236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T12:35:04.9256631Z return wrap_compiler_debug(_compile_fx_inner, 
compiler_name="inductor")( 2025-12-04T12:35:04.9257124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T12:35:04.9257302Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9257817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:35:04.9258014Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:35:04.9258528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:35:04.9258676Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:35:04.9259210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:35:04.9259552Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:35:04.9260110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile 2025-12-04T12:35:04.9260266Z compiled_fn = AotCodeCompiler.compile( 2025-12-04T12:35:04.9260717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile 2025-12-04T12:35:04.9260821Z subprocess.run( 2025-12-04T12:35:04.9261110Z File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run 2025-12-04T12:35:04.9261279Z raise CalledProcessError(retcode, process.args, 2025-12-04T12:35:04.9263353Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' 
returned non-zero exit status 255. 2025-12-04T12:35:04.9263364Z 2025-12-04T12:35:04.9263582Z To execute this test, run the following from the base repo dir: 2025-12-04T12:35:04.9264212Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos 2025-12-04T12:35:04.9264218Z 2025-12-04T12:35:04.9264535Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:35:04.9264760Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9265457Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)] 2025-12-04T12:35:04.9265585Z graph_break [] 2025-12-04T12:35:04.9265804Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9266639Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9266757Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9268377Z nvcc -fatbin /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9268472Z stdout: 2025-12-04T12:35:04.9268478Z 2025-12-04T12:35:04.9268565Z stderr: 2025-12-04T12:35:04.9269418Z ptxas /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9269584Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9269628Z 2025-12-04T12:35:04.9269943Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________ 2025-12-04T12:35:04.9270072Z Traceback (most recent call last): 2025-12-04T12:35:04.9270603Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos 2025-12-04T12:35:04.9270743Z build_path, _ = self.cmake_compile( 2025-12-04T12:35:04.9271487Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile 2025-12-04T12:35:04.9271695Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T12:35:04.9272235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T12:35:04.9272364Z return aot_inductor_minifier_wrapper( 2025-12-04T12:35:04.9272924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T12:35:04.9273022Z raise e 2025-12-04T12:35:04.9273556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 
2025-12-04T12:35:04.9273752Z return func( 2025-12-04T12:35:04.9274301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T12:35:04.9274549Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T12:35:04.9275008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T12:35:04.9275122Z return compile_fx_aot( 2025-12-04T12:35:04.9275626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T12:35:04.9275753Z compiled_artifacts = compile_fx( 2025-12-04T12:35:04.9276224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T12:35:04.9276347Z return compile_fx( 2025-12-04T12:35:04.9276813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T12:35:04.9276964Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T12:35:04.9277532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T12:35:04.9277647Z return _compile_fx_main( 2025-12-04T12:35:04.9278231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T12:35:04.9278434Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T12:35:04.9279008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T12:35:04.9279158Z return self.compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9279661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T12:35:04.9279792Z return 
compile_fx_forward( 2025-12-04T12:35:04.9280304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T12:35:04.9280413Z return inner_compile( 2025-12-04T12:35:04.9280705Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T12:35:04.9280823Z return func(*args, **kwds) 2025-12-04T12:35:04.9281329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T12:35:04.9281593Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T12:35:04.9282087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T12:35:04.9282343Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9282842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:35:04.9283038Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:35:04.9283548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:35:04.9283695Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:35:04.9284242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:35:04.9284561Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:35:04.9285082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile 2025-12-04T12:35:04.9285244Z compiled_fn = AotCodeCompiler.compile( 2025-12-04T12:35:04.9285697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile 2025-12-04T12:35:04.9285854Z 
subprocess.run( 2025-12-04T12:35:04.9286133Z File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run 2025-12-04T12:35:04.9286301Z raise CalledProcessError(retcode, process.args, 2025-12-04T12:35:04.9288384Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255. 2025-12-04T12:35:04.9288392Z 2025-12-04T12:35:04.9288614Z To execute this test, run the following from the base repo dir: 2025-12-04T12:35:04.9289265Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos 2025-12-04T12:35:04.9289274Z 2025-12-04T12:35:04.9289542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:35:04.9289766Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9290510Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)] 2025-12-04T12:35:04.9290615Z graph_break [] 2025-12-04T12:35:04.9290848Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9291700Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9291820Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9293436Z nvcc -fatbin /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9293534Z stdout: 2025-12-04T12:35:04.9293540Z 2025-12-04T12:35:04.9293647Z stderr: 2025-12-04T12:35:04.9294491Z ptxas /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9294658Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9294664Z 2025-12-04T12:35:04.9294902Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9295586Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)] 2025-12-04T12:35:04.9295738Z graph_break [] 2025-12-04T12:35:04.9295956Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9296843Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9296977Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9298564Z nvcc -fatbin /tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9298673Z stdout: 2025-12-04T12:35:04.9298678Z 2025-12-04T12:35:04.9298771Z stderr: 2025-12-04T12:35:04.9299612Z ptxas /tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9299800Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9299845Z 2025-12-04T12:35:04.9299994Z =================================== FAILURES =================================== 2025-12-04T12:35:04.9300307Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________ 2025-12-04T12:35:04.9300434Z Traceback (most recent call last): 2025-12-04T12:35:04.9300968Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos 2025-12-04T12:35:04.9301109Z build_path, _ = self.cmake_compile( 2025-12-04T12:35:04.9301571Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile 2025-12-04T12:35:04.9301772Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T12:35:04.9302319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T12:35:04.9302453Z return aot_inductor_minifier_wrapper( 2025-12-04T12:35:04.9303010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T12:35:04.9303101Z raise e 2025-12-04T12:35:04.9303638Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T12:35:04.9303751Z return func( 2025-12-04T12:35:04.9304328Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T12:35:04.9304572Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T12:35:04.9305058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T12:35:04.9305173Z return compile_fx_aot( 2025-12-04T12:35:04.9305675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T12:35:04.9305800Z compiled_artifacts = compile_fx( 2025-12-04T12:35:04.9306264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T12:35:04.9306382Z return compile_fx( 2025-12-04T12:35:04.9306846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T12:35:04.9306998Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T12:35:04.9307568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T12:35:04.9307682Z return _compile_fx_main( 2025-12-04T12:35:04.9308198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T12:35:04.9308396Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T12:35:04.9308949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T12:35:04.9309111Z return self.compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9309615Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T12:35:04.9309743Z return compile_fx_forward( 2025-12-04T12:35:04.9310257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T12:35:04.9310368Z return inner_compile( 2025-12-04T12:35:04.9310662Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T12:35:04.9310772Z return func(*args, **kwds) 2025-12-04T12:35:04.9311280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T12:35:04.9311549Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T12:35:04.9312068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T12:35:04.9312256Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9312756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:35:04.9312951Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:35:04.9313468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:35:04.9313614Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:35:04.9314157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:35:04.9314478Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:35:04.9314994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile 2025-12-04T12:35:04.9315151Z compiled_fn = AotCodeCompiler.compile( 
2025-12-04T12:35:04.9315600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile 2025-12-04T12:35:04.9315717Z subprocess.run( 2025-12-04T12:35:04.9316021Z File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run 2025-12-04T12:35:04.9316191Z raise CalledProcessError(retcode, process.args, 2025-12-04T12:35:04.9318284Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpw2ggbbe6/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpw2ggbbe6/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255. 2025-12-04T12:35:04.9318295Z 2025-12-04T12:35:04.9318512Z To execute this test, run the following from the base repo dir: 2025-12-04T12:35:04.9319158Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos 2025-12-04T12:35:04.9319164Z 2025-12-04T12:35:04.9319431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:35:04.9319652Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9320349Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)] 2025-12-04T12:35:04.9320449Z graph_break [] 2025-12-04T12:35:04.9320683Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9321502Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9321653Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9323271Z nvcc -fatbin /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9323367Z stdout: 2025-12-04T12:35:04.9323372Z 2025-12-04T12:35:04.9323479Z stderr: 2025-12-04T12:35:04.9324315Z ptxas /tmp/tmpn0sihacm/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9324481Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9324487Z 2025-12-04T12:35:04.9324722Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9325396Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)] 2025-12-04T12:35:04.9325558Z graph_break [] 2025-12-04T12:35:04.9325777Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9326589Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9326719Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9328321Z nvcc -fatbin /tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9328427Z stdout: 2025-12-04T12:35:04.9328432Z 2025-12-04T12:35:04.9328527Z stderr: 2025-12-04T12:35:04.9329363Z ptxas /tmp/tmp690r3gye/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9329538Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9329543Z 2025-12-04T12:35:04.9329758Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9330478Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)] 2025-12-04T12:35:04.9330579Z graph_break [] 2025-12-04T12:35:04.9330795Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9331648Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9331766Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9333370Z nvcc -fatbin /tmp/tmpw2ggbbe6/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpw2ggbbe6/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9333463Z stdout: 2025-12-04T12:35:04.9333468Z 2025-12-04T12:35:04.9333558Z stderr: 2025-12-04T12:35:04.9334407Z ptxas /tmp/tmpw2ggbbe6/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9334571Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9334576Z 2025-12-04T12:35:04.9335415Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-69f64b5320fd797d.xml - 2025-12-04T12:35:04.9335628Z =========================== short test summary info ============================ 2025-12-04T12:35:04.9338389Z FAILED [0.5715s] inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos - torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpw2ggbbe6/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpw2ggbbe6/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255. 
2025-12-04T12:35:04.9338397Z 2025-12-04T12:35:04.9338617Z To execute this test, run the following from the base repo dir: 2025-12-04T12:35:04.9339253Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos 2025-12-04T12:35:04.9339272Z 2025-12-04T12:35:04.9339540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:35:04.9339760Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:35:04.9339974Z ================== 1 failed, 87 deselected, 2 rerun in 6.75s =================== 2025-12-04T12:35:04.9340077Z Got exit code 1 2025-12-04T12:35:04.9340187Z Retrying single test... 2025-12-04T12:35:04.9340850Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-e41e403fca9b1188.xml 2025-12-04T12:35:04.9341014Z ============================= test session starts ============================== 2025-12-04T12:35:04.9341377Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:35:04.9341491Z cachedir: .pytest_cache 2025-12-04T12:35:04.9342013Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:35:04.9342145Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:35:04.9342260Z configfile: pytest.ini 2025-12-04T12:35:04.9342850Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:35:04.9343073Z collecting ... collected 88 items / 87 deselected / 1 selected 2025-12-04T12:35:04.9343820Z stepcurrent: skipping 71 already run items. 
Running only test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos 2025-12-04T12:35:04.9343948Z Running 1 items in this shard 2025-12-04T12:35:04.9343953Z 2025-12-04T12:35:04.9345214Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:44.069000 148188 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True. 2025-12-04T12:35:04.9345350Z ('RERUN', {'yellow': True}) [5.6710s] [100%] 2025-12-04T12:35:04.9346583Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:48.031000 148188 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True. 2025-12-04T12:35:04.9346714Z ('RERUN', {'yellow': True}) [0.5973s] [100%] 2025-12-04T12:35:04.9347936Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos W1204 12:32:48.630000 148188 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True. 
2025-12-04T12:35:04.9348038Z FAILED [0.5838s] [100%] 2025-12-04T12:35:04.9348044Z 2025-12-04T12:35:04.9348199Z ==================================== RERUNS ==================================== 2025-12-04T12:35:04.9348497Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________ 2025-12-04T12:35:04.9348657Z Traceback (most recent call last): 2025-12-04T12:35:04.9349204Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos 2025-12-04T12:35:04.9349334Z build_path, _ = self.cmake_compile( 2025-12-04T12:35:04.9349786Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile 2025-12-04T12:35:04.9349998Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T12:35:04.9350530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T12:35:04.9350671Z return aot_inductor_minifier_wrapper( 2025-12-04T12:35:04.9351218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T12:35:04.9351310Z raise e 2025-12-04T12:35:04.9351861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T12:35:04.9351991Z return func( 2025-12-04T12:35:04.9352538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T12:35:04.9352784Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T12:35:04.9353239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T12:35:04.9353366Z return compile_fx_aot( 2025-12-04T12:35:04.9353857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 
2007, in compile_fx_aot 2025-12-04T12:35:04.9353979Z compiled_artifacts = compile_fx( 2025-12-04T12:35:04.9354463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T12:35:04.9354568Z return compile_fx( 2025-12-04T12:35:04.9355045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T12:35:04.9355182Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T12:35:04.9355751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T12:35:04.9355876Z return _compile_fx_main( 2025-12-04T12:35:04.9356405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T12:35:04.9356607Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T12:35:04.9357140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T12:35:04.9357319Z return self.compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9357837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T12:35:04.9357954Z return compile_fx_forward( 2025-12-04T12:35:04.9358471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T12:35:04.9358595Z return inner_compile( 2025-12-04T12:35:04.9358876Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T12:35:04.9359003Z return func(*args, **kwds) 2025-12-04T12:35:04.9359514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T12:35:04.9359781Z return wrap_compiler_debug(_compile_fx_inner, 
compiler_name="inductor")( 2025-12-04T12:35:04.9360291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T12:35:04.9360467Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9361007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:35:04.9361217Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:35:04.9361720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:35:04.9361882Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:35:04.9362418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:35:04.9362742Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:35:04.9363279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile 2025-12-04T12:35:04.9363427Z compiled_fn = AotCodeCompiler.compile( 2025-12-04T12:35:04.9363883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile 2025-12-04T12:35:04.9364009Z subprocess.run( 2025-12-04T12:35:04.9364413Z File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run 2025-12-04T12:35:04.9364595Z raise CalledProcessError(retcode, process.args, 2025-12-04T12:35:04.9366669Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' 
returned non-zero exit status 255. 2025-12-04T12:35:04.9366676Z 2025-12-04T12:35:04.9366914Z To execute this test, run the following from the base repo dir: 2025-12-04T12:35:04.9367551Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos 2025-12-04T12:35:04.9367559Z 2025-12-04T12:35:04.9367831Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:35:04.9368075Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9368759Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)] 2025-12-04T12:35:04.9368877Z graph_break [] 2025-12-04T12:35:04.9369130Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9369953Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9370123Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9371910Z nvcc -fatbin /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9372027Z stdout: 2025-12-04T12:35:04.9372032Z 2025-12-04T12:35:04.9372126Z stderr: 2025-12-04T12:35:04.9372970Z ptxas /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9373158Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9373163Z 2025-12-04T12:35:04.9373466Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________ 2025-12-04T12:35:04.9373608Z Traceback (most recent call last): 2025-12-04T12:35:04.9374149Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos 2025-12-04T12:35:04.9374279Z build_path, _ = self.cmake_compile( 2025-12-04T12:35:04.9374847Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile 2025-12-04T12:35:04.9375055Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T12:35:04.9375585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T12:35:04.9375732Z return aot_inductor_minifier_wrapper( 2025-12-04T12:35:04.9376280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T12:35:04.9376464Z raise e 2025-12-04T12:35:04.9377004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 
2025-12-04T12:35:04.9377102Z return func( 2025-12-04T12:35:04.9377667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T12:35:04.9377903Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T12:35:04.9378412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T12:35:04.9378540Z return compile_fx_aot( 2025-12-04T12:35:04.9379030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T12:35:04.9379168Z compiled_artifacts = compile_fx( 2025-12-04T12:35:04.9379639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T12:35:04.9379746Z return compile_fx( 2025-12-04T12:35:04.9380227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T12:35:04.9380365Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T12:35:04.9380955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T12:35:04.9381076Z return _compile_fx_main( 2025-12-04T12:35:04.9381579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T12:35:04.9381796Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T12:35:04.9382322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T12:35:04.9382673Z return self.compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9383191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T12:35:04.9383309Z return 
compile_fx_forward( 2025-12-04T12:35:04.9383879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T12:35:04.9383995Z return inner_compile( 2025-12-04T12:35:04.9384277Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T12:35:04.9384411Z return func(*args, **kwds) 2025-12-04T12:35:04.9384903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T12:35:04.9385171Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T12:35:04.9385677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T12:35:04.9385852Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9386368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:35:04.9386565Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:35:04.9387063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:35:04.9387261Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:35:04.9387794Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:35:04.9388129Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:35:04.9388653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile 2025-12-04T12:35:04.9388797Z compiled_fn = AotCodeCompiler.compile( 2025-12-04T12:35:04.9389263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile 2025-12-04T12:35:04.9389369Z 
subprocess.run( 2025-12-04T12:35:04.9389645Z File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run 2025-12-04T12:35:04.9389826Z raise CalledProcessError(retcode, process.args, 2025-12-04T12:35:04.9391892Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255. 2025-12-04T12:35:04.9391932Z 2025-12-04T12:35:04.9392170Z To execute this test, run the following from the base repo dir: 2025-12-04T12:35:04.9392800Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos 2025-12-04T12:35:04.9392806Z 2025-12-04T12:35:04.9393089Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:35:04.9393314Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9393998Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)] 2025-12-04T12:35:04.9394117Z graph_break [] 2025-12-04T12:35:04.9394337Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9395163Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9395313Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9396944Z nvcc -fatbin /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9397056Z stdout: 2025-12-04T12:35:04.9397062Z 2025-12-04T12:35:04.9397161Z stderr: 2025-12-04T12:35:04.9398016Z ptxas /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9398185Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9398190Z 2025-12-04T12:35:04.9398410Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9399107Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)] 2025-12-04T12:35:04.9399210Z graph_break [] 2025-12-04T12:35:04.9399441Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9400256Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9400409Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9402022Z nvcc -fatbin /tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9402119Z stdout: 2025-12-04T12:35:04.9402124Z 2025-12-04T12:35:04.9402231Z stderr: 2025-12-04T12:35:04.9403073Z ptxas /tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9403239Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9403244Z 2025-12-04T12:35:04.9403406Z =================================== FAILURES =================================== 2025-12-04T12:35:04.9403706Z __________ TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos __________ 2025-12-04T12:35:04.9403846Z Traceback (most recent call last): 2025-12-04T12:35:04.9404384Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 452, in test_compile_standalone_cos 2025-12-04T12:35:04.9404546Z build_path, _ = self.cmake_compile( 2025-12-04T12:35:04.9405017Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_package.py", line 179, in cmake_compile 2025-12-04T12:35:04.9405220Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T12:35:04.9405745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T12:35:04.9405889Z return aot_inductor_minifier_wrapper( 2025-12-04T12:35:04.9406429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T12:35:04.9406538Z raise e 2025-12-04T12:35:04.9407080Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T12:35:04.9407180Z return func( 2025-12-04T12:35:04.9407744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T12:35:04.9407976Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T12:35:04.9408436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T12:35:04.9408593Z return compile_fx_aot( 2025-12-04T12:35:04.9409088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T12:35:04.9409225Z compiled_artifacts = compile_fx( 2025-12-04T12:35:04.9409724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T12:35:04.9409832Z return compile_fx( 2025-12-04T12:35:04.9410314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T12:35:04.9410453Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T12:35:04.9411036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T12:35:04.9411151Z return _compile_fx_main( 2025-12-04T12:35:04.9411653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T12:35:04.9411865Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T12:35:04.9412384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T12:35:04.9412535Z return self.compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9413052Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T12:35:04.9413224Z return compile_fx_forward( 2025-12-04T12:35:04.9413748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T12:35:04.9413860Z return inner_compile( 2025-12-04T12:35:04.9414140Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T12:35:04.9414266Z return func(*args, **kwds) 2025-12-04T12:35:04.9414762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T12:35:04.9415024Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T12:35:04.9415525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T12:35:04.9415701Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T12:35:04.9416218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T12:35:04.9416505Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T12:35:04.9417049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T12:35:04.9417209Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T12:35:04.9417740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T12:35:04.9418078Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T12:35:04.9418598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1519, in codegen_and_compile 2025-12-04T12:35:04.9418739Z compiled_fn = AotCodeCompiler.compile( 
2025-12-04T12:35:04.9419206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2409, in compile 2025-12-04T12:35:04.9419316Z subprocess.run( 2025-12-04T12:35:04.9419590Z File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run 2025-12-04T12:35:04.9419776Z raise CalledProcessError(retcode, process.args, 2025-12-04T12:35:04.9421875Z torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpgm40r8lx/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpgm40r8lx/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255. 2025-12-04T12:35:04.9421883Z 2025-12-04T12:35:04.9422124Z To execute this test, run the following from the base repo dir: 2025-12-04T12:35:04.9422810Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos 2025-12-04T12:35:04.9422819Z 2025-12-04T12:35:04.9423104Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:35:04.9423329Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9424012Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)] 2025-12-04T12:35:04.9424127Z graph_break [] 2025-12-04T12:35:04.9424348Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9425182Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9425300Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9426903Z nvcc -fatbin /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9427046Z stdout: 2025-12-04T12:35:04.9427051Z 2025-12-04T12:35:04.9427146Z stderr: 2025-12-04T12:35:04.9427997Z ptxas /tmp/tmpszu7egnh/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9428162Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9428169Z 2025-12-04T12:35:04.9428388Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9429083Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)] 2025-12-04T12:35:04.9429186Z graph_break [] 2025-12-04T12:35:04.9429416Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9430233Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9430384Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9431990Z nvcc -fatbin /tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9432084Z stdout: 2025-12-04T12:35:04.9432090Z 2025-12-04T12:35:04.9432195Z stderr: 2025-12-04T12:35:04.9433034Z ptxas /tmp/tmp0mbifsnb/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9433200Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9433208Z 2025-12-04T12:35:04.9433443Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:35:04.9434129Z inductor [('benchmarking.InductorBenchmarker.benchmark', 2), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 1)] 2025-12-04T12:35:04.9434246Z graph_break [] 2025-12-04T12:35:04.9434464Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:35:04.9435309Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T12:35:04.9435443Z return cls.__new__(cls, *args) 2025-12-04T12:35:04.9437065Z nvcc -fatbin /tmp/tmpgm40r8lx/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx -o /tmp/tmpgm40r8lx/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 failed with: 2025-12-04T12:35:04.9437175Z stdout: 2025-12-04T12:35:04.9437182Z 2025-12-04T12:35:04.9437276Z stderr: 2025-12-04T12:35:04.9438115Z ptxas /tmp/tmpgm40r8lx/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx, line 5; fatal : Unsupported .version 8.7; current version is '8.4' 2025-12-04T12:35:04.9438296Z ptxas fatal : Ptx assembly aborted due to errors 2025-12-04T12:35:04.9438302Z 2025-12-04T12:35:04.9439134Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-e41e403fca9b1188.xml - 2025-12-04T12:35:04.9439326Z =========================== short test summary info ============================ 2025-12-04T12:35:04.9442005Z FAILED [0.5838s] inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos - torch._inductor.exc.InductorError: CalledProcessError: Command '['nvcc', '-fatbin', '/tmp/tmpgm40r8lx/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.ptx', '-o', '/tmp/tmpgm40r8lx/ce75deix5v2ltuxilvqvbxmlr5qvlqmr22yvpthfi2chjjhlxkcn/cos_triton_poi_fused_cos_0.fatbin', '-gencode', 'arch=compute_75,code=compute_75', '-gencode', 'arch=compute_75,code=sm_75']' returned non-zero exit status 255. 
2025-12-04T12:35:04.9442048Z 2025-12-04T12:35:04.9442290Z To execute this test, run the following from the base repo dir: 2025-12-04T12:35:04.9442928Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor_package.py TestAOTInductorPackageCpp_cuda.test_compile_standalone_cos 2025-12-04T12:35:04.9442934Z 2025-12-04T12:35:04.9443205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:35:04.9443405Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:35:04.9443610Z ================== 1 failed, 87 deselected, 2 rerun in 6.89s =================== 2025-12-04T12:35:04.9443732Z Got exit code 1 2025-12-04T12:35:04.9444294Z FAILED CONSISTENTLY: test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos 2025-12-04T12:35:04.9444736Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:35:04.9445399Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-07f86488d4cce1d3.xml 2025-12-04T12:35:04.9445570Z ============================= test session starts ============================== 2025-12-04T12:35:04.9445923Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:35:04.9446055Z cachedir: .pytest_cache 2025-12-04T12:35:04.9446579Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:35:04.9446724Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:35:04.9446837Z configfile: pytest.ini 2025-12-04T12:35:04.9447423Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:35:04.9447664Z collecting ... 
collected 88 items / 72 deselected / 16 selected 2025-12-04T12:35:04.9447811Z stepcurrent: skipping 72 already run items. 2025-12-04T12:35:04.9447930Z Running 16 items in this shard 2025-12-04T12:35:04.9447949Z 2025-12-04T12:35:04.9448689Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_with_exporter SKIPPED [0.0004s] (Test is only supported on CUDA 12.6+) [ 6%] 2025-12-04T12:35:04.9449430Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_with_exporter_weights SKIPPED [0.0003s] (Test is only supported on CUDA 12.6+) [ 12%] 2025-12-04T12:35:04.9450699Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_deepcopy_compiled_model W1204 12:33:12.749000 148603 site-packages/torch/export/pt2_archive/_package.py:763] AOTICompiledModel deepcopy warning: AOTICompiledModel.loader is not deepcopied. 2025-12-04T12:35:04.9450813Z PASSED [10.5333s] [ 18%] 2025-12-04T12:35:04.9451324Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_duplicate_calls PASSED [22.4138s] [ 25%] 2025-12-04T12:35:04.9452202Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_linear W1204 12:33:36.177000 148603 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T12:35:04.9452318Z PASSED [10.8322s] [ 31%] 2025-12-04T12:35:04.9453393Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_loading_wrong_model W1204 12:33:52.011000 148603 site-packages/torch/_inductor/package/package.py:120] Loading outdated pt2 file. Please regenerate your package. 
2025-12-04T12:35:04.9453503Z PASSED [6.0079s] [ 37%] 2025-12-04T12:35:04.9453978Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_metadata PASSED [9.9375s] [ 43%] 2025-12-04T12:35:04.9454510Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_multiple_methods PASSED [37.7687s] [ 50%] 2025-12-04T12:35:04.9455163Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_shared_weights SKIPPED [0.0033s] (No support for cpp only) [ 56%] 2025-12-04T12:35:04.9455850Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_user_managed_weight SKIPPED [0.0030s] (No support for cpp only) [ 62%] 2025-12-04T12:35:04.9456630Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_weights_on_disk_nested_module SKIPPED [0.0030s] (No support for cpp only) [ 68%] 2025-12-04T12:35:04.9457309Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_package_without_weight SKIPPED [0.0029s] (No support for cpp only) [ 75%] 2025-12-04T12:35:04.9457843Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_remove_intermediate_files PASSED [6.0768s] [ 81%] 2025-12-04T12:35:04.9458313Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_save_buffer PASSED [6.1722s] [ 87%] 2025-12-04T12:35:04.9458878Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_specified_output_dir PASSED [9.9551s] [ 93%] 2025-12-04T12:35:04.9459491Z inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_update_weights SKIPPED [0.0052s] (No support for cpp only) [100%] 2025-12-04T12:35:04.9459497Z 2025-12-04T12:35:04.9460344Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-07f86488d4cce1d3.xml - 2025-12-04T12:35:04.9460572Z =========== 9 passed, 7 skipped, 72 deselected 
in 119.78s (0:01:59) ============ 2025-12-04T12:35:04.9461252Z The following tests failed consistently: ['test/inductor/test_aot_inductor_package.py::TestAOTInductorPackageCpp_cuda::test_compile_standalone_cos'] 2025-12-04T12:35:04.9461260Z 2025-12-04T12:35:04.9461881Z FINISHED PRINTING LOG FILE of inductor/test_aot_inductor_package 1/1 (test/test-reports/inductor.test_aot_inductor_package_1.1_5509f9f54e762912_.log) 2025-12-04T12:35:04.9461889Z 2025-12-04T12:35:04.9462281Z Finished inductor/test_aot_inductor_package 1/1 ... [2025-12-04 12:35:04.270661][12132.653543468], took 9.22min 2025-12-04T12:35:04.9463243Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-b1ca468dab29d0d8.xml 2025-12-04T12:35:04.9464123Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-69f64b5320fd797d.xml 2025-12-04T12:35:04.9465041Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-e41e403fca9b1188.xml 2025-12-04T12:35:04.9465914Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-07f86488d4cce1d3.xml 2025-12-04T12:35:05.0701010Z Uploading logs for 57119749248 to S3 2025-12-04T12:35:05.2827341Z Uploading artifacts took 0.81 seconds 2025-12-04T12:35:05.2827852Z inductor/test_aot_inductor_package 1/1 failed! 2025-12-04T12:35:05.2832017Z Running inductor/test_padding 1/1 ... 
[2025-12-04 12:35:05.283004][12133.665898165] 2025-12-04T12:35:05.2832613Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:35:05.2837081Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_padding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:35:05.283465] 2025-12-04T12:35:59.9338211Z 2025-12-04T12:35:59.9339368Z inductor/test_padding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_padding_1.1_4d224b6d5f4af5af_.log 2025-12-04T12:35:59.9368708Z Running 55 items in this shard: test/inductor/test_padding.py::PerfTestBetweenGoodAndBadShape::test_BertForMaskedLM, test/inductor/test_padding.py::PerfTestBetweenGoodAndBadShape::test_LinearAndSoftmax_both_shapes, test/inductor/test_padding.py::PerfTestBetweenGoodAndBadShape::test_nobias_LinearAndSoftmax_both_shapes, test/inductor/test_padding.py::PerfTestWithAndWithoutPadding::test_longformer, test/inductor/test_padding.py::PerfTestWithAndWithoutPadding::test_longformer_small_bs, test/inductor/test_padding.py::PerfTestWithAndWithoutPadding::test_nvidia_deeprecommender, test/inductor/test_padding.py::PaddingTest::test_LinearAndSoftmax_codegen, test/inductor/test_padding.py::PaddingTest::test_attention, test/inductor/test_padding.py::PaddingTest::test_cat, test/inductor/test_padding.py::PaddingTest::test_conv, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape0_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape1_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape2_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape3_alignment_bytes_64_enable_pad_True, 
test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape4_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape5_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape6_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_dynamic_shape_padding_shape7_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_matmul, test/inductor/test_padding.py::PaddingTest::test_mm_padding_perf, test/inductor/test_padding.py::PaddingTest::test_nobias_LinearAndSoftmax_codegen, test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape0_alignment_bytes_32_pad_output_False, test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape1_alignment_bytes_32_pad_output_True, test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape2_alignment_bytes_64_pad_output_False, test/inductor/test_padding.py::PaddingTest::test_noop_concat_output_padding_shape3_alignment_bytes_64_pad_output_True, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape0_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape1_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape2_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape3_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape4_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape5_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape6_alignment_bytes_64_enable_pad_False, 
test/inductor/test_padding.py::PaddingTest::test_outer_dynamic_shape_padding_shape7_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_pad_3d_tensor, test/inductor/test_padding.py::PaddingTest::test_pad_channels_last, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape0_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape0_float32, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape1_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_128_shape1_float32, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape0_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape0_float32, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape1_float16, test/inductor/test_padding.py::PaddingTest::test_pad_outputs_alignment_bytes_32_shape1_float32, test/inductor/test_padding.py::PaddingTest::test_pad_strides, test/inductor/test_padding.py::PaddingTest::test_pad_strides_skip, test/inductor/test_padding.py::PaddingTest::test_padmm, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape0_perm0_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape1_perm1_alignment_bytes_32_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape2_perm2_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape3_perm3_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape4_perm4_alignment_bytes_32_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape5_perm5_alignment_bytes_32_enable_pad_True, 
test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape6_perm6_alignment_bytes_64_enable_pad_True, test/inductor/test_padding.py::PaddingTest::test_perm_outer_dynamic_shape_padding_shape7_perm7_alignment_bytes_64_enable_pad_False, test/inductor/test_padding.py::PaddingTest::test_view 2025-12-04T12:35:59.9396284Z 2025-12-04T12:35:59.9396630Z Finished inductor/test_padding 1/1 ... [2025-12-04 12:35:59.933676][12188.316571745], took 0.91min 2025-12-04T12:35:59.9575457Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_padding/inductor.test_padding-be250a10b53bb058.xml 2025-12-04T12:36:00.0467886Z Running dynamo/test_aot_compile 1/1 ... [2025-12-04 12:36:00.046401][12188.429295275] 2025-12-04T12:36:00.0468480Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:36:00.0471203Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_aot_compile.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:36:00.046858] 2025-12-04T12:37:50.1755930Z 2025-12-04T12:37:50.1757700Z dynamo/test_aot_compile 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_aot_compile_1.1_232ed44e0e50b87e_.log 2025-12-04T12:37:50.1768687Z Running 25 items in this shard: test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_basic_fn, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_basic_fn_inductor, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_basic_forward, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_disable_guard_check, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_grad_mode_after_prior_compile, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_graph_break_error_fmt, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_module, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_repeat_interleave, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_source_info, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_aoti, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_aoti_module, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_aoti_torch_compile, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_checkpoint, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_closure_save_and_load, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_default_args, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_global_tensor, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_compile_with_super_call, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_module_simplified_serializable_autograd, test/dynamo/test_aot_compile.py::TestAOTCompile::test_aot_module_simplified_serializable_inference, 
test/dynamo/test_aot_compile.py::TestAOTCompile::test_decorated_function_aot, test/dynamo/test_aot_compile.py::TestAOTCompile::test_decorated_function_with_functools_wrap_aot, test/dynamo/test_aot_compile.py::TestAOTCompile::test_external_refs_validation, test/dynamo/test_aot_compile.py::TestAOTCompile::test_fullgraph_capture_with_pytree_func, test/dynamo/test_aot_compile.py::TestAOTCompile::test_fullgraph_capture_with_pytree_module, test/dynamo/test_aot_compile.py::TestAOTCompile::test_guard_filter_override_aot 2025-12-04T12:37:50.1779725Z 2025-12-04T12:37:50.1780093Z Finished dynamo/test_aot_compile 1/1 ... [2025-12-04 12:37:50.175336][12298.558230105], took 1.84min 2025-12-04T12:37:50.1993594Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_aot_compile/dynamo.test_aot_compile-10a88b68c9603fe3.xml 2025-12-04T12:37:50.2889723Z Running dynamo/test_sets 1/1 ... [2025-12-04 12:37:50.288629][12298.671523635] 2025-12-04T12:37:50.2890278Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:37:50.2893023Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_sets.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:37:50.289067] 2025-12-04T12:38:06.7776261Z 2025-12-04T12:38:06.7777447Z dynamo/test_sets 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_sets_1.1_e77962cd1c25fe47_.log 2025-12-04T12:38:06.7815113Z Running 124 items in this shard: test/dynamo/test_sets.py::CustomSetTests::test_custom_add, test/dynamo/test_sets.py::CustomSetTests::test_custom_contains, test/dynamo/test_sets.py::MiscTests::test_isdisjoint_with_generator, test/dynamo/test_sets.py::TestSetGuards::test_in_guard, test/dynamo/test_sets.py::TestSetGuards::test_set_guard_on_keys_change, test/dynamo/test_sets.py::TestSetGuards::test_set_multiple_types, test/dynamo/test_sets.py::TestSetGuards::test_set_recompile_on_key_change, test/dynamo/test_sets.py::TestSetGuards::test_set_recompile_on_key_pop, test/dynamo/test_sets.py::TestSetGuards::test_set_with_function, test/dynamo/test_sets.py::TestSetGuards::test_set_with_tensors, test/dynamo/test_sets.py::FrozensetTests::test_binop_and, test/dynamo/test_sets.py::FrozensetTests::test_binop_or, test/dynamo/test_sets.py::FrozensetTests::test_binop_sub, test/dynamo/test_sets.py::FrozensetTests::test_binop_xor, test/dynamo/test_sets.py::FrozensetTests::test_cmp_eq, test/dynamo/test_sets.py::FrozensetTests::test_cmp_greater_than, test/dynamo/test_sets.py::FrozensetTests::test_cmp_greater_than_or_equal, test/dynamo/test_sets.py::FrozensetTests::test_cmp_less_than, test/dynamo/test_sets.py::FrozensetTests::test_cmp_less_than_or_equal, test/dynamo/test_sets.py::FrozensetTests::test_cmp_ne, test/dynamo/test_sets.py::FrozensetTests::test_constructor_iterable, test/dynamo/test_sets.py::FrozensetTests::test_contains, test/dynamo/test_sets.py::FrozensetTests::test_copy, test/dynamo/test_sets.py::FrozensetTests::test_difference, test/dynamo/test_sets.py::FrozensetTests::test_equality, test/dynamo/test_sets.py::FrozensetTests::test_in_frozenset, test/dynamo/test_sets.py::FrozensetTests::test_intersection, 
test/dynamo/test_sets.py::FrozensetTests::test_isdisjoint, test/dynamo/test_sets.py::FrozensetTests::test_issubset, test/dynamo/test_sets.py::FrozensetTests::test_issuperset, test/dynamo/test_sets.py::FrozensetTests::test_symmetric_difference, test/dynamo/test_sets.py::FrozensetTests::test_to_frozenset, test/dynamo/test_sets.py::FrozensetTests::test_to_set, test/dynamo/test_sets.py::FrozensetTests::test_union, test/dynamo/test_sets.py::SetTests::test_add, test/dynamo/test_sets.py::SetTests::test_binop_and, test/dynamo/test_sets.py::SetTests::test_binop_or, test/dynamo/test_sets.py::SetTests::test_binop_sub, test/dynamo/test_sets.py::SetTests::test_binop_xor, test/dynamo/test_sets.py::SetTests::test_clear, test/dynamo/test_sets.py::SetTests::test_cmp_eq, test/dynamo/test_sets.py::SetTests::test_cmp_greater_than, test/dynamo/test_sets.py::SetTests::test_cmp_greater_than_or_equal, test/dynamo/test_sets.py::SetTests::test_cmp_less_than, test/dynamo/test_sets.py::SetTests::test_cmp_less_than_or_equal, test/dynamo/test_sets.py::SetTests::test_cmp_ne, test/dynamo/test_sets.py::SetTests::test_constructor_iterable, test/dynamo/test_sets.py::SetTests::test_contains, test/dynamo/test_sets.py::SetTests::test_copy, test/dynamo/test_sets.py::SetTests::test_difference, test/dynamo/test_sets.py::SetTests::test_difference_update, test/dynamo/test_sets.py::SetTests::test_discard, test/dynamo/test_sets.py::SetTests::test_equality, test/dynamo/test_sets.py::SetTests::test_in_frozenset, test/dynamo/test_sets.py::SetTests::test_intersection, test/dynamo/test_sets.py::SetTests::test_intersection_update, test/dynamo/test_sets.py::SetTests::test_isdisjoint, test/dynamo/test_sets.py::SetTests::test_issubset, test/dynamo/test_sets.py::SetTests::test_issuperset, test/dynamo/test_sets.py::SetTests::test_pop, test/dynamo/test_sets.py::SetTests::test_remove, test/dynamo/test_sets.py::SetTests::test_symmetric_difference, test/dynamo/test_sets.py::SetTests::test_symmetric_difference_update, 
test/dynamo/test_sets.py::SetTests::test_to_frozenset, test/dynamo/test_sets.py::SetTests::test_to_set, test/dynamo/test_sets.py::SetTests::test_union, test/dynamo/test_sets.py::SetTests::test_update, test/dynamo/test_sets.py::UserDefinedSetTests::test_add, test/dynamo/test_sets.py::UserDefinedSetTests::test_binop_and, test/dynamo/test_sets.py::UserDefinedSetTests::test_binop_or, test/dynamo/test_sets.py::UserDefinedSetTests::test_binop_sub, test/dynamo/test_sets.py::UserDefinedSetTests::test_binop_xor, test/dynamo/test_sets.py::UserDefinedSetTests::test_clear, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_eq, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_greater_than, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_greater_than_or_equal, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_less_than, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_less_than_or_equal, test/dynamo/test_sets.py::UserDefinedSetTests::test_cmp_ne, test/dynamo/test_sets.py::UserDefinedSetTests::test_constructor_iterable, test/dynamo/test_sets.py::UserDefinedSetTests::test_contains, test/dynamo/test_sets.py::UserDefinedSetTests::test_copy, test/dynamo/test_sets.py::UserDefinedSetTests::test_difference, test/dynamo/test_sets.py::UserDefinedSetTests::test_difference_update, test/dynamo/test_sets.py::UserDefinedSetTests::test_discard, test/dynamo/test_sets.py::UserDefinedSetTests::test_equality, test/dynamo/test_sets.py::UserDefinedSetTests::test_in_frozenset, test/dynamo/test_sets.py::UserDefinedSetTests::test_intersection, test/dynamo/test_sets.py::UserDefinedSetTests::test_intersection_update, test/dynamo/test_sets.py::UserDefinedSetTests::test_isdisjoint, test/dynamo/test_sets.py::UserDefinedSetTests::test_issubset, test/dynamo/test_sets.py::UserDefinedSetTests::test_issuperset, test/dynamo/test_sets.py::UserDefinedSetTests::test_pop, test/dynamo/test_sets.py::UserDefinedSetTests::test_remove, 
test/dynamo/test_sets.py::UserDefinedSetTests::test_symmetric_difference, test/dynamo/test_sets.py::UserDefinedSetTests::test_symmetric_difference_update, test/dynamo/test_sets.py::UserDefinedSetTests::test_to_frozenset, test/dynamo/test_sets.py::UserDefinedSetTests::test_to_set, test/dynamo/test_sets.py::UserDefinedSetTests::test_union, test/dynamo/test_sets.py::UserDefinedSetTests::test_update, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_binop_and, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_binop_or, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_binop_sub, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_binop_xor, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_eq, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_greater_than, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_greater_than_or_equal, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_less_than, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_less_than_or_equal, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_cmp_ne, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_constructor_iterable, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_contains, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_copy, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_difference, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_equality, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_in_frozenset, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_intersection, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_isdisjoint, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_issubset, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_issuperset, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_symmetric_difference, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_to_frozenset, 
test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_to_set, test/dynamo/test_sets.py::UserDefinedFrozensetTests::test_union 2025-12-04T12:38:06.7852424Z 2025-12-04T12:38:06.7852721Z Finished dynamo/test_sets 1/1 ... [2025-12-04 12:38:06.777519][12315.160414108], took 0.27min 2025-12-04T12:38:06.8015161Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_sets/dynamo.test_sets-f0cb58e83c4ea8ef.xml 2025-12-04T12:38:06.8806884Z Running dynamo/test_wrap_inductor_compiled_regions 1/1 ... [2025-12-04 12:38:06.880344][12315.263239371] 2025-12-04T12:38:06.8807593Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:38:06.8810082Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_wrap_inductor_compiled_regions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:38:06.880752] 2025-12-04T12:38:40.9615757Z 2025-12-04T12:38:40.9619305Z dynamo/test_wrap_inductor_compiled_regions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_wrap_inductor_compiled_regions_1.1_1c64e72dd7c0888e_.log 2025-12-04T12:38:40.9633042Z Running 18 items in this shard: test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_with_sac_must_save, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_with_sac_prefer_recompute, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_with_wrapper_basic, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_wrapper_visible_in_debug_mode, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_wrapper_with_backward, 
test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_wrapper_with_cache, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_sac_outer_compile_inner_basic, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_sac_outer_compile_inner_flex_attention, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_config_affects_cache_key, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_default_disabled, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_disabled_not_visible_in_debug_mode, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_enabled_visible_in_debug_mode, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_no_dispatch_mode_no_hop_invoked, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_option_type_validation, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_per_compilation, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_with_backward, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_with_cache, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_with_multiple_ops 2025-12-04T12:38:40.9644384Z 2025-12-04T12:38:40.9644832Z Finished dynamo/test_wrap_inductor_compiled_regions 1/1 ... 
[2025-12-04 12:38:40.961336][12349.344232968], took 0.57min 2025-12-04T12:38:40.9853355Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_wrap_inductor_compiled_regions/dynamo.test_wrap_inductor_compiled_regions-2f1d9c362e038030.xml 2025-12-04T12:38:41.0751344Z Running test_sparse 2/2 ... [2025-12-04 12:38:41.074786][12349.457680642] 2025-12-04T12:38:41.0752091Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:38:41.0754571Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sparse.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:38:41.075224] 2025-12-04T12:44:58.2663314Z 2025-12-04T12:44:58.2664256Z test_sparse 2/2 was successful, full logs can be found in artifacts with path test/test-reports/test_sparse_2.2_a491ad82f72502f4_.log 2025-12-04T12:44:58.3334850Z Running 1574 items in this shard: test/test_sparse.py::TestSparseLegacyAndDeprecation::test_legacy_warnings, test/test_sparse.py::TestSparseOneOff::test_cuda_sparse_cpu_dense_add, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseCSR_float64, 
test/test_sparse.py::TestSparseMeta::test_to_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseBSR_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_log1p_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_uint8, 
test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_uint8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_uint8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_bool, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_complex128, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_complex64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_uint8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_bool, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_complex64, 
test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy_multi_gpu_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_sub_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_zeros_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_basic_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_bmm_oob_cuda, test/test_sparse.py::TestSparseCUDA::test_bmm_windows_error_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_cat_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_cat_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_change_tensor_metadata_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_coalesce_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_coalesce_reference_cycle_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_contig_hybrid_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_contig_hybrid_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_ctor_is_coalesced_with_gradcheck_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_ctor_large_sizes_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_ctor_size_checks_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_cuda_empty_cuda, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_complex64, 
test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_empty_like_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_copy_cuda, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_dense_dim_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_dense_dim_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_device_type_inference_cuda, test/test_sparse.py::TestSparseCUDA::test_factory_empty_indices_cuda, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_size_check_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_full_broadcast_to_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_full_broadcast_to_cuda_float64, 
test/test_sparse.py::TestSparseCUDA::test_hsmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_empty_and_non_contiguous_index_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_empty_and_non_contiguous_index_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_large_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_small_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_small_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_index_select_parallelization_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_isnan_cuda, test/test_sparse.py::TestSparseCUDA::test_legacy_new_cuda, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_log_softmax_zero_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_mm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_narrow_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_negative_indices_cuda, test/test_sparse.py::TestSparseCUDA::test_new_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_new_device_single_gpu_cuda, test/test_sparse.py::TestSparseCUDA::test_norm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_permute_masked_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_permute_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_permute_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_same_gpu_cuda, test/test_sparse.py::TestSparseCUDA::test_scalar_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_select_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int64, 
test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_shared_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_small_nnz_coalesced_cuda, test/test_sparse.py::TestSparseCUDA::test_softmax_zero_nnz_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_spadd_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_bool_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_bool_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_hybrid_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_hybrid_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_mul_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mul_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int8, 
test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_sspaddmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_t_empty_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_zeros_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_zeros_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_zeros_like_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_zeros_like_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_nonmasked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSR_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex32, 
test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseBSC_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_slow_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_masked_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_invalid_blocksize_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int64, 
test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_bfloat16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_complex64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_complex64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_bfloat16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_bfloat16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_bfloat16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_complex128, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_bfloat16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_bfloat16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseBSR_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_Strided_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCSC_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCSC_cuda
2025-12-04T12:44:58.3984393Z
2025-12-04T12:44:58.3984708Z Finished test_sparse 2/2 ... [2025-12-04 12:44:58.268347][12726.651240454], took 6.29min
2025-12-04T12:44:58.3985774Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-598e6683c5cfc22a.xml
2025-12-04T12:44:58.4325994Z Running test_decomp 3/17 ... [2025-12-04 12:44:58.432293][12726.815187852]
2025-12-04T12:44:58.4326484Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:44:58.4329511Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=3', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:44:58.432729]
2025-12-04T12:55:44.6226542Z
2025-12-04T12:55:44.6227909Z test_decomp 3/17 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_3.17_3a5dd6feb399010e_.log
2025-12-04T12:55:44.6434986Z Running 547 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__softmax_backward_data_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcdiv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_decomposed_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_complex64,
test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bincount_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exponential_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_imag_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvals_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_householder_product_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_ex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_multi_dot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_multi_dot_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_slogdet_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_triangular_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svd_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logaddexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logdet_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_solve_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_bilinear_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gelu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardshrink_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_instance_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_kl_div_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_margin_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_prelu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rms_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_fro_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_number_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ormqr_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ormqr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pinverse_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_lowrank_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_uint16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_decomposed_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_baddbmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_and_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_cauchy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_complex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_mv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_rsub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_select_scatter_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_squeeze_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_unfold_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_grid_sampler_2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_i0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_lerp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_lerp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_vector_norm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_embedding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardshrink_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_huber_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_leaky_relu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_remainder_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_remainder_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_std_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_complex64 2025-12-04T12:55:44.6637261Z 2025-12-04T12:55:44.6637737Z Finished test_decomp 3/17 ... [2025-12-04 12:55:44.623271][13373.006165549], took 10.77min 2025-12-04T12:55:44.6638783Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-5879e0e26736617e.xml 2025-12-04T12:55:45.7031190Z Uploading artifacts took 0.97 seconds 2025-12-04T12:55:45.7034458Z Running test_decomp 8/17 ... [2025-12-04 12:55:45.703295][13374.086189868] 2025-12-04T12:55:45.7034961Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:55:45.7039595Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=8', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:55:45.703754] 2025-12-04T13:09:24.4113437Z 2025-12-04T13:09:24.4114802Z test_decomp 8/17 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_8.17_26b4abb8a1042a34_.log 2025-12-04T13:09:24.4321920Z Running 541 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_lengths_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_offsets_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_decomposed_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_or_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_complex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_complex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_einsum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frac_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_inner_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lcm_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_det_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_householder_product_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_ex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_solve_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_median_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanquantile_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_layer_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_ctc_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_group_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardswish_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_kl_div_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_grad_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_one_hot_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rrelu_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_unfold_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_fro_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pca_lowrank_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pinverse_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_0_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_neg_3_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_blackman_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_exponential_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_nuttall_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_airy_ai_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triangular_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick__native_batch_norm_legit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__softmax_backward_data_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_bernoulli_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_addcmul_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_diagonal_copy_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_frac_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_index_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_linalg_cross_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_unfold_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_norm_fro_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_special_xlog1py_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_div_floor_rounding_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_exponential_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_frac_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_log_normal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_native_dropout_backward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_native_dropout_backward_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_native_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_nextafter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_gelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardsigmoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardsigmoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardtanh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_grad_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_norm_inf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_normal_number_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_std_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_tril_indices_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_uniform_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_LSTM_eval_mode_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_uniform_cuda 2025-12-04T13:09:24.4527077Z 2025-12-04T13:09:24.4527402Z Finished test_decomp 8/17 ... [2025-12-04 13:09:24.411846][14192.794741033], took 13.65min 2025-12-04T13:09:24.4528509Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-c4519c63d1395608.xml 2025-12-04T13:09:24.5566196Z Running test_decomp 13/17 ... [2025-12-04 13:09:24.556294][14192.939188006] 2025-12-04T13:09:24.5566772Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:09:24.5569943Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=13', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:09:24.556741] 2025-12-04T13:20:11.2757816Z 2025-12-04T13:20:11.2759273Z test_decomp 13/17 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_13.17_a52400f805dcf5ec_.log 2025-12-04T13:20:11.2969619Z Running 552 items in this shard: test/test_decomp.py::TestDecompCUDA::test_arange_graph_cuda, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_offsets_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__softmax_backward_data_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcdiv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bernoulli_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bernoulli_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bincount_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_left_shift_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dist_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_einsum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_imag_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_det_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_rank_hermitian_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_multi_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svdvals_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logaddexp2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_binary_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_binary_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_dropout_backward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_instance_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_nll_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_prelu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_complex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_neg_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_bartlett_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_hann_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensordot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch__scaled_mm_cuda_float8_e4m3fn, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_complex_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_masked_fill_cuda, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick__native_batch_norm_legit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_decomposed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_addr_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_and_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_left_shift_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_masked_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_rad2deg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_igamma_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_lcm_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_lerp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_mv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_native_layer_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nextafter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_prelu_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_randn_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_round_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_std_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_std_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_int8, test/test_decomp.py::DecompOneOffTestsCUDA::test_amp_batch_norm_backward_cuda 2025-12-04T13:20:11.3178805Z 2025-12-04T13:20:11.3179120Z Finished test_decomp 13/17 ... [2025-12-04 13:20:11.276365][14839.659259266], took 10.78min 2025-12-04T13:20:11.3180207Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-fd1a91e45a41098b.xml 2025-12-04T13:20:12.3499653Z Uploading artifacts took 0.96 seconds 2025-12-04T13:20:12.3505483Z Running test_ops_fwd_gradients 1/2 ... [2025-12-04 13:20:12.350297][14840.733191155] 2025-12-04T13:20:12.3506347Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:20:12.3512105Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_fwd_gradients.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:20:12.350860] 2025-12-04T13:30:01.7149662Z 2025-12-04T13:30:01.7150630Z test_ops_fwd_gradients 1/2 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_fwd_gradients_1.2_4abfc4ee1bccdea9_.log 2025-12-04T13:30:01.7991127Z Running 1619 items in this shard: test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_H_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_T_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___getitem___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___radd___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___radd___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rmatmul___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rmod___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rmul___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad___rpow___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__segment_reduce_lengths_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__segment_reduce_offsets_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__unsafe_masked_index_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_abs_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_acos_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_acosh_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_add_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addbmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addcdiv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addmm_decomposed_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addmv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_addr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_alias_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_alias_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_all_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_allclose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_aminmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_angle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_angle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_any_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_argmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_argsort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_argwhere_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_partial_views_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_as_strided_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_asin_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_asinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_asinh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atan2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atleast_1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_atleast_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bfloat16_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_block_diag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bool_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_bool_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_broadcast_to_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_broadcast_to_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_byte_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cartesian_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cauchy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cdouble_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cfloat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_chalf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_inverse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cholesky_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_clamp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_clamp_max_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_clamp_min_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_clone_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_clone_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_column_stack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_column_stack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_combinations_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_conj_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_constant_pad_nd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_copysign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_corrcoef_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cov_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cov_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cross_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cross_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cummax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumprod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumprod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_cumsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diag_embed_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diag_embed_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagonal_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagonal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diagonal_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_diff_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_dist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_div_no_rounding_mode_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_double_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_double_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_dsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_dsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_permuted_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_empty_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_eq_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_equal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_erfc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_exp2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expand_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expand_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_expm1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_exponential_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_eye_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_eye_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_fftshift_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_hfft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_hfftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_hfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ifftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ifftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ifftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_ihfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_irfft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_irfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_irfft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_rfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fft_rfft_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_flip_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fliplr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_flipud_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_fmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_frexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_full_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_full_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_full_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_gather_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_gather_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_grid_sampler_3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_gt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_half_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_histc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_hsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_hstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_hstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_hypot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_igamma_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_igammac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_put_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_put_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_reduce_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_reduce_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_reduce_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_index_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_inner_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_inner_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_int_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_int_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isclose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isclose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isfinite_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isinf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isinf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isnan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isneginf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_isposinf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_istft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_2inputs_2outputs_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_binary_return_by_ref_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_jiterator_unary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_kthvalue_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ldexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lerp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lgamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cholesky_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cholesky_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cholesky_ex_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_cond_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_det_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eig_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eigh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eigh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_eigvals_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_householder_product_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_inv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_inv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_inv_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_factor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_factor_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_ldl_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lstsq_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_factor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_factor_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_lu_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_power_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_rank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_multi_dot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_multi_dot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_pinv_hermitian_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_pinv_hermitian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_pinv_singular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_qr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_qr_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_slogdet_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_solve_triangular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_svd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_svdvals_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_svdvals_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_tensorinv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_vecdot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_vecdot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_linalg_vector_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log10_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log10_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log1p_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_log_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logaddexp2_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logaddexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logcumsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logcumsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logdet_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_and_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_not_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logical_xor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logspace_tensor_overload_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logspace_tensor_overload_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_logsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_long_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_lu_unpack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mH_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mT_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_argmin_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_cumprod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_cumsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_log_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_logaddexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_logsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_logsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_median_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_std_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_std_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_masked_var_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_matmul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_matmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_matrix_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_max_pool2d_with_indices_backward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_max_reduction_no_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_max_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_median_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_meshgrid_variadic_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_min_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_min_reduction_no_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_minimum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_movedim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_msort_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_multinomial_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nanquantile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nansum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nansum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_narrow_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_narrow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_native_dropout_backward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ne_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_neg_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_new_empty_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nextafter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_adaptive_max_pool1d_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_avg_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_avg_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_binary_cross_entropy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_celu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_channel_shuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv_transpose1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv_transpose2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_cross_entropy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_ctc_loss_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_dropout2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_elu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_embedding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_gaussian_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_gelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_hardtanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_bicubic_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_linear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_interpolate_nearest-exact_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_leaky_relu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_linear_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_local_response_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_unpool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_unpool2d_grad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_mse_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_multi_head_attention_forward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_multi_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_multilabel_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_circular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_constant_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_reflect_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pairwise_distance_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pairwise_distance_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pdist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pixel_shuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_prelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_relu6_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_rms_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_rrelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_selu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_softmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_softshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_softsign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_tanhshrink_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nn_functional_upsample_nearest_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nonzero_static_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_nonzero_static_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_fro_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_inf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_norm_nuc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_normal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_normal_in_place_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_normal_in_place_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ones_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ones_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ones_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ormqr_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_outer_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_outer_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_pca_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_pca_lowrank_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_permute_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_permute_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_positive_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_put_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_qr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rad2deg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rand_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rand_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randint_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randint_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_randn_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_ravel_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_real_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reciprocal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_renorm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_renorm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_repeat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_repeat_interleave_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reshape_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reshape_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reshape_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_reshape_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resize__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resolve_neg_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_resolve_neg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_roll_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rot90_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_round_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_round_decimals_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_rsub_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scalar_tensor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_add_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_reduce_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_reduce_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_scatter_reduce_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_select_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sgn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_bartlett_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_blackman_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_exponential_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_general_cosine_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_hann_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signal_windows_nuttall_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_signbit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sinc_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sinh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_slice_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_softmax_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_softmax_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sparse_sampled_addmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_airy_ai_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_bessel_j1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_bessel_y0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_bessel_y1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_chebyshev_polynomial_v_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_entr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_i0e_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_i1e_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_legendre_polynomial_p_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_modified_bessel_k0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_ndtr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_ndtri_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_scaled_modified_bessel_k0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_xlog1py_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_special_zeta_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_with_sizes_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_with_sizes_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_split_with_sizes_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sqrt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_square_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_squeeze_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_squeeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_stack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_mean_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_mean_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_std_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_stft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_stft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_sub_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_svd_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_t_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_take_along_dim_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_take_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tile_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_to_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_to_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_to_sparse_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_trace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_transpose_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_trapz_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_trapz_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tril_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_tril_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_triu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_true_divide_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_true_divide_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unbind_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unbind_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unflatten_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unfold_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsafe_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsafe_split_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsqueeze_copy_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsqueeze_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_unsqueeze_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_mean_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_var_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vdot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_as_real_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_view_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_vstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_where_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zero__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zero__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zeros_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zeros_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_fn_fwgrad_bwgrad_zeros_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_H_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_T_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___radd___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___radd___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rmatmul___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rmatmul___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rmul___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rpow___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rpow___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD___rsub___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__segment_reduce_lengths_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__segment_reduce_offsets_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD__unsafe_masked_index_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_abs_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_acos_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addbmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addcdiv_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addcmul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addcmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmm_decomposed_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addmm_decomposed_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_addr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_alias_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_alias_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_allclose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_angle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_angle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_any_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_arange_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_argmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_argmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_argsort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_argwhere_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_argwhere_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_partial_views_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_as_strided_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_asin_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_asinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_atleast_2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_atleast_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_baddbmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bfloat16_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_block_diag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bool_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_bucketize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_byte_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cdist_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cdouble_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cdouble_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cfloat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cfloat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_chalf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cholesky_inverse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cholesky_inverse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cholesky_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_chunk_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_clamp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_clone_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_column_stack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_combinations_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_conj_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_conj_physical_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_conj_physical_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_copysign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_corrcoef_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cos_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cosh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cov_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cov_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cross_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cummax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cummin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cumulative_trapezoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_cumulative_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diag_embed_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagflat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagflat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagonal_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagonal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diagonal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_diff_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_digamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_double_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_double_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_dstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_empty_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_empty_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_eq_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_equal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_erfc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_exp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expand_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expand_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_expand_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_exponential_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fftn_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_fftshift_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_hfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_hfftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_hfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ifftshift_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_ihfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_irfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_irfft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_irfftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fft_rfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fliplr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_flipud_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_float_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_float_power_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_floor_divide_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_fmin_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_frac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_full_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_full_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_gradient_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_grid_sampler_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_grid_sampler_3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_histc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_hsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_hstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_hstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_hypot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_igammac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_add_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_reduce_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_reduce_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_index_select_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_inner_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_int_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isclose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isclose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isfinite_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isinf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isnan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isnan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_isneginf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_binary_return_by_ref_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_jiterator_unary_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_kron_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_kthvalue_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ldexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lerp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lgamma_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cholesky_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cholesky_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cross_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_cross_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_det_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_det_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eig_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigvals_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigvalsh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_eigvalsh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_householder_product_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_householder_product_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_inv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_inv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_inv_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_factor_ex_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_ldl_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lstsq_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lstsq_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_factor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_factor_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_lu_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_power_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_power_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_rank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_multi_dot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_norm_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_hermitian_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_hermitian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_singular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_pinv_singular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_qr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_slogdet_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_svd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_tensorinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_tensorsolve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vander_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vecdot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vecdot_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vector_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linalg_vector_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linspace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linspace_tensor_overload_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_linspace_tensor_overload_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log10_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log10_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log1p_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log1p_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_log_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logaddexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logcumsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_and_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_not_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_or_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_or_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logical_xor_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logspace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_logsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lu_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_lu_unpack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mH_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mT_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mT_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_argmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_cumprod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_cumsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_median_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_select_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_softmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_masked_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_matrix_exp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_matrix_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_max_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_max_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_maximum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_meshgrid_list_of_tensors_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_min_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_min_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_minimum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mode_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_movedim_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_msort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_multinomial_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nanmean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nanmedian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_narrow_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_narrow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_native_dropout_backward_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_native_layer_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ne_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_neg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_empty_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_ones_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_zeros_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_new_zeros_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_adaptive_max_pool2d_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_alpha_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_avg_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_avg_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_batch_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_celu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv1d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_cosine_similarity_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_dropout2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_dropout3d_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_gaussian_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_hardsigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_hardtanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_instance_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_l1_loss_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_linear_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_linear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_local_response_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_unpool1d_grad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_max_unpool3d_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_mish_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_multi_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_normalize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_circular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_constant_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_reflect_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pad_reflect_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pairwise_distance_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pixel_shuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_prelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_relu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_rms_norm_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_rrelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_selu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_silu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_soft_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_softplus_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_tanhshrink_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_tanhshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nn_functional_upsample_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nonzero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nonzero_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_nonzero_static_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_fro_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_inf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_norm_nuc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_normal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_normal_in_place_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ones_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ones_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ones_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ormqr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_ormqr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_pca_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_permute_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_permute_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_polar_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_polygamma_polygamma_n_1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_positive_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_pow_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_put_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_qr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_quantile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rad2deg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rand_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_randn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_randn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_randn_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_real_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_real_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_reciprocal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_remainder_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_renorm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_repeat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_repeat_interleave_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_repeat_interleave_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_reshape_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_reshape_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resize_as__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resolve_conj_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resolve_neg_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_resolve_neg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rot90_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_round_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_round_decimals_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_rsqrt_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_reduce_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_reduce_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_reduce_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_scatter_reduce_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_select_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sgn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_short_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_short_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sigmoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_bartlett_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_exponential_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_gaussian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_general_cosine_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_hann_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_kaiser_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_signal_windows_nuttall_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sin_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sinc_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sinc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sinh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_slice_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_slice_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_softmax_with_dtype_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_airy_ai_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_bessel_j1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_bessel_y0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_bessel_y1_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_chebyshev_polynomial_u_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_chebyshev_polynomial_w_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_entr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_hermite_polynomial_h_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_i0e_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_laguerre_polynomial_l_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_log_ndtr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_modified_bessel_i1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_modified_bessel_k0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_ndtri_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_special_spherical_bessel_j0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_list_args_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_split_with_sizes_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sqrt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_square_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_squeeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_squeeze_multiple_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_mean_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_std_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_stft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sub_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_t_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_t_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_take_along_dim_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_take_along_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_take_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_take_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tan_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tanh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tensor_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tile_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_to_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_to_sparse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_to_sparse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_topk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_transpose_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_transpose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trapz_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_triangular_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_tril_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_triu_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_triu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_trunc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unbind_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unflatten_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unfold_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_uniform_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unique_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unsafe_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_unsqueeze_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_mean_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_var_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_as_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_as_real_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_view_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_vsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_vsplit_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_vstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_vstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_where_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_where_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_xlogy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zero__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zeros_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zeros_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_forward_mode_AD_zeros_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_H_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_H_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_T_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_T_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___getitem___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___radd___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rdiv___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rmatmul___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rmatmul___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rpow___cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rsub___cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD___rsub___cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__segment_reduce_lengths_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__unsafe_masked_index_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD__unsafe_masked_index_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_abs_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_abs_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_acos_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_acosh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_acosh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addbmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addbmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addcmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addmm_decomposed_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addmm_decomposed_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_addr_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_allclose_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_aminmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_angle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_any_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_argmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_argmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_argsort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_as_strided_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_asinh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atanh_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atleast_2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atleast_2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_atleast_3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_baddbmm_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bernoulli_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_block_diag_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bmm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bmm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bool_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_broadcast_to_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_bucketize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_byte_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_byte_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cauchy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cdouble_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cfloat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_chalf_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_char_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_char_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cholesky_inverse_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cholesky_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_chunk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_clone_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_combinations_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_complex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_conj_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_conj_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_conj_physical_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_conj_physical_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_contiguous_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_copysign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cosh_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_count_nonzero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cov_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cross_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cumprod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cumsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cumulative_trapezoid_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_cumulative_trapezoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_deg2rad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diag_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diag_embed_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagflat_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagonal_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagonal_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagonal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_diagonal_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_digamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_div_no_rounding_mode_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_div_no_rounding_mode_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_div_trunc_rounding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_double_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_double_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dsplit_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_dsplit_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_einsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_einsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_permuted_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_empty_permuted_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_eq_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_erfinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_exp2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expand_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expand_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expm1_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_expm1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_eye_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_eye_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_fftshift_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_hfftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifft2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifftn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ifftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_ihfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_irfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_irfft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_rfft2_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fft_rfftn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_flatten_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_flip_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fliplr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_flipud_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_float_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_fmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_frac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_full_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_full_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ge_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_geqrf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_gradient_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_gt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_half_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_half_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hash_tensor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_heaviside_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_histc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hstack_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_hypot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_i0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_igammac_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_fill_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_fill_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_put_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_put_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_reduce_amin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_index_select_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isclose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isinf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isnan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_isposinf_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_item_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ldexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_le_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lerp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lerp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lgamma_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cholesky_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cholesky_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cholesky_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cond_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cond_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_cross_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_det_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_diagonal_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_diagonal_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_eigvals_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_householder_product_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_householder_product_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_inv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_inv_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_ldl_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_ldl_factor_ex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_ldl_factor_ex_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_ldl_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lstsq_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_factor_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_factor_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_factor_ex_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_lu_solve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_power_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_power_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_rank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_multi_dot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_pinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_pinv_singular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_slogdet_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_slogdet_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_triangular_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_solve_triangular_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_svd_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_svd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_tensorinv_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_tensorinv_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_tensorsolve_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_vander_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_vecdot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linalg_vector_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_linspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log10_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log2_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_normal_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_softmax_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_log_softmax_with_dtype_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logaddexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logcumsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logcumsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logical_and_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logspace_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logspace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logspace_tensor_overload_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logspace_tensor_overload_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logsumexp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_logsumexp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_lu_unpack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mH_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mT_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_amax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_argmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_cumsum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_cumsum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_logsumexp_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_median_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_normalize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_prod_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_softmin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_std_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_sum_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_masked_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_matmul_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_matrix_exp_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_matrix_exp_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_maximum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_min_binary_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_min_reduction_no_dim_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_min_reduction_with_dim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_minimum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_movedim_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_msort_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mul_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nan_to_num_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nanmean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nanmean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nanmedian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nanquantile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_native_batch_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_native_layer_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ne_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_empty_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_empty_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_full_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_ones_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_ones_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_new_zeros_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_alpha_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_avg_pool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_avg_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_batch_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_channel_shuffle_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv_transpose1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv_transpose2d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv_transpose2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_ctc_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_dropout2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_dropout_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_elu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_embedding_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_gelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_glu_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_grid_sample_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_group_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_huber_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_instance_norm_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_linear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_interpolate_trilinear_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_kl_div_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_leaky_relu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_linear_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_logsigmoid_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_pool2d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_unpool1d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_unpool2d_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_unpool2d_grad_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_max_unpool3d_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_multilabel_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_nll_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_normalize_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_constant_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_replicate_negative_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pairwise_distance_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pdist_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_prelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_relu6_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_relu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_rms_norm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_rrelu_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_silu_complex_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_softshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_softsign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_tanhshrink_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_threshold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nn_functional_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_nonzero_static_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_norm_fro_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_norm_inf_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_normal_number_mean_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ones_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ones_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_outer_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pca_lowrank_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_permute_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_permute_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_permute_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pinverse_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_polar_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_polygamma_polygamma_n_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_polygamma_polygamma_n_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_positive_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_positive_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_pow_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_put_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_qr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_quantile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rand_like_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_randint_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_randint_like_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_randn_like_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ravel_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_ravel_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_remainder_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_renorm_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_repeat_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_repeat_interleave_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_repeat_interleave_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reshape_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reshape_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reshape_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_reshape_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resize__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resize__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resize_as__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resolve_conj_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resolve_conj_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resolve_neg_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_resolve_neg_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_roll_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_roll_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rot90_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rot90_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_round_decimals_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_round_decimals_3_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rsqrt_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_rsub_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_add_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_reduce_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_reduce_prod_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_scatter_reduce_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sgn_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_short_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_short_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sigmoid_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sign_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_bartlett_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_blackman_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_gaussian_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_general_cosine_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_hamming_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_signal_windows_nuttall_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sin_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sin_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sinc_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_slice_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_slice_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_slice_scatter_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_softmax_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_softmax_with_dtype_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_bessel_j0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_bessel_y1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_chebyshev_polynomial_u_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_chebyshev_polynomial_v_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_entr_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_erfcx_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_hermite_polynomial_h_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_i1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_legendre_polynomial_p_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_modified_bessel_i1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_modified_bessel_k0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_ndtri_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_list_args_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_list_args_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_with_sizes_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_split_with_sizes_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sqrt_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_squeeze_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_squeeze_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_squeeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_squeeze_multiple_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_stack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_std_mean_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_std_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_std_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_stft_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_stft_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sub_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_sum_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_svd_cuda_complex128, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_svd_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_svd_lowrank_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_svd_lowrank_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_t_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_t_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_take_along_dim_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tan_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tan_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tensordot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tensordot_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tile_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_to_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_to_sparse_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_topk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_trace_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_transpose_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_transpose_copy_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_transpose_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_transpose_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_trapz_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_triangular_solve_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_tril_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_true_divide_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_trunc_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unbind_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unflatten_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unfold_copy_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unfold_copy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unfold_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unfold_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unique_consecutive_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsafe_chunk_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsafe_chunk_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsafe_split_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsafe_split_cuda_float64, 
test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_unsqueeze_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_mean_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_mean_unbiased_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_var_mean_unbiased_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vdot_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_view_as_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_view_as_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vsplit_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vstack_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_vstack_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_where_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_xlogy_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zero__cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zero__cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zeros_cuda_complex128, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zeros_cuda_float64, test/test_ops_fwd_gradients.py::TestFwdGradientsCUDA::test_inplace_forward_mode_AD_zeros_like_cuda_complex128 2025-12-04T13:30:01.8810860Z 2025-12-04T13:30:01.8811241Z Finished 
test_ops_fwd_gradients 1/2 ... [2025-12-04 13:30:01.717653][15430.10054539], took 9.82min 2025-12-04T13:30:01.8812510Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops_fwd_gradients/test_ops_fwd_gradients-dac273fbaf67ad10.xml 2025-12-04T13:30:01.8820076Z Running test_meta 2/5 ... [2025-12-04 13:30:01.881677][15430.264571133] 2025-12-04T13:30:01.8820676Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:30:01.8824258Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_meta.py', '--shard-id=2', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:30:01.882150] 2025-12-04T13:56:15.6860335Z 2025-12-04T13:56:15.6861327Z test_meta 2/5 was successful, full logs can be found in artifacts with path test/test-reports/test_meta_2.5_dad2a564d06ce93f_.log 2025-12-04T13:56:16.0272795Z Running 8088 items in this shard: test/test_meta.py::TestMetaConverter::test_imag, test/test_meta.py::TestMetaConverter::test_inplace_set_storage, test/test_meta.py::TestMetaConverter::test_non_leaf, test/test_meta.py::TestMetaConverter::test_view_mutate, test/test_meta.py::TestMetaConverter::test_weakref, test/test_meta.py::TestMetaCUDA::test_batch_norm_backward_output_mask0_cuda, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_nextafter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_special_xlog1py_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_add_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_copysign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rand___cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmatmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___ror___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rxor___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rxor___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__native_batch_norm_legit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__softmax_backward_data_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__upsample_bilinear2d_aa_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_allclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bernoulli_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_right_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_right_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bmm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bucketize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cauchy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_copysign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_einsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exponential_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float8_e4m3fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_frexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_grid_sampler_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_grid_sampler_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_heaviside_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_heaviside_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_igammac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_imag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_inner_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kthvalue_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lcm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lerp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cond_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_det_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_det_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_inv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_multi_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_multi_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_singular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_qr_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_triangular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matrix_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matrix_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmedian_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmedian_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanquantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_dropout_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_alpha_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool2d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_similarity_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_embedding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_gelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_gelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_group_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardswish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardswish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_area_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rrelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_selu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_silu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_threshold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_nuc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_in_place_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_in_place_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ormqr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rand_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_neg_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_neg_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_bartlett_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_general_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_hann_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_hann_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_mm_reduce_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_sampled_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_h_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_he_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_log_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_spherical_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_lowrank_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triangular_solve_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_uint64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmatmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___ror___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rxor___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rxor___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__native_batch_norm_legit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__native_batch_norm_legit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__segment_reduce_lengths_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_decomposed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_baddbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bernoulli_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_left_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_min_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_combinations_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gcd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geqrf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_grid_sampler_3d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_histc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lcm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lcm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_le_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cond_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_grad_oriented_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_norm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_multi_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_slogdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_triangular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_triangular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vecdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logcumsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logaddexp_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanquantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_dropout_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_similarity_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_elu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_bag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_bag_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_grid_sample_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardswish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_instance_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_area_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_linear_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_trilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_leaky_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_leaky_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_local_response_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool3d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mse_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_one_hot_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_poisson_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_prelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_rrelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_smooth_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softplus_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_fro_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_inf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_in_place_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_number_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ormqr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pca_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rand_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_renorm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_renorm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_searchsorted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_bartlett_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_h_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_h_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i0e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_zeta_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_unbiased_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_uint64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rand___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmatmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rpow___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rxor___cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_offsets_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__softmax_backward_data_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__upsample_bilinear2d_aa_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_floor_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_addmm_decomposed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_any_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_atleast_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bitwise_right_shift_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_chalf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ifft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_irfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_flip_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isnan_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_jiterator_4inputs_with_extra_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_slogdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_vector_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logdet_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_matrix_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_new_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_gelu_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_grid_sample_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_unpool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_unpool2d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_mish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_pad_replicate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_poisson_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_softmin_with_dtype_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_triplet_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_normal_in_place_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_outer_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_randint_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_repeat_interleave_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_resolve_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_select_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sparse_sampled_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_to_sparse_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_var_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_allclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_right_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bucketize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bucketize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdist_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_einsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flip_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gcd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gradient_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hash_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hash_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hypot_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hypot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_imag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_inner_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_inner_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_inner_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lcm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cond_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvalsh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_singular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_singular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_slogdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorsolve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorsolve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vector_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vector_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vector_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logaddexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logaddexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_dropout_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nextafter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_alpha_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool3d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_similarity_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_similarity_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_bag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_gelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_glu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_instance_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_local_response_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multi_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_one_hot_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softplus_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softplus_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_fro_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_fro_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pinverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_quantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_neg_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_blackman_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_hann_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_nuttall_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_sampled_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_sampled_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_airy_ai_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_erfcx_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_legendre_polynomial_p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_log_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensordot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch__scaled_mm_v2_cuda_float8_e4m3fn, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triangular_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_indices_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_uniform_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_uniform_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unravel_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___ror___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rxor___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_decomposed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_decomposed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_all_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_broadcast_tensors_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_clone_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_double_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_histc_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_hypot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_multi_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_pinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_pinv_singular_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_solve_triangular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_long_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mul_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_cosine_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_elu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_embedding_bag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_fractional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_group_norm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_mish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_pad_circular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_rms_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_unfold_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_pca_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_polygamma_polygamma_n_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_renorm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_general_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sinh_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sparse_sampled_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_i0e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_svd_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_t_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unsafe_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_view_as_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_vstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_angle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_angle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bernoulli_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_left_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_right_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_xor_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bucketize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bucketize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cauchy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cauchy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dist_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dist_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_einsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gcd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_histc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hypot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hypot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_istft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lcm_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eig_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvals_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvalsh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_householder_product_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_factor_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lstsq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lstsq_grad_oriented_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_multi_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_subgradients_at_zero_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_triangular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_svdvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logcumsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_median_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matrix_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_pool2d_with_indices_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_multinomial_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mv_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanquantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_dropout_backward_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_alpha_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_celu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose3d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_similarity_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_ctc_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_elu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_embedding_bag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_gelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_group_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardsigmoid_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardswish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_huber_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_bicubic_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_bicubic_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_trilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_leaky_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_local_response_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_prelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_prelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rms_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_silu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_soft_margin_loss_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_upsample_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_inf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_in_place_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_in_place_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_in_place_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rad2deg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rand_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_renorm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_renorm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_searchsorted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_gaussian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_mm_reduce_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_sampled_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_airy_ai_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_he_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_legendre_polynomial_p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_log_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_log_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_unbiased_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensordot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unravel_index_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vdot_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_group_norm_backward_output_mask0_cuda, test/test_meta.py::TestMetaCUDA::test_group_norm_backward_output_mask5_cuda, test/test_meta.py::TestMetaCUDA::test_index_select_out_cuda, test/test_meta.py::TestMetaCUDA::test_layer_norm_backward_output_mask6_cuda, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rand___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rand___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmul___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___ror___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace___rxor___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__batch_norm_with_update_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__upsample_bilinear2d_aa_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_decomposed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_decomposed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_allclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_allclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_baddbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bincount_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bincount_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_right_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_inverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_inverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gcd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gcd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geometric_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geometric_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geometric_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_imag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_imag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eig_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigvalsh_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_householder_product_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_inv_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lstsq_grad_oriented_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_slogdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svd_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svdvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_tensorsolve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logaddexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmin_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matrix_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matrix_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_multinomial_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_multinomial_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_dropout_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_dropout_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_layer_norm_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nextafter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_batch_norm_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_celu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cross_entropy_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_elu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_elu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_embedding_bag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_embedding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_gelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_glu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardtanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardtanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_bicubic_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_l1_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_logsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_logsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool1d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multi_margin_loss_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_poisson_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_poisson_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_selu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_selu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_smooth_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softplus_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_upsample_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_upsample_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_fro_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pca_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pinverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_quantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_like_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_gaussian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_general_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_nuttall_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y0_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_he_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_laguerre_polynomial_l_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_laguerre_polynomial_l_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_laguerre_polynomial_l_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stft_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_stft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_to_size_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_to_size_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensordot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triangular_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unravel_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__chunk_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_lengths_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_offsets_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_offsets_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_decomposed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_baddbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_not_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_shapes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bucketize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cauchy_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_inverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_inverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dist_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dist_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dist_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geometric_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geometric_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geometric_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_igammac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_inner_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kron_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kron_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_grad_oriented_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_multi_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_singular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_tensorinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_tensorsolve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_tensorsolve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmin_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_pool2d_with_indices_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mv_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanquantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool3d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_similarity_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout2d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_elu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_embedding_bag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_grid_sample_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_group_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardswish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_huber_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_instance_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bicubic_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bicubic_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_leaky_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_logsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_grad_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_mse_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_mse_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rms_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rms_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_upsample_bilinear_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_upsample_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_nuc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_in_place_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_quantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rad2deg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rand_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_like_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_searchsorted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_searchsorted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_bartlett_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_hann_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_kaiser_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_mm_reduce_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_mm_reduce_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_sampled_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_erfcx_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_erfcx_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_h_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_legendre_polynomial_p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_spherical_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensordot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unravel_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unravel_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_mixed_dtype_for_native_layer_norm_backward_float16_float32_cuda 2025-12-04T13:56:16.3629243Z 2025-12-04T13:56:16.3629556Z Finished test_meta 2/5 ... [2025-12-04 13:56:15.698116][17004.081006297], took 26.23min 2025-12-04T13:56:16.3630622Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_meta/test_meta-cbc50d7c3e0b1b6a.xml 2025-12-04T13:56:17.1390137Z Uploading artifacts took 1.11 seconds 2025-12-04T13:56:17.1394393Z Running test_ops_jit 2/2 ... [2025-12-04 13:56:17.139266][17005.522161083] 2025-12-04T13:56:17.1394911Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:56:17.1399269Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_jit.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:56:17.139713] 2025-12-04T14:08:06.1254035Z 2025-12-04T14:08:06.1255012Z test_ops_jit 2/2 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_jit_2.2_814c1a8715769c60_.log 2025-12-04T14:08:06.1510377Z Running 594 items in this shard: test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_acos_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_asinh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_atan_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_atanh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_div_floor_rounding_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_erf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_erfc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_exp2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_expm1_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_ge_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_gt_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_igammac_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_linalg_det_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_linalg_inv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_log_softmax_with_dtype_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_logit_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_logsumexp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_lt_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_mH_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_matrix_exp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_max_binary_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_neg_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_nn_functional_conv2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_nn_functional_conv3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_outer_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_round_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_round_decimals_0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_round_decimals_neg_3_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_transpose_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_jit_alias_remapping_trunc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_H_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_T_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_T_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___getitem___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___radd___cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___radd___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rdiv___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rmod___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rmul___cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rmul___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rsub___cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit___rsub___cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__batch_norm_with_update_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__chunk_cat_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__native_batch_norm_legit_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__segment_reduce_lengths_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__segment_reduce_offsets_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__softmax_backward_data_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__unsafe_masked_index_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__unsafe_masked_index_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit__upsample_bilinear2d_aa_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_abs_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_abs_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_acos_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_acos_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_add_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addbmm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addcmul_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addmm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addmm_decomposed_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addmv_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addr_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_addr_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_alias_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_alias_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_all_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_all_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_allclose_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_allclose_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_amin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_angle_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_any_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_any_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_arange_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_argmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_partial_views_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_partial_views_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_scatter_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_as_strided_scatter_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_asinh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atan_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atanh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atanh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atleast_1d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atleast_1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_atleast_2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bfloat16_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bfloat16_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_block_diag_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bmm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bmm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_bool_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_broadcast_tensors_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_broadcast_to_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_byte_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cartesian_prod_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cdist_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cdouble_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ceil_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_chalf_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_chalf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_char_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cholesky_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cholesky_inverse_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cholesky_solve_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_chunk_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_chunk_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_clamp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_clamp_min_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_column_stack_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_combinations_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_complex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_conj_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_conj_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_conj_physical_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_contiguous_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cummin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cumprod_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cumsum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_cumulative_trapezoid_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_deg2rad_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diag_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diag_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagflat_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_scatter_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diagonal_scatter_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diff_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_diff_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_dist_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_dist_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_div_floor_rounding_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_div_trunc_rounding_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_double_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_dstack_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_einsum_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_einsum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_empty_like_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_empty_permuted_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_equal_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_erfc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_erfinv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_exp2_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_exp2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_exp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expand_as_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expand_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expand_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_expm1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_eye_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fft2_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fft_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fft_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_fftn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_hfft2_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ifft_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ifftn_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ifftn_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ihfft2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_ihfft_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_irfft2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_irfft_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_irfftn_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_rfft_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fft_rfftn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fill_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fill_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_flatten_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_flip_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fliplr_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_flipud_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_float_power_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_floor_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_floor_divide_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_fmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_frac_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_frexp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_full_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_full_like_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_full_like_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_gather_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_gather_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ge_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_geometric_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_geqrf_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_geqrf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_grid_sampler_3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_half_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_hash_tensor_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_heaviside_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_hstack_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_hstack_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_hypot_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_igamma_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_add_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_add_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_put_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_put_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_reduce_amax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_reduce_amin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_reduce_prod_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_select_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_index_select_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_inner_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isclose_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isfinite_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isinf_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isneginf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isposinf_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isreal_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_isreal_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_istft_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_item_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_item_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_binary_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_jiterator_unary_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_kron_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_kron_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ldexp_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_le_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lerp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cholesky_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cholesky_ex_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cholesky_ex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cond_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cross_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_cross_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_diagonal_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_diagonal_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eig_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eigvals_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eigvalsh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_eigvalsh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_householder_product_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_inv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_ldl_factor_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_ldl_factor_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_ldl_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lstsq_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lstsq_grad_oriented_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_factor_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_factor_ex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_solve_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_norm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_power_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_multi_dot_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_multi_dot_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_pinv_hermitian_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_pinv_singular_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_pinv_singular_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_qr_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_solve_ex_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_solve_triangular_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_solve_triangular_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_svd_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_svdvals_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_tensorinv_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_tensorinv_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_vander_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_vander_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_vecdot_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linspace_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linspace_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linspace_tensor_overload_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linspace_tensor_overload_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log1p_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_log_softmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logaddexp2_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logaddexp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logcumsumexp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logdet_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_and_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_not_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_or_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_xor_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logical_xor_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logit_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logspace_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logspace_tensor_overload_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_logsumexp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_long_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_long_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lu_solve_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lu_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_lu_unpack_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mH_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mH_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mT_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mT_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_amax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_argmin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_cumsum_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_cumsum_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_fill_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_fill_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_log_softmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_logaddexp_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_logsumexp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_mean_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_mean_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_prod_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_prod_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_select_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_select_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_softmax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_softmin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_std_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_std_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_sum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_masked_var_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_matmul_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_matmul_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_matrix_exp_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_matrix_exp_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_max_binary_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_maximum_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_median_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_meshgrid_list_of_tensors_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_meshgrid_variadic_tensors_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_min_reduction_no_dim_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_min_reduction_with_dim_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mode_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_multinomial_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nanmean_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nanmedian_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nansum_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_narrow_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_native_batch_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_native_dropout_backward_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_empty_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_ones_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_ones_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_zeros_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_new_zeros_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nextafter_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_avg_pool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_batch_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_bilinear_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_celu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_channel_shuffle_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv1d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv_transpose1d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv_transpose1d_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv_transpose2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_ctc_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_dropout2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_dropout3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_embedding_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_gelu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_glu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_grid_sample_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_hardshrink_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_area_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_bicubic_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_linear_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_nearest_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_kl_div_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_layer_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_leaky_relu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_linear_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_logsigmoid_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_pool1d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_pool2d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_max_unpool3d_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_mish_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_mse_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_multi_head_attention_forward_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_multi_margin_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_nll_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_circular_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_constant_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pad_replicate_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pairwise_distance_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pairwise_distance_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_relu6_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_relu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_rms_norm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_rms_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_silu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_soft_margin_loss_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_softsign_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_tanhshrink_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_unfold_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nn_functional_upsample_bilinear_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nonzero_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nonzero_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_nonzero_static_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_fro_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_nuc_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_norm_nuc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_normal_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_normal_in_place_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ones_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ones_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ones_like_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ormqr_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_outer_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_outer_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_pca_lowrank_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_pca_lowrank_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_permute_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_permute_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_pinverse_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_polygamma_polygamma_n_4_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_positive_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_positive_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_pow_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_prod_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_qr_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_quantile_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_randint_like_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ravel_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_ravel_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_real_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_reciprocal_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_remainder_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_renorm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_renorm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_repeat_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_repeat_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_repeat_interleave_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_repeat_interleave_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_reshape_as_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_reshape_as_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_reshape_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resize_as__cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resolve_conj_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resolve_neg_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_resolve_neg_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_roll_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_rot90_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_rot90_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_round_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_round_decimals_neg_3_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_rsqrt_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scalar_tensor_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_add_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_reduce_amax_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_reduce_amin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_scatter_reduce_mean_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_select_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_select_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sgn_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sgn_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_short_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sigmoid_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_exponential_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_general_cosine_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_general_hamming_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_signal_windows_hamming_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sin_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sin_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sinc_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sinh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_slice_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_slice_scatter_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_softmax_with_dtype_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sparse_sampled_addmm_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sparse_sampled_addmm_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_bessel_j1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_bessel_y1_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_chebyshev_polynomial_t_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_chebyshev_polynomial_v_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_chebyshev_polynomial_w_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_hermite_polynomial_he_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_i1_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_laguerre_polynomial_l_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_log_ndtr_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_modified_bessel_i0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_ndtri_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_spherical_bessel_j0_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_special_zeta_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_list_args_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_list_args_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_with_sizes_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_with_sizes_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_split_with_sizes_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sqrt_cuda_complex64, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_square_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_stack_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_stack_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_mean_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_unbiased_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_std_unbiased_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sub_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sub_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sum_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_sum_to_size_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_svd_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_svd_lowrank_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_svd_lowrank_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_t_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_take_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_take_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tanh_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tanh_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tensor_split_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_topk_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_torch_ops_aten__efficient_attention_forward_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trace_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trace_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_transpose_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_transpose_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trapezoid_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trapz_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_triangular_solve_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_tril_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_triu_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_true_divide_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_trunc_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unbind_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unbind_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unbind_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unflatten_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_uniform_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_uniform_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unique_consecutive_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unique_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsafe_chunk_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsafe_chunk_cuda_float32, 
test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsqueeze_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_unsqueeze_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_var_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_var_unbiased_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_as_complex_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_as_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_as_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_as_real_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_copy_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_copy_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_view_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_vsplit_cuda_float32, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_where_cuda_complex64, test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_zeros_cuda_complex64 2025-12-04T14:08:06.1762185Z 2025-12-04T14:08:06.1762485Z Finished test_ops_jit 2/2 ... [2025-12-04 14:08:06.126153][17714.509047369], took 11.82min 2025-12-04T14:08:06.1763566Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops_jit/test_ops_jit-2f4faab6a29e642c.xml 2025-12-04T14:08:06.2472064Z Running test_nestedtensor 3/4 ... 
[2025-12-04 14:08:06.246889][17714.629784328] 2025-12-04T14:08:06.2472605Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:08:06.2475929Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_nestedtensor.py', '--shard-id=3', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:08:06.247332] 2025-12-04T14:16:12.1978081Z 2025-12-04T14:16:12.1979045Z test_nestedtensor 3/4 was successful, full logs can be found in artifacts with path test/test-reports/test_nestedtensor_3.4_8e55fc0245a5aec0_.log 2025-12-04T14:16:12.2207909Z Running 397 items in this shard: test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_is_contiguous, test/test_nestedtensor.py::TestNestedTensor::test_like_functions_zeros_like, test/test_nestedtensor.py::TestNestedTensor::test_unbind_3, test/test_nestedtensor.py::TestNestedInt::test_comparisons, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_clone_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_contiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_jagged_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_jagged_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_strided_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_embedding_jagged_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_empty_like_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_int16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_int32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_int32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_breaking_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_linear_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_narrow_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_masked_select_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_chunk_cuda_float16, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_256_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_8_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_reshape_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_scaled_dot_product_attention_input_dim_4_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim3_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim4_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_zero_numel_errors_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_cos_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isposinf_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_relu_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_sgn_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_sqrt_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_tanh_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unbind_noncontiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_inference_mode_interaction_cuda_float32, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_sub_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_dropout_backward_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_indexing_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_2_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_32_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_edge_case_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_2_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_generates_leaf_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_linear_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_reshape_gradcheck_cuda, 
test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_set_requires_grad_from_list_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_set_requires_grad_from_mask_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_to_buffer_series_ops_grad_with_broadcast_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_False_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_autograd_function_with_None_grad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_with_nested_int_second_arg_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_with_propagated_dynamic_max_seq_len_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_composite_op_in_inference_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_construction_from_list_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_copy__cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_False_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_with_lengths_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_nt_dim_3_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_nt_dim_4_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_narrow_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_from_jagged_pass_min_max_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_profiler_sequence_nr_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_record_stream_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_reshape_decomp_requires_grad_True_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_autocast_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_compile_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_cuda_float16, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_constant_sequence_length_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_packed_in_proj_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_transposed_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_False_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_False_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_specialize_dynamic_shape_recompile_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_squeeze_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_1_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_view_ragged_idx_not_one_cuda, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rmod___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_abs_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_bmm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_div_no_rounding_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_double_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_erfc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_exp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_logaddexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_min_reduction_with_dim_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_embedding_bag_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_threshold_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_remainder_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_square_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rmul___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_float_power_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_frexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_half_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_index_put_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_lgamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_maximum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_minimum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_embedding_bag_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softplus_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_threshold_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polar_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_positive_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_erfcx_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_unflatten_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___radd___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clamp_max_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_cos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_digamma_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_div_floor_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_float_power_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_floor_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_half_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_igammac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isneginf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isreal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_celu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_embedding_bag_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_reciprocal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_decimals_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_rsqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sgn_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_short_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_signbit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_j1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_y1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_hermite_polynomial_he_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_legendre_polynomial_p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_i0_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_scaled_modified_bessel_k1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_trunc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_unflatten_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_acos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_bfloat16_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_conj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_copysign_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_count_nonzero_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_div_no_rounding_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_exp2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_hash_tensor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isclose_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isreal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_binary_return_by_ref_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_linalg_vector_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_and_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_not_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_xor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_long_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_lt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_maximum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_min_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_min_reduction_with_dim_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nansum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_neg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nextafter_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_embedding_bag_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_logsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_prelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_softsign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_j0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_erfcx_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_i0e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_i1e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_laguerre_polynomial_l_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_legendre_polynomial_p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_squeeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_nested_tensor_input_mutation_backward_cuda 2025-12-04T14:16:12.2433134Z 2025-12-04T14:16:12.2433474Z Finished test_nestedtensor 3/4 ... [2025-12-04 14:16:12.198371][18200.581265167], took 8.10min 2025-12-04T14:16:12.2434955Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-8372b6917771ca4c.xml 2025-12-04T14:16:12.3104307Z Running test_ops 2/11 ... 
[2025-12-04 14:16:12.310099][18200.692992794] 2025-12-04T14:16:12.3104829Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:16:12.3108038Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '--shard-id=2', '--num-shards=11', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:16:12.310549] 2025-12-04T14:37:58.1952279Z 2025-12-04T14:37:58.1953173Z test_ops 2/11 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_2.11_06c992f175cc3a27_.log 2025-12-04T14:37:58.3227785Z Running 3122 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unsqueeze_copy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_geometric_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nextafter_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_abs_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_char_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_hstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_masked_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unsafe_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_view_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes_T_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___getitem___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rmul___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___ror___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__batch_norm_with_update_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_char_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_alias_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_arange_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_dot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_equal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_erf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log1p_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log_normal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_tanhshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i0e_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_split_with_sizes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_square_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addcmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_asin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bincount_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cdist_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_char_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_count_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_erfc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_erfinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flip_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_half_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_hash_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_reduce_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_inner_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isneginf_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_binary_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_factor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_var_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_median_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_native_dropout_backward_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_neg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_instance_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool2d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_mish_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_rrelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_norm_inf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ormqr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_permute_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resolve_neg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_round_decimals_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_laguerre_polynomial_l_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_i1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_scaled_modified_bessel_k1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_square_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_torch__scaled_mm_v2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_uniform_cuda, test/test_ops.py::TestCommonCUDA::test_errors___rand___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rdiv___cuda, test/test_ops.py::TestCommonCUDA::test_errors_aminmax_cuda, test/test_ops.py::TestCommonCUDA::test_errors_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_complex_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gradient_cuda, test/test_ops.py::TestCommonCUDA::test_errors_kthvalue_cuda, test/test_ops.py::TestCommonCUDA::test_errors_median_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_multi_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_errors_triu_cuda, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_as_strided_copy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_outer_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices__chunk_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argwhere_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_block_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cdist_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cummin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gather_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_mean_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_unary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vander_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vecdot_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_std_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_list_of_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_normalize_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_circular_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_permute_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resize_as__cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_polygamma_special_polygamma_n_0_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tile_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unsafe_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___radd___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_all_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_angle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diag_embed_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fmax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isclose_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_2inputs_2outputs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_binary_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_msort_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_permute_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_3_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_h_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_legendre_polynomial_p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_squeeze_multiple_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sum_to_size_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_t_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tril_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unique_consecutive_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_as_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zeros_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_combinations_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfftn_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_histc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_multi_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorsolve_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logdet_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_with_dim_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nansum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ormqr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_4_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_mean_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_general_hamming_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sparse_sampled_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i0e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_gelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_blackman_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_cosine_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_hamming_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_squeeze_multiple_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_out_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_complex_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out__refs_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_round_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifftshift_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_msort_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_silu_complex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cat_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_reduce_amin_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_transpose_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_searchsorted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tanh_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning___rxor___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_to_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_dot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_expm1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linspace_tensor_overload_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_mish_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_pow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_randn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_logit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_t_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_to_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_block_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bool_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_byte_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cauchy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_char_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_dist_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_expand_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gradient_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_histc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isreal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_istft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_factor_ex_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_log_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logaddexp2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_max_binary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_min_binary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nansum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_embedding_bag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_embedding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_scaled_dot_product_attention_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_fro_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_normal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_prod_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_resolve_conj_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_hann_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sinh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_slice_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sparse_sampled_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_i0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_to_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_torch__scaled_mm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_torch_ops_aten__safe_softmax_default_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tril_indices_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_triu_indices_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_cuda, test/test_ops.py::TestCommonCUDA::test_out_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_div_no_rounding_mode_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_i0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_ldexp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_lgamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log1p_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_var_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_v_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_legendre_polynomial_p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_w_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exponential_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float8_e5m2fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_geometric_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_igamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_normal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_normal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_T_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_add_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifftn_cuda, 
test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_ge_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_normal__in_place_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_shapes_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cauchy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dot_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exponential_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vecdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_normal_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_tensor_overload_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_smooth_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_number_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_select_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vdot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_H_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___getitem___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___radd___cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diff_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_frac_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_unary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_unpack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mT_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_logsumexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv3d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nonzero_static_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ormqr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_outer_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rand_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize_as__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinh_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsafe_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_angle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_bmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_broadcast_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_digamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_div_floor_rounding_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_celu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_glu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_rot90_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_0_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_squeeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_T_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_acos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_dsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_expand_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_expand_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ifftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_frac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_hash_tensor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_jiterator_unary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_kron_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_kthvalue_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_svdvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_log_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_lu_unpack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool3d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_norm_nuc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_permute_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_randint_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_select_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signbit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_bessel_y0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_bessel_y1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_erfcx_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_ndtri_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_svd_lowrank_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rsub___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addcdiv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_eye_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_det_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lstsq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vander_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_silu_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randint_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resolve_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_select_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_H_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rmatmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rmod___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addmv_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argwhere_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_inv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log_normal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mT_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_amax_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_quantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rand_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_real_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_reshape_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_select_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_entr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_zeta_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tensor_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_transpose_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tril_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unique_consecutive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay___radd___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_argsort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_div_floor_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_einsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_ifft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_flipud_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_index_reduce_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isneginf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_le_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_lstsq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logical_and_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mT_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_msort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_narrow_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_adaptive_max_pool2d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_outer_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_renorm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_resize_as__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_entr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unsqueeze_copy_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view___radd___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__chunk_cat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_bfloat16_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_acos_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_eq_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_eye_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_item_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log10_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_and_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_split_with_sizes_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_squeeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_true_divide_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unbind_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_vstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_broadcast_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_count_nonzero_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_int_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_binary_return_by_ref_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lerp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_qr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logcumsumexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_and_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_or_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_prod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_sum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_matrix_exp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_meshgrid_list_of_tensors_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_channel_shuffle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_put_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_resolve_neg_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sinc_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_square_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_take_along_dim_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_take_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tile_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_vsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_byte_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_chalf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_abs_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_asinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cumsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifftn_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isinf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_narrow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_randn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_stack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_stft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_triu_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_3d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_byte_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_combinations_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_conj_physical_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumulative_trapezoid_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dist_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_permuted_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_eye_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_geqrf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_hsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isreal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cond_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_factor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logcumsumexp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_sum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_silu_complex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_positive_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_pow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_randn_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_repeat_interleave_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_short_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_t_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unsafe_chunk_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_where_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rpow___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_long_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_all_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_asin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_block_diag_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_broadcast_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ceil_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_conj_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erfinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ge_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_igammac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_and_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ne_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_normal_number_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_round_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tril_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_combinations_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_diag_embed_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_trunc_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_equal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_flip_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_geqrf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_gt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_hypot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_igamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_binary_return_by_ref_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eig_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vecdot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_not_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_xor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nanmean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ne_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_binary_cross_entropy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_grid_sample_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_nearest_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool3d_grad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softsign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_normal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_normal_number_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rand_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resolve_conj_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_sum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sinc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_j1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_i0e_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_i1e_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_stack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_stft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trapz_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_as_complex_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argsort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_aminmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_shapes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_complex_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hash_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_gaussian_nll_loss_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_silu_complex_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_y1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_scaled_modified_bessel_k1_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_zeta_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_torch__scaled_mm_cuda_float8_e4m3fn, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_transpose_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_or_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_frexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_reduce_amax_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_unshuffle_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unsafe_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erf_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool2d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_squeeze_multiple_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_unary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lcm_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_le_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nan_to_num_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randint_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_k1_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_triu_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_all_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_left_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_frexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_geqrf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_histc_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_igammac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_imag_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isnan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_and_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_dropout_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_soft_margin_loss_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_fro_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ones_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_blackman_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_squeeze_multiple_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_t_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tensor_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unravel_index_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unsafe_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_int32, 
test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_int16, test/test_ops.py::TestTagsCUDA::test_tags___rdiv___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rxor___cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_bucketize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_empty_strided_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_equal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_expand_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_expm1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_fft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_floor_divide_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_gcd_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_i0_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_lcm_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_and_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_masked_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nan_to_num_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ne_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ones_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_pow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_reciprocal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_ndtri_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_square_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_std_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tril_indices_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_triu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_vdot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_view_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addcmul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addmv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argwhere_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_block_diag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cholesky_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_column_stack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_contiguous_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cross_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diag_embed_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_expand_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_expand_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fftshift_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_hfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_flatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_grid_sampler_2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_reduce_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_istft_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_unary_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lcm_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eig_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logdet_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logical_and_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lu_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nan_to_num_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nanmedian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_narrow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_native_dropout_backward_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_ne_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nextafter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_prelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nonzero_static_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_normal_number_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_randint_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_resolve_conj_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_zeta_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_split_with_sizes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_std_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_trapezoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_triangular_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unbind_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unique_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_view_cuda_float32 2025-12-04T14:37:58.4472281Z 2025-12-04T14:37:58.4472619Z Finished test_ops 2/11 ... [2025-12-04 14:37:58.199705][19506.582596802], took 21.76min 2025-12-04T14:37:58.4473659Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-d95bfbe57b5d2d89.xml 2025-12-04T14:37:59.7542973Z Uploading artifacts took 1.37 seconds 2025-12-04T14:37:59.7546682Z Running test_ops 7/11 ... [2025-12-04 14:37:59.754483][19508.137376493] 2025-12-04T14:37:59.7547186Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:37:59.7551823Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '--shard-id=7', '--num-shards=11', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:37:59.754953] 2025-12-04T14:57:29.2582061Z 2025-12-04T14:57:29.2583281Z test_ops 7/11 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_7.11_97114ebb7b0ad963_.log 2025-12-04T14:57:29.3860347Z Running 3090 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu___ror___cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_trunc_rounding_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout3d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_xlogy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_T_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_alias_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_conj_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_put_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_positive_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_resolve_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_split_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_zeros_like_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_bool_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_float_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_long_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addcmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_conj_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_conj_physical_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_exp2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_eye_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isneginf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_item_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linspace_tensor_overload_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logspace_tensor_overload_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ones_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_rad2deg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_t_copy_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_vdot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_view_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__unsafe_masked_index_put_accumulate_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_aminmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_as_strided_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bfloat16_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ceil_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_corrcoef_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cov_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cummax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cummin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cumprod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diff_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_gather_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_grid_sampler_3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_histogramdd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_imag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_reduce_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_int_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvals_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvalsh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_triangular_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_vander_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mT_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_matmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nansum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_feature_alpha_dropout_without_train_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_area_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_scaled_dot_product_attention_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_silu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_soft_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softmin_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_triplet_margin_with_distance_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ones_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_pinverse_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_put_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_randn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_round_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_y0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tile_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_torch_ops_aten__efficient_attention_forward_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unique_consecutive_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_var_mean_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_errors___rsub___cuda, test/test_ops.py::TestCommonCUDA::test_errors_amin_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_errors_cat_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diag_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_errors_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gather_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_jiterator_binary_cuda, test/test_ops.py::TestCommonCUDA::test_errors_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_gaussian_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_pow_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_mul_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_hermite_polynomial_he_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_errors_trace_cuda, test/test_ops.py::TestCommonCUDA::test_errors_view_cuda, test/test_ops.py::TestCommonCUDA::test_errors_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_div_no_rounding_mode_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_rsqrt_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__chunk_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__unsafe_masked_index_put_accumulate_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_aminmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_tensors_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_count_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cov_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_permuted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_geometric_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lgamma_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_nearest_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rand_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resize__cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y0_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_count_nonzero_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_deg2rad_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_exp2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expand_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_rfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_gt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_half_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_reduction_with_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_narrow_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_softsign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ones_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_permute_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_roll_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_y0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_he_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_scaled_modified_bessel_k1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tanh_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_transpose_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_triu_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unsafe_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_vstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__chunk_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__softmax_backward_data_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_alias_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cartesian_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_permuted_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfc_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_singular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_triangular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout3d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rand_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_kaiser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_laguerre_polynomial_l_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_lowrank_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose3d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_group_norm_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_rms_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_searchsorted_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_out___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_renorm_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_integral_dtype__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_out_jiterator_unary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_inv_ex_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randint_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addbmm_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_alias_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_diff_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_irfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nansum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_gelu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_triangular_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning___radd___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__batch_norm_with_update_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_eye_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_floor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_i0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_istft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log10_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_glu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pdist_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pixel_shuffle_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_normal_number_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_multigammaln_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unflatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_vdot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__segment_reduce_lengths_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__segment_reduce_offsets_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_alias_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_any_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argsort_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_chunk_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_eye_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_frexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_half_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_histogram_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isneginf_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_kron_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lerp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_cholesky_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_det_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_hermitian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_meshgrid_list_of_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_binary_cross_entropy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_leaky_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool1d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_relu6_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_4_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_real_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_repeat_interleave_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_blackman_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_split_with_sizes_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tensordot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_to_sparse_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unsqueeze_copy_cuda, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_ldexp_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_lgamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_var_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rsqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sigmoid_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_v_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_w_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_zeta_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_true_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_geometric_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_select_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vdot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_copysign_cuda, 
test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dot_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_item_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_t_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_trace_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frexp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frexp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frexp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_normal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal__in_place_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_number_mean_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float8_e5m2, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rad2deg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_renorm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_polar_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cauchy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_tensor_overload_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal__in_place_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal__in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_renorm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_all_cuda, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_amin_cuda, 
test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_max_reduction_with_dim_cuda, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argwhere_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cartesian_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_combinations_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_corrcoef_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_trunc_rounding_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gather_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_embedding_bag_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rms_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softsign_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_multiple_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_t_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapz_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zero__cuda_complex64, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rmod___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_copysign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log1p_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mH_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_select_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_to_sparse_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input___rsub___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_argwhere_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cfloat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_dist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_histc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mH_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mT_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_sum_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_msort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_new_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_new_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ones_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_repeat_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_resolve_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_rot90_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_to_sparse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_transpose_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_trunc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unbind_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_abs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_acos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_count_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dstack_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_rfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_geqrf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_not_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_long_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mv_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_glu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nonzero_static_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_permute_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resize__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sinh_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_slice_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_t_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__softmax_backward_data_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_abs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atan2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cdouble_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diag_embed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_histc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_igamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_long_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nansum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_narrow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_channel_shuffle_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_randint_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_reshape_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rot90_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_y0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_split_list_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_triangular_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_addmm_decomposed_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_addmv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_broadcast_shapes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_byte_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_conj_physical_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_contiguous_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_empty_permuted_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_eq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_fftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_floor_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ge_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_int_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isnan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_jiterator_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_lerp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_det_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_log1p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logical_not_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logical_xor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_min_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_new_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_new_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_hinge_embedding_loss_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nonzero_static_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_pinverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_rad2deg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_randn_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_var_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_view_as_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_vstack_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rmul___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rpow___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_abs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_acosh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_partial_views_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cos_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isfinite_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_cross_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log1p_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_masked_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_repeat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_t_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tril_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_where_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_alias_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cumprod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_full_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_gather_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigvals_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_slogdet_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_not_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logsumexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mH_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_movedim_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mul_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_randn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_resize_as__cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_squeeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_std_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_t_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unsqueeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rmatmul___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_float_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_all_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_2d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_block_diag_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_clone_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cumprod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flip_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_istft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_diagonal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_svdvals_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_vecdot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logspace_tensor_overload_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_empty_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_repeat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_special_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_special_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tril_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_block_diag_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cartesian_prod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_chalf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_solve_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_contiguous_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cross_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_equal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_expand_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_expand_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fftshift_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fliplr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigvals_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_matmul_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nanmean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ne_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_normalize_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_circular_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_replicate_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_permute_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_permute_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reciprocal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_repeat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reshape_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_slice_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_with_sizes_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_svd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tensor_split_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trapz_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tril_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_true_divide_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_as_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rmatmul___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__chunk_cat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_acosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atan2_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_copysign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exponential_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flip_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_diagonal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_xor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_zeros_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_permute_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reshape_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sgn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_square_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sum_to_size_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_t_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_triu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_alias_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bfloat16_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_cdouble_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_chunk_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_conj_physical_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_constant_pad_nd_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cross_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cumprod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_deg2rad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_einsum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_floor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_full_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_half_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_inner_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isfinite_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isnan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isneginf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_unary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lerp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_det_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eigvals_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_inv_ex_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_triangular_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_and_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_pool2d_with_indices_backward_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_reduction_no_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_empty_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_full_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_channel_shuffle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_embedding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool2d_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_prelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_threshold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_4_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rad2deg_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scalar_tensor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sigmoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sparse_sampled_addmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_i0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_to_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_transpose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_triangular_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_triu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_vdot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_zeros_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rsub___cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_all_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_istft_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_xor_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nonzero_static_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scalar_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sgn_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signbit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_squeeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_take_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsqueeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bincount_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bucketize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diag_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool2d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_t_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_to_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_ex_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reciprocal_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_squeeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_expand_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isnan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_logaddexp_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_gelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resize__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_slice_scatter_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_j1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_squeeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_not_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_permuted_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_frac_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_igamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_channel_shuffle_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_to_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zero__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zeros_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_bool, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_complex32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_int8, test/test_ops.py::TestTagsCUDA::test_tags__batch_norm_with_update_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_short_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_addcmul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_any_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_cauchy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_eq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_flatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_flip_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ge_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isclose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isposinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_lerp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_neg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nextafter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_gelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_rad2deg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_repeat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_zeta_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_split_with_sizes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unfold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_vsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_acosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addr_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_aminmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argsort_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bernoulli_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_xor_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cov_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cummin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_equal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_erfinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_irfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_rfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_float_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_floor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_full_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_gt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_hsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_imag_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_index_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isnan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_kron_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_det_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_householder_product_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_power_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_svd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_tensorinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log10_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_logspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_select_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_std_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pdist_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_silu_complex_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_threshold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rand_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_real_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_repeat_interleave_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_resolve_neg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_airy_ai_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_j1_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sum_to_size_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_t_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_ops.py::TestTagsCUDA::test_tags_trace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_triu_indices_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_true_divide_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unfold_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_vdot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_vsplit_cuda_float32, test/test_ops.py::TestForwardADWithScalarsCUDA::test_0d_tensor_with_python_scalar_div_floor_rounding_cuda_float32
2025-12-04T14:57:29.5099231Z
2025-12-04T14:57:29.5099548Z Finished test_ops 7/11 ... [2025-12-04 14:57:29.262559][20677.645451895], took 19.49min
2025-12-04T14:57:29.5100582Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-75f8d45594e24741.xml
2025-12-04T14:57:29.5101641Z Running functorch/test_dims 1/1 ... [2025-12-04 14:57:29.479650][20677.862543093]
2025-12-04T14:57:29.5102180Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:57:29.5103412Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_dims.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:57:29.480125]
2025-12-04T14:58:19.8684571Z
2025-12-04T14:58:19.8685920Z functorch/test_dims 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_dims_1.1_a45bb86ae199f167_.log
2025-12-04T14:58:19.8706397Z Running 68 items in this shard: test/functorch/test_dims.py::TestMin::test_adapt, test/functorch/test_dims.py::TestMin::test_attn, test/functorch/test_dims.py::TestMin::test_attn_cuda, test/functorch/test_dims.py::TestMin::test_big_split, test/functorch/test_dims.py::TestMin::test_compare_dims, test/functorch/test_dims.py::TestMin::test_diag, test/functorch/test_dims.py::TestMin::test_dim_args, test/functorch/test_dims.py::TestMin::test_dims_with_size, test/functorch/test_dims.py::TestMin::test_dir, test/functorch/test_dims.py::TestMin::test_doc, test/functorch/test_dims.py::TestMin::test_embed, test/functorch/test_dims.py::TestMin::test_eq, test/functorch/test_dims.py::TestMin::test_expand, test/functorch/test_dims.py::TestMin::test_functorch, test/functorch/test_dims.py::TestMin::test_hello, test/functorch/test_dims.py::TestMin::test_index, test/functorch/test_dims.py::TestMin::test_index_placement, test/functorch/test_dims.py::TestMin::test_inplace, test/functorch/test_dims.py::TestMin::test_manual_stuff, test/functorch/test_dims.py::TestMin::test_mask, test/functorch/test_dims.py::TestMin::test_max, test/functorch/test_dims.py::TestMin::test_mm, test/functorch/test_dims.py::TestMin::test_mm_fuse, test/functorch/test_dims.py::TestMin::test_monkey, test/functorch/test_dims.py::TestMin::test_network, test/functorch/test_dims.py::TestMin::test_order, test/functorch/test_dims.py::TestMin::test_order_keyword, test/functorch/test_dims.py::TestMin::test_permute_orig, test/functorch/test_dims.py::TestMin::test_seg, test/functorch/test_dims.py::TestMin::test_simple, test/functorch/test_dims.py::TestMin::test_softmax_split, test/functorch/test_dims.py::TestMin::test_stack, 
test/functorch/test_dims.py::TestMin::test_time_mm_fuse, test/functorch/test_dims.py::TestMin::test_with_dims_split, test/functorch/test_dims.py::TestMinFunctorchOnly::test_adapt, test/functorch/test_dims.py::TestMinFunctorchOnly::test_attn, test/functorch/test_dims.py::TestMinFunctorchOnly::test_attn_cuda, test/functorch/test_dims.py::TestMinFunctorchOnly::test_big_split, test/functorch/test_dims.py::TestMinFunctorchOnly::test_compare_dims, test/functorch/test_dims.py::TestMinFunctorchOnly::test_diag, test/functorch/test_dims.py::TestMinFunctorchOnly::test_dim_args, test/functorch/test_dims.py::TestMinFunctorchOnly::test_dims_with_size, test/functorch/test_dims.py::TestMinFunctorchOnly::test_dir, test/functorch/test_dims.py::TestMinFunctorchOnly::test_doc, test/functorch/test_dims.py::TestMinFunctorchOnly::test_embed, test/functorch/test_dims.py::TestMinFunctorchOnly::test_eq, test/functorch/test_dims.py::TestMinFunctorchOnly::test_expand, test/functorch/test_dims.py::TestMinFunctorchOnly::test_functorch, test/functorch/test_dims.py::TestMinFunctorchOnly::test_hello, test/functorch/test_dims.py::TestMinFunctorchOnly::test_index, test/functorch/test_dims.py::TestMinFunctorchOnly::test_index_placement, test/functorch/test_dims.py::TestMinFunctorchOnly::test_inplace, test/functorch/test_dims.py::TestMinFunctorchOnly::test_manual_stuff, test/functorch/test_dims.py::TestMinFunctorchOnly::test_mask, test/functorch/test_dims.py::TestMinFunctorchOnly::test_max, test/functorch/test_dims.py::TestMinFunctorchOnly::test_mm, test/functorch/test_dims.py::TestMinFunctorchOnly::test_mm_fuse, test/functorch/test_dims.py::TestMinFunctorchOnly::test_monkey, test/functorch/test_dims.py::TestMinFunctorchOnly::test_network, test/functorch/test_dims.py::TestMinFunctorchOnly::test_order, test/functorch/test_dims.py::TestMinFunctorchOnly::test_order_keyword, test/functorch/test_dims.py::TestMinFunctorchOnly::test_permute_orig, test/functorch/test_dims.py::TestMinFunctorchOnly::test_seg, 
test/functorch/test_dims.py::TestMinFunctorchOnly::test_simple, test/functorch/test_dims.py::TestMinFunctorchOnly::test_softmax_split, test/functorch/test_dims.py::TestMinFunctorchOnly::test_stack, test/functorch/test_dims.py::TestMinFunctorchOnly::test_time_mm_fuse, test/functorch/test_dims.py::TestMinFunctorchOnly::test_with_dims_split 2025-12-04T14:58:19.8726235Z 2025-12-04T14:58:19.8726562Z Finished functorch/test_dims 1/1 ... [2025-12-04 14:58:19.868288][20728.251183012], took 0.84min 2025-12-04T14:58:19.8940876Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_dims/functorch.test_dims-e2a9e671430fd99e.xml 2025-12-04T14:58:21.2486504Z Uploading artifacts took 1.27 seconds 2025-12-04T14:58:21.2490604Z Running functorch/test_ops 1/7 ... [2025-12-04 14:58:21.248849][20729.631742543] 2025-12-04T14:58:21.2491448Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:58:21.2495714Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=1', '--num-shards=7', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:58:21.249305] 2025-12-04T15:09:59.9473859Z 2025-12-04T15:09:59.9474870Z functorch/test_ops 1/7 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_1.7_2b66798f0700c47b_.log 2025-12-04T15:10:00.0170070Z Running 1492 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_l1_loss_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_softmax_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cumprod_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_matrix_rank_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_with_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_randint_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unsafe_split_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_column_stack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_inv_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_max_binary_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_rrelu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_svd_lowrank_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_baddbmm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_heaviside_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_amin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_linear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_round_decimals_3_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_argmin_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amin_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_clamp_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ge_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_lt_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_lt_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_sort_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_topk_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_dsplit_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_dsplit_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_unbind_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_vsplit_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_vsplit_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_positive_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_real_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_special_grad_op_vjp_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unflatten_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unflatten_grad_op_vjp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_view_as_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_view_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_exp2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_long_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_pool2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_ndtri_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_expm1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_min_reduction_with_dim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_logsigmoid_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_std_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cross_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_kl_div_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_general_cosine_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpySortAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_MulGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyCubeAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SelectGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rdiv___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__segment_reduce_offsets_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__unsafe_masked_index_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_abs_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addbmm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_alias_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_aminmax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_any_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bfloat16_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_char_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_min_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_copysign_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cosh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_deg2rad_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_permuted_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exp2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exponential_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifftshift_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flip_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fliplr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gradient_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_heaviside_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_put_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_int_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isfinite_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_kron_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigvalsh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_inv_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_multi_dot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_triangular_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_normal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logsumexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_unpack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_amin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_fill_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_reduction_with_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_meshgrid_list_of_tensors_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_msort_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanmean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanquantile_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ne_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_neg_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_neg_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_zeros_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nextafter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_with_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cosine_embedding_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cosine_similarity_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_gaussian_nll_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_trilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_reflect_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pixel_shuffle_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_poisson_nll_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_selu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softshrink_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_nuc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_number_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten_index_put_functorch_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ormqr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_quantile_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_neg_3_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_cosine_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_airy_ai_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_y1_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_h_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_he_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i1e_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_laguerre_polynomial_l_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_v_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_multiple_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sub_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_svd_lowrank_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_topk_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_torch_ops_aten__safe_softmax_default_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_triangular_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trunc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_unbiased_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_unbiased_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vdot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_asinh_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_CubeGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_complex_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_diagonal_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_batch_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_permute_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_bessel_y0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_ldl_factor_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_new_empty_strided_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multi_head_attention_forward_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_t_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_chunk_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_le_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_avg_pool3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_pca_lowrank_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_spherical_bessel_j0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_contiguous_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__upsample_bilinear2d_aa_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_exponential_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logspace_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_layer_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_reshape_as_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_put_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_avg_pool3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_mish_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_select_scatter_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_SelectAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_einsum_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log_softmax_with_dtype_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_huber_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_erfcx_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpySortAutogradFunction_cuda_float32 2025-12-04T15:10:00.0849530Z 2025-12-04T15:10:00.0849901Z Finished functorch/test_ops 1/7 ... 
[2025-12-04 15:09:59.949482][21428.332373553], took 11.65min
2025-12-04T15:10:00.0851101Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-caabf5583dae6043.xml
2025-12-04T15:10:00.1449865Z Running functorch/test_ops 6/7 ... [2025-12-04 15:10:00.144694][21428.52758652]
2025-12-04T15:10:00.1450415Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:10:00.1454349Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=6', '--num-shards=7', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:10:00.145197]
2025-12-04T15:20:28.9988402Z
2025-12-04T15:20:28.9989454Z functorch/test_ops 6/7 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_6.7_b2e5f87489ea3e61_.log
2025-12-04T15:20:29.0676878Z Running 1471 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_nll_loss_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cholesky_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_eigh_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_elu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_triplet_margin_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_log_ndtr_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clamp_min_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isnan_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isposinf_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_median_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_prelu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_ndtr_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_topk_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amax_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmin_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_gt_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_le_cuda_complex128, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_conj_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_contiguous_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_split_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_movedim_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_permute_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_permute_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_positive_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_real_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_special_grad_op_jvp_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_squeeze_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_squeeze_multiple_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_amin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fliplr_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_movedim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_mse_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_slice_scatter_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_aminmax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_neg_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_normal_number_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_vstack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_char_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_eig_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_reduce_prod_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_H_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyExpMarkDirtyAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___radd___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmod___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcdiv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_amax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argmin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_baddbmm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_block_diag_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_tensors_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_to_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_chalf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cos_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumprod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diag_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_digamma_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dist_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_strided_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erfc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_divide_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_frac_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ge_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_grid_sampler_2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_half_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_add_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isinf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isreal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_2inputs_2outputs_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ldexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lerp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lgamma_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_hermitian_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_slogdet_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_tensor_overload_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log10_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_softmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_softmax_with_dtype_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logcumsumexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_not_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_or_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mT_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_amax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_argmin_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_select_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_binary_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_no_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mode_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_narrow_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_batch_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_full_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool3d_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_alpha_dropout_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_bilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_with_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_fractional_max_pool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardsigmoid_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_bilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_nearest_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_local_response_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_upsample_nearest_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ones_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten_index_put_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polar_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_3_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_4_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_qr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resolve_conj_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sigmoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_blackman_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_hamming_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hamming_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hann_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_v_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i0e_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_legendre_polynomial_p_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_log_ndtr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_i0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_k0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k1_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sqrt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_take_along_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tanh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tensor_split_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_triu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unflatten_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsafe_chunk_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_where_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_exp2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atleast_2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_reduce_amax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mvlgamma_mvlgamma_p_3_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_l1_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_general_cosine_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cholesky_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_elu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_pca_lowrank_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sum_to_size_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bool_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_sum_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_remainder_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___getitem___functorch_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_conj_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_eigh_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_min_reduction_with_dim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_nll_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unfold_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logical_or_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_linear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_hamming_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_view_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_fftshift_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_pinv_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_embedding_functorch_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_resolve_conj_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unflatten_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvmapjvp_linalg_solve_cuda 2025-12-04T15:20:29.1353537Z 2025-12-04T15:20:29.1354235Z Finished functorch/test_ops 6/7 ... [2025-12-04 15:20:29.000777][22057.383669017], took 10.48min 2025-12-04T15:20:29.1355429Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-b6190fae5240f1fb.xml 2025-12-04T15:20:30.3031787Z Uploading artifacts took 1.16 seconds 2025-12-04T15:20:30.3036054Z Running inductor/test_select_algorithm 1/1 ... [2025-12-04 15:20:30.303413][22058.686307177] 2025-12-04T15:20:30.3036657Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:20:30.3040943Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_select_algorithm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:20:30.303861] 2025-12-04T15:20:40.7214234Z 2025-12-04T15:20:40.7215392Z inductor/test_select_algorithm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_select_algorithm_1.1_7db4d246e17eb863_.log 2025-12-04T15:20:40.7216393Z 2025-12-04T15:20:40.7216801Z Finished inductor/test_select_algorithm 1/1 ... [2025-12-04 15:20:40.721200][22069.104096027], took 0.17min 2025-12-04T15:20:40.7479381Z Running inductor/test_cpu_repro 1/3 ... 
[2025-12-04 15:20:40.747662][22069.130557131] 2025-12-04T15:20:40.7479965Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:20:40.7483256Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cpu_repro.py', '--shard-id=1', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:20:40.748096] 2025-12-04T15:35:02.5880011Z 2025-12-04T15:35:02.5881534Z inductor/test_cpu_repro 1/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cpu_repro_1.3_45e7fcc9d89e84f9_.log 2025-12-04T15:35:02.6107579Z Running 233 items in this shard: test/inductor/test_cpu_repro.py::CPUReproTests::test_add_layernorm, test/inductor/test_cpu_repro.py::CPUReproTests::test_aten_normal_dtype, test/inductor/test_cpu_repro.py::CPUReproTests::test_auto_zvec_vsx_simd, test/inductor/test_cpu_repro.py::CPUReproTests::test_avx2_bool_constant_pad_nd, test/inductor/test_cpu_repro.py::CPUReproTests::test_bf16_zeros, test/inductor/test_cpu_repro.py::CPUReproTests::test_bitwise_logical_op_bool, test/inductor/test_cpu_repro.py::CPUReproTests::test_bitwise_shift_corner_inputs, test/inductor/test_cpu_repro.py::CPUReproTests::test_channels_last_view_as_complex, test/inductor/test_cpu_repro.py::CPUReproTests::test_consistent_remove_buffers, test/inductor/test_cpu_repro.py::CPUReproTests::test_constant_bool_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv1d_strided_weight_torch_compile, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv2d_autocast, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv_in_channel_1_dynamic_shapes, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_fp32_int64_oob_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_fp32_to_int64_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_int32_to_int64_vec, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_decomposed_fake_quant_per_channel, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_quant_lowering_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_quant_lowering_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_double_pointwise_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_for_loop_collapsed, test/inductor/test_cpu_repro.py::CPUReproTests::test_full_bits_lowp, test/inductor/test_cpu_repro.py::CPUReproTests::test_fused_node, test/inductor/test_cpu_repro.py::CPUReproTests::test_group_norm_backward_symint_divisible_channels, test/inductor/test_cpu_repro.py::CPUReproTests::test_index_add, test/inductor/test_cpu_repro.py::CPUReproTests::test_index_propagation_issue_102065, test/inductor/test_cpu_repro.py::CPUReproTests::test_inplace_add_alpha, test/inductor/test_cpu_repro.py::CPUReproTests::test_int32_pointwise_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_int64_reduction_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_issue_148058, test/inductor/test_cpu_repro.py::CPUReproTests::test_linear_with_reshape, test/inductor/test_cpu_repro.py::CPUReproTests::test_load_half, test/inductor/test_cpu_repro.py::CPUReproTests::test_load_inf_bf16, test/inductor/test_cpu_repro.py::CPUReproTests::test_load_same_bool_tensor_twice, test/inductor/test_cpu_repro.py::CPUReproTests::test_low_fp_index_expr_issue_147279, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_masked_fill_softmax, test/inductor/test_cpu_repro.py::CPUReproTests::test_masked_fill_with_inf_or_nan_value, test/inductor/test_cpu_repro.py::CPUReproTests::test_masked_load_int64_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_memory_copy_with_fusion, test/inductor/test_cpu_repro.py::CPUReproTests::test_mkl_linear, test/inductor/test_cpu_repro.py::CPUReproTests::test_nn_param_assign_wrapped, test/inductor/test_cpu_repro.py::CPUReproTests::test_no_redundant_to_dtypes_between_fused_scheduler_node, test/inductor/test_cpu_repro.py::CPUReproTests::test_non_contiguous_index_with_constant_stride, test/inductor/test_cpu_repro.py::CPUReproTests::test_non_contiguous_load_buf_quant_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_outer_loop_fusion, test/inductor/test_cpu_repro.py::CPUReproTests::test_outer_loop_fusion_buffer_remove, test/inductor/test_cpu_repro.py::CPUReproTests::test_pad_with_nan_value, test/inductor/test_cpu_repro.py::CPUReproTests::test_per_channel_fake_quant_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_per_channel_fake_quant_uint8, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_per_tensor_fake_quant_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_randint_symint_input, test/inductor/test_cpu_repro.py::CPUReproTests::test_reduction_with_dynamic_threads, test/inductor/test_cpu_repro.py::CPUReproTests::test_relu_with_inf_value, test/inductor/test_cpu_repro.py::CPUReproTests::test_scalar_sign_with_min, test/inductor/test_cpu_repro.py::CPUReproTests::test_scatter_using_atomic_add, test/inductor/test_cpu_repro.py::CPUReproTests::test_sign_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_slice_scatter_default_end_value, test/inductor/test_cpu_repro.py::CPUReproTests::test_tile2d_store_channel_shuffle_cl_quant_output_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_timed_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_to_channels_last_fp8, test/inductor/test_cpu_repro.py::CPUReproTests::test_to_channels_last_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_to_dtype_float_bool, test/inductor/test_cpu_repro.py::CPUReproTests::test_to_uint8_rounding_method, test/inductor/test_cpu_repro.py::CPUReproTests::test_transpose_mxn_16_16_bf16_fp16, test/inductor/test_cpu_repro.py::CPUReproTests::test_transpose_non_contiguous, test/inductor/test_cpu_repro.py::CPUReproTests::test_transpose_vertical_sum_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_two_local_buffers_in_outer_loop_fusion, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint64_pointwise_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_unrolled_bool_prod_vectorized, test/inductor/test_cpu_repro.py::CPUReproTests::test_unsupported_conv_transpose, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_compare_op_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_logical, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_randn, test/inductor/test_cpu_repro.py::CPUReproTests::test_view_dtype 2025-12-04T15:35:02.6287030Z 
2025-12-04T15:35:02.6287419Z Finished inductor/test_cpu_repro 1/3 ... [2025-12-04 15:35:02.588412][22930.971297542], took 14.36min 2025-12-04T15:35:02.6288692Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cpu_repro/inductor.test_cpu_repro-e45fcbaf6c1a2b2c.xml 2025-12-04T15:35:02.7181129Z Running inductor/test_custom_lowering 1/1 ... [2025-12-04 15:35:02.717691][22931.100584221] 2025-12-04T15:35:02.7181791Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:35:02.7184702Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_custom_lowering.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:35:02.718182] 2025-12-04T15:35:26.3188745Z 2025-12-04T15:35:26.3189867Z inductor/test_custom_lowering 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_custom_lowering_1.1_b51e0c13dc286ed6_.log 2025-12-04T15:35:26.3193427Z Running 6 items in this shard: test/inductor/test_custom_lowering.py::TestCustomLowering::test_constant_creation, test/inductor/test_custom_lowering.py::TestCustomLowering::test_jagged_to_padded_dense_sanity_cuda, test/inductor/test_custom_lowering.py::TestCustomLowering::test_jagged_to_padded_dense_zero_size, test/inductor/test_custom_lowering.py::TestCustomLowering::test_multi_inp_asm, test/inductor/test_custom_lowering.py::TestCustomLowering::test_register_lowering_custom_dict, test/inductor/test_custom_lowering.py::TestCustomLowering::test_tanh_approx 2025-12-04T15:35:26.3196259Z 2025-12-04T15:35:26.3196640Z Finished inductor/test_custom_lowering 1/1 ... 
[2025-12-04 15:35:26.318667][22954.701563649], took 0.39min 2025-12-04T15:35:26.3449880Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_custom_lowering/inductor.test_custom_lowering-f90a8c2a1b7dd9b0.xml 2025-12-04T15:35:26.4286308Z Running inductor/test_perf 1/1 ... [2025-12-04 15:35:26.428372][22954.811266278] 2025-12-04T15:35:26.4286840Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:35:26.4290699Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_perf.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:35:26.428845] 2025-12-04T15:36:27.7444889Z 2025-12-04T15:36:27.7445877Z inductor/test_perf 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_perf_1.1_8b1dd16368b2df6e_.log 2025-12-04T15:36:27.7471341Z Running 66 items in this shard: test/inductor/test_perf.py::NumBytesMetricTests::test_cat, test/inductor/test_perf.py::NumBytesMetricTests::test_cat_pointwise, test/inductor/test_perf.py::NumBytesMetricTests::test_cat_pointwise_config_option, test/inductor/test_perf.py::NumBytesMetricTests::test_cat_pointwise_many_complex_inputs, test/inductor/test_perf.py::NumBytesMetricTests::test_cat_pointwise_many_simple_inputs, test/inductor/test_perf.py::NumBytesMetricTests::test_extern, test/inductor/test_perf.py::NumBytesMetricTests::test_index, test/inductor/test_perf.py::NumBytesMetricTests::test_pointwise, test/inductor/test_perf.py::NumBytesMetricTests::test_reduction, test/inductor/test_perf.py::FusionTests::test_create_block_mask, test/inductor/test_perf.py::FusionTests::test_double_softmax, test/inductor/test_perf.py::FusionTests::test_factory_reduction, test/inductor/test_perf.py::FusionTests::test_horizontal_reduction_outer_pointwise, 
test/inductor/test_perf.py::FusionTests::test_horizontal_reduction_pointwise, test/inductor/test_perf.py::FusionTests::test_horizontal_reduction_pointwise2, test/inductor/test_perf.py::FusionTests::test_horizontal_reduction_reduction, test/inductor/test_perf.py::FusionTests::test_horizontal_sum_pw_broadcast, test/inductor/test_perf.py::FusionTests::test_index_pointwise, test/inductor/test_perf.py::FusionTests::test_index_reduction, test/inductor/test_perf.py::FusionTests::test_layer_norm, test/inductor/test_perf.py::FusionTests::test_mutation_fusion, test/inductor/test_perf.py::FusionTests::test_neighbor, test/inductor/test_perf.py::FusionTests::test_norm_chain, test/inductor/test_perf.py::FusionTests::test_pointwise_multi_level_reduction, test/inductor/test_perf.py::FusionTests::test_reduction_pointwise_multi_level_reduction, test/inductor/test_perf.py::FusionTests::test_softmax_backward, test/inductor/test_perf.py::FusionTests::test_softmax_inner, test/inductor/test_perf.py::FusionTests::test_vertical_sum_pw, test/inductor/test_perf.py::SchedulerFusionTests::test_fusion_choice1, test/inductor/test_perf.py::SchedulerFusionTests::test_fusion_choice2, test/inductor/test_perf.py::SchedulerFusionTests::test_fusion_choice3, test/inductor/test_perf.py::SchedulerFusionTests::test_fusion_choice4_cpu, test/inductor/test_perf.py::TilingTests::test_tiling_simple, test/inductor/test_perf.py::TilingTests::test_tiling_three, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_cat, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_dtype, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_full_remat, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_keops, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_long_chain_add, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_partial_remat, 
test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_relu, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_unremat_bw, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_unremat_bw2, test/inductor/test_perf.py::MinCutPartitioningTests::test_partitioning_with_view, test/inductor/test_perf.py::NoopTests::test_noop_cat, test/inductor/test_perf.py::NoopTests::test_noop_clones, test/inductor/test_perf.py::NoopTests::test_noop_device_conversion, test/inductor/test_perf.py::NoopTests::test_noop_dtype_conversion, test/inductor/test_perf.py::NoopTests::test_noop_int_ops, test/inductor/test_perf.py::NoopTests::test_noop_slice_scatter, test/inductor/test_perf.py::InplacingTests::test_inplace_custom_op, test/inductor/test_perf.py::InplacingTests::test_inplace_custom_op_intermediate, test/inductor/test_perf.py::InplacingTests::test_inplace_custom_op_training, test/inductor/test_perf.py::InplacingTests::test_inplace_custom_op_training_two_mutated_inputs, test/inductor/test_perf.py::InplacingTests::test_inplace_custom_op_two_mutated_inputs, test/inductor/test_perf.py::InplacingTests::test_inplace_randperm_scatter, test/inductor/test_perf.py::InplacingTests::test_inplace_scatter, test/inductor/test_perf.py::InplacingTests::test_inplace_scatter_noop_view, test/inductor/test_perf.py::InplacingTests::test_inplace_triton_kernel_training, test/inductor/test_perf.py::InplacingTests::test_inplace_triton_kernel_v1, test/inductor/test_perf.py::InplacingTests::test_inplace_triton_kernel_v2, test/inductor/test_perf.py::InplacingTests::test_inplace_triton_kernel_v3, test/inductor/test_perf.py::InplacingTests::test_inplace_triton_kernel_v4, test/inductor/test_perf.py::InplacingTests::test_inplace_triton_kernel_v5, test/inductor/test_perf.py::InplacingTests::test_inplace_triton_kernel_v6, test/inductor/test_perf.py::InplacingTests::test_triton_kernel_not_fusable_with_users 2025-12-04T15:36:27.7495006Z 2025-12-04T15:36:27.7495331Z 
Finished inductor/test_perf 1/1 ... [2025-12-04 15:36:27.744639][23016.127530282], took 1.02min 2025-12-04T15:36:27.7713563Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_perf/inductor.test_perf-34de9a09a2935f8d.xml 2025-12-04T15:36:27.8555522Z Running inductor/test_binary_folding 1/1 ... [2025-12-04 15:36:27.855173][23016.23806694] 2025-12-04T15:36:27.8556211Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:36:27.8559427Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_binary_folding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:36:27.855634] 2025-12-04T15:38:20.0471537Z 2025-12-04T15:38:20.0472841Z inductor/test_binary_folding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_binary_folding_1.1_181cb55db6266036_.log 2025-12-04T15:38:20.0476323Z Running 6 items in this shard: test/inductor/test_binary_folding.py::FreezingCpuTests::test_conv_binary_folding_cpu, test/inductor/test_binary_folding.py::FreezingCpuTests::test_conv_bn_folding_cpu, test/inductor/test_binary_folding.py::FreezingCpuTests::test_linear_binary_folding_cpu, test/inductor/test_binary_folding.py::FreezingGpuTests::test_conv_binary_folding_cuda, test/inductor/test_binary_folding.py::FreezingGpuTests::test_conv_bn_folding_cuda, test/inductor/test_binary_folding.py::FreezingGpuTests::test_linear_binary_folding_cuda 2025-12-04T15:38:20.0479083Z 2025-12-04T15:38:20.0479460Z Finished inductor/test_binary_folding 1/1 ... 
[2025-12-04 15:38:20.046900][23128.429796054], took 1.87min 2025-12-04T15:38:20.0736656Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_binary_folding/inductor.test_binary_folding-0c797ad2be676af7.xml 2025-12-04T15:38:20.1619444Z Running inductor/test_mkldnn_pattern_matcher 3/3 ... [2025-12-04 15:38:20.161640][23128.544533143] 2025-12-04T15:38:20.1620118Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:38:20.1623208Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_mkldnn_pattern_matcher.py', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:38:20.162060] 2025-12-04T15:46:32.2305180Z 2025-12-04T15:46:32.2306362Z inductor/test_mkldnn_pattern_matcher 3/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_mkldnn_pattern_matcher_3.3_de8f963f0fd4260a_.log 2025-12-04T15:46:32.2377458Z Running 95 items in this shard: test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_conv2d_binary_inplace_fusion_failed_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_dynamic_qlinear_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_dynamic_qlinear_qat_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_leaky_relu_pattern_fallback, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_binary_broadcast_shapes, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_fp32, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_relu_dynamic_fp16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_add_relu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_silu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv1d_relu_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_relu_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_relu_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardswish_int8_mixed_bf16_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardswish_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardtanh_int8_mixed_bf16_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardtanh_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_relu_int8_mixed_bf16_xpu, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_False_is_qat_False_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_False_is_qat_False_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_False_is_qat_True_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_True_is_qat_True_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_True_is_qat_True_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_use_relu_True_is_qat_True_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_xpu_use_relu_True_is_qat_False_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_cpu_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_input_dim_exceeds_2_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_input_dim_exceeds_2_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_input_dim_exceeds_2_and_not_contiguous_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_input_dim_exceeds_2_use_autocast, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_mul, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_int8_mixed_bf16_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_False_float32_per_channel_quant_True_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_bfloat16_per_channel_quant_True_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_bfloat16_per_channel_quant_True_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_float32_per_channel_quant_True_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestDynamicPatternMatcher::test_linear_input_non_contiguous_3D_wo_bias_dynamic_shapes, test/inductor/test_mkldnn_pattern_matcher.py::TestDynamicPatternMatcher::test_linear_unary_dynamic_shapes, test/inductor/test_mkldnn_pattern_matcher.py::TestDynamicPatternMatcher::test_qat_bn_conv2d, test/inductor/test_mkldnn_pattern_matcher.py::TestDynamicPatternMatcher::test_qconv2d_maxpool2d_linear_dynamic_cpu 2025-12-04T15:46:32.2447340Z 2025-12-04T15:46:32.2447759Z Finished inductor/test_mkldnn_pattern_matcher 3/3 ... [2025-12-04 15:46:32.230039][23620.612930542], took 8.20min 2025-12-04T15:46:32.2575093Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mkldnn_pattern_matcher/inductor.test_mkldnn_pattern_matcher-c93031a5b8f8293d.xml 2025-12-04T15:46:34.6055491Z Uploading artifacts took 2.28 seconds 2025-12-04T15:46:34.6059878Z Running inductor/test_cutlass_backend 1/1 ... 
[2025-12-04 15:46:34.605776][23622.988671298] 2025-12-04T15:46:34.6060518Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:46:34.6064845Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cutlass_backend.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:46:34.606229] 2025-12-04T15:46:44.9169026Z 2025-12-04T15:46:44.9170163Z inductor/test_cutlass_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cutlass_backend_1.1_15c862b0fcbdbc05_.log 2025-12-04T15:46:44.9171223Z 2025-12-04T15:46:44.9171623Z Finished inductor/test_cutlass_backend 1/1 ... [2025-12-04 15:46:44.916666][23633.299562898], took 0.17min 2025-12-04T15:46:44.9442153Z Running inductor/test_ck_backend 1/1 ... [2025-12-04 15:46:44.943893][23633.326787523] 2025-12-04T15:46:44.9442720Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:46:44.9445750Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_ck_backend.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:46:44.944323] 2025-12-04T15:46:55.2772575Z 2025-12-04T15:46:55.2773759Z inductor/test_ck_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_ck_backend_1.1_578c7dfc11700a2c_.log 2025-12-04T15:46:55.2774608Z 2025-12-04T15:46:55.2774975Z Finished inductor/test_ck_backend 1/1 ... [2025-12-04 15:46:55.277018][23643.659914217], took 0.17min 2025-12-04T15:46:55.3040869Z Running inductor/test_gpu_cpp_wrapper 1/1 ... 
[2025-12-04 15:46:55.303811][23643.686706801] 2025-12-04T15:46:55.3041485Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:46:55.3044858Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_gpu_cpp_wrapper.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:46:55.304229] 2025-12-04T15:53:29.6305490Z 2025-12-04T15:53:29.6306588Z inductor/test_gpu_cpp_wrapper 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_gpu_cpp_wrapper_1.1_e2281895ade7355a_.log 2025-12-04T15:53:29.6453343Z Running 259 items in this shard: test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_add_complex4_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_add_complex_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_adding_tensor_offsets_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_addmm_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_aoti_debug_printer_works_on_constants, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_as_strided_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_batch_norm_2d_2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bernoulli1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bitwise_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bmm1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_bmm2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_buffer_use_after_remove_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_cat_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_cat_slice_cat_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_consecutive_split_cumprod_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_conv_backward_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_convolution1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_custom_op_1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_custom_op_2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_custom_op_3_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float16_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int64_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float32_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_float64_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_fusion_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int16_uint8_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int32_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int64_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_float64_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_int8_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dtypeview_uint8_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_dynamic_shapes_persistent_reduction_mixed_x_dim_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_embedding_bag_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_enable_dynamic_shapes_cpp_wrapper_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_fft_real_input_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_fft_real_input_real_output_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_foreach_cpp_wrapper_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_index_put_deterministic_fallback_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_index_tensor_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_inductor_layout_optimization_input_mutations_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_insignificant_strides_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_layer_norm_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_linear1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_linear2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_linear_relu_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_plus_mm2_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_plus_mm3_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_mm_views_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_multi_device_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_multi_threading_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_non_tensor_args_wrapped_on_cpu, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_pointwise_hermite_polynomial_h_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_pointwise_hermite_polynomial_he_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_pow3_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_profiler_mark_wrapper_call_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_randint_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_reduction1_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_relu_cuda_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_repeat_interleave_2_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_roi_align_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_scalar_input_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_scaled_dot_product_attention_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_scaled_dot_product_efficient_attention_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_silu_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_sort_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_sum_dtype_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_sum_int_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_transpose_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_float16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_float32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_float64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int16_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int32_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int64_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_int8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::TestGpuWrapper::test_unspec_inputs_uint8_cuda_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_add_complex4_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_add_complex_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_adding_tensor_offsets_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_addmm_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_annotation_training, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_as_strided_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_batch_norm_2d_2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bernoulli1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bitwise_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bmm1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_bmm2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_buffer_use_after_remove_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_cat_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_cat_slice_cat_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_consecutive_split_cumprod_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_conv_backward_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_convolution1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_custom_op_1_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_custom_op_2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_custom_op_3_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float16_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int32_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float32_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_float64_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_fusion_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_float32_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int16_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int32_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_float16_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int64_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_int8_uint8_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_float16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dtypeview_uint8_uint8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_dynamic_shapes_persistent_reduction_mixed_x_dim_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_embedding_bag_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_enable_dynamic_shapes_cpp_wrapper_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_fft_real_input_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_fft_real_input_real_output_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_foreach_cpp_wrapper_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_index_put_deterministic_fallback_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_index_tensor_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_inductor_layout_optimization_input_mutations_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_insignificant_strides_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_layer_norm_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_linear1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_linear2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_linear_relu_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_mm_plus_mm2_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_mm_plus_mm3_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_mm_views_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_multi_device_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_multi_threading_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_pointwise_hermite_polynomial_h_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_pointwise_hermite_polynomial_he_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_pow3_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_profiler_mark_wrapper_call_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_randint_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_reduction1_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_relu_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_repeat_interleave_2_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_roi_align_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_scalar_input_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_scaled_dot_product_attention_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_scaled_dot_product_efficient_attention_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_silu_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_sort_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_sum_dtype_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_sum_int_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_transpose_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_float16_cuda_dynamic_shapes_gpu_wrapper, 
test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_float32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_float64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int16_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int32_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int64_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_int8_cuda_dynamic_shapes_gpu_wrapper, test/inductor/test_gpu_cpp_wrapper.py::DynamicShapesGpuWrapperGpuTests::test_unspec_inputs_uint8_cuda_dynamic_shapes_gpu_wrapper 2025-12-04T15:53:29.6598714Z 2025-12-04T15:53:29.6599096Z Finished inductor/test_gpu_cpp_wrapper 1/1 ... [2025-12-04 15:53:29.630721][24038.013615316], took 6.57min 2025-12-04T15:53:29.6600462Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_gpu_cpp_wrapper/inductor.test_gpu_cpp_wrapper-c206afd337165094.xml 2025-12-04T15:53:29.7560339Z Running inductor/test_cutedsl_template 1/1 ... [2025-12-04 15:53:29.755698][24038.138590972] 2025-12-04T15:53:29.7560939Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:53:29.7563866Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cutedsl_template.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:53:29.756138] 2025-12-04T15:53:40.1861426Z 2025-12-04T15:53:40.1862554Z inductor/test_cutedsl_template 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cutedsl_template_1.1_431b05ccc7f3aa92_.log 2025-12-04T15:53:40.1869224Z Running 13 items in this shard: test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cse_integration, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_add_e2e, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_add_e2e_autotune, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_op_overrides, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_gen_defines, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_gen_imports, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_get_output_hook, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_indented_buffer_usage, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_modification_subgraph, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_multiple_templates_unique_names, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_render_includes_imports, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_template_aliasing, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_template_env_contains_hooks 2025-12-04T15:53:40.1876502Z 2025-12-04T15:53:40.1876896Z Finished inductor/test_cutedsl_template 1/1 ... [2025-12-04 15:53:40.185929][24048.568825606], took 0.17min 2025-12-04T15:53:40.2126259Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-1780c0291e7a0397.xml 2025-12-04T15:53:40.2929666Z Running inductor/test_benchmark_fusion 1/1 ... 
[2025-12-04 15:53:40.292619][24048.675513853] 2025-12-04T15:53:40.2930269Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:53:40.2933490Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_benchmark_fusion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:53:40.293060] 2025-12-04T15:54:10.6054931Z 2025-12-04T15:54:10.6056442Z inductor/test_benchmark_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_benchmark_fusion_1.1_06ce66c290620934_.log 2025-12-04T15:54:10.6065063Z Running 16 items in this shard: test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_avoid_register_spilling_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_foreach_kernel_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_register_spills_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_resnet18_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_softmax_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_tield_kernel_fusion_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkingTest::test_benchmark_on_non_zero_device, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_changed_layout, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_equivalent_extern_code, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_equivalent_template_code, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_avoid_register_spilling_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_foreach_kernel_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_register_spills_cpu, 
test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_resnet18_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_softmax_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_tield_kernel_fusion_cpu 2025-12-04T15:54:10.6073412Z 2025-12-04T15:54:10.6073816Z Finished inductor/test_benchmark_fusion 1/1 ... [2025-12-04 15:54:10.605282][24078.988178676], took 0.51min 2025-12-04T15:54:10.6326029Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-33e3c50f2f02127c.xml 2025-12-04T15:54:10.7148716Z Running dynamo/test_modules 1/1 ... [2025-12-04 15:54:10.714562][24079.097456204] 2025-12-04T15:54:10.7149283Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:54:10.7152621Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_modules.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:54:10.715015] 2025-12-04T15:54:47.9360740Z 2025-12-04T15:54:47.9361838Z dynamo/test_modules 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_modules_1.1_8a3e7afe44c0508c_.log 2025-12-04T15:54:47.9410386Z Running 135 items in this shard: test/dynamo/test_modules.py::NNModuleTests::test_access_by_keys, test/dynamo/test_modules.py::NNModuleTests::test_basicmodule1, test/dynamo/test_modules.py::NNModuleTests::test_basicmodule2, test/dynamo/test_modules.py::NNModuleTests::test_call_fn_with_non_const_inputs_safe, test/dynamo/test_modules.py::NNModuleTests::test_cfgmod, test/dynamo/test_modules.py::NNModuleTests::test_children, test/dynamo/test_modules.py::NNModuleTests::test_constloop, test/dynamo/test_modules.py::NNModuleTests::test_conv_call_forward_directly, test/dynamo/test_modules.py::NNModuleTests::test_conv_call_super_forward_directly, test/dynamo/test_modules.py::NNModuleTests::test_conv_transpose_call_forward_directly, test/dynamo/test_modules.py::NNModuleTests::test_conv_transpose_call_super_forward_directly, test/dynamo/test_modules.py::NNModuleTests::test_densenet, test/dynamo/test_modules.py::NNModuleTests::test_enumvalues, test/dynamo/test_modules.py::NNModuleTests::test_fnmember, test/dynamo/test_modules.py::NNModuleTests::test_fnmembercmp1, test/dynamo/test_modules.py::NNModuleTests::test_fnmembercmp2, test/dynamo/test_modules.py::NNModuleTests::test_forward_directly, test/dynamo/test_modules.py::NNModuleTests::test_generation_tag, test/dynamo/test_modules.py::NNModuleTests::test_hasattr, test/dynamo/test_modules.py::NNModuleTests::test_inject_module_parameters, test/dynamo/test_modules.py::NNModuleTests::test_intarg, test/dynamo/test_modules.py::NNModuleTests::test_iseval1, test/dynamo/test_modules.py::NNModuleTests::test_iseval2, test/dynamo/test_modules.py::NNModuleTests::test_isnonelayer, test/dynamo/test_modules.py::NNModuleTests::test_istraining1, 
test/dynamo/test_modules.py::NNModuleTests::test_istraining2, test/dynamo/test_modules.py::NNModuleTests::test_layerlist, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module1, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module2, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module4, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module5, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module6, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module7, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module_bad_params, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module_bad_params_call_function, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module_kwargs, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module_no_cls_to_become, test/dynamo/test_modules.py::NNModuleTests::test_lazy_module_speculation_log_divergence, test/dynamo/test_modules.py::NNModuleTests::test_module_attribute_precedence, test/dynamo/test_modules.py::NNModuleTests::test_module_call_module_with_static_forward, test/dynamo/test_modules.py::NNModuleTests::test_module_class_method, test/dynamo/test_modules.py::NNModuleTests::test_module_comparison, test/dynamo/test_modules.py::NNModuleTests::test_module_forward_has_graph_break, test/dynamo/test_modules.py::NNModuleTests::test_module_guard_name_is_valid, test/dynamo/test_modules.py::NNModuleTests::test_module_name_string, test/dynamo/test_modules.py::NNModuleTests::test_module_property, test/dynamo/test_modules.py::NNModuleTests::test_module_static_method, test/dynamo/test_modules.py::NNModuleTests::test_moduledict, test/dynamo/test_modules.py::NNModuleTests::test_moduledict_custom, test/dynamo/test_modules.py::NNModuleTests::test_modulelist, test/dynamo/test_modules.py::NNModuleTests::test_modulelist_custom, test/dynamo/test_modules.py::NNModuleTests::test_modulelist_nested, test/dynamo/test_modules.py::NNModuleTests::test_modulemethod1, 
test/dynamo/test_modules.py::NNModuleTests::test_modulemethod2, test/dynamo/test_modules.py::NNModuleTests::test_named_children, test/dynamo/test_modules.py::NNModuleTests::test_nn_module_setattr, test/dynamo/test_modules.py::NNModuleTests::test_nn_module_unspec_int_attr, test/dynamo/test_modules.py::NNModuleTests::test_nn_moduledict_contains, test/dynamo/test_modules.py::NNModuleTests::test_parameterdict, test/dynamo/test_modules.py::NNModuleTests::test_parameterdict_custom, test/dynamo/test_modules.py::NNModuleTests::test_parameters1, test/dynamo/test_modules.py::NNModuleTests::test_parameters2, test/dynamo/test_modules.py::NNModuleTests::test_parameters3, test/dynamo/test_modules.py::NNModuleTests::test_parameters4, test/dynamo/test_modules.py::NNModuleTests::test_parameters5, test/dynamo/test_modules.py::NNModuleTests::test_self_mutating1, test/dynamo/test_modules.py::NNModuleTests::test_seq, test/dynamo/test_modules.py::NNModuleTests::test_sequential_with_duplicated_module, test/dynamo/test_modules.py::NNModuleTests::test_sequential_with_duplicated_module2, test/dynamo/test_modules.py::NNModuleTests::test_simple_torch_function, test/dynamo/test_modules.py::NNModuleTests::test_stringmember, test/dynamo/test_modules.py::NNModuleTests::test_submodules1, test/dynamo/test_modules.py::NNModuleTests::test_submodules2, test/dynamo/test_modules.py::NNModuleTests::test_super1, test/dynamo/test_modules.py::NNModuleTests::test_super2, test/dynamo/test_modules.py::NNModuleTests::test_super_class_method, test/dynamo/test_modules.py::NNModuleTests::test_tensorlist, test/dynamo/test_modules.py::NNModuleTests::test_torch_function_with_closure, test/dynamo/test_modules.py::NNModuleTests::test_torch_mangled_class_name, test/dynamo/test_modules.py::NNModuleTests::test_unsupportedmethod, test/dynamo/test_modules.py::NNModuleTests::test_unsupportedmodule, test/dynamo/test_modules.py::NNModuleTests::test_viamodulecall, 
test/dynamo/test_modules.py::OptimizedModuleTest::test_assign_does_not_exist, test/dynamo/test_modules.py::OptimizedModuleTest::test_attr, test/dynamo/test_modules.py::OptimizedModuleTest::test_attr_precedence, test/dynamo/test_modules.py::OptimizedModuleTest::test_backward_hooks, test/dynamo/test_modules.py::OptimizedModuleTest::test_branch_on_nn_module_custom_bool, test/dynamo/test_modules.py::OptimizedModuleTest::test_branch_on_nn_module_custom_len, test/dynamo/test_modules.py::OptimizedModuleTest::test_buffer_order, test/dynamo/test_modules.py::OptimizedModuleTest::test_composition, test/dynamo/test_modules.py::OptimizedModuleTest::test_composition_with_opt_mod, test/dynamo/test_modules.py::OptimizedModuleTest::test_delattr_on_compiled_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_dir, test/dynamo/test_modules.py::OptimizedModuleTest::test_dunder_call_explicitly, test/dynamo/test_modules.py::OptimizedModuleTest::test_globals_change_in_other_file, test/dynamo/test_modules.py::OptimizedModuleTest::test_guard_on_torch_nn_modules, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_allowed_modules, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_allowed_modules_compiles, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_allowed_modules_compiles_self_contained, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_inner, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_outer, test/dynamo/test_modules.py::OptimizedModuleTest::test_hooks_skip_guards, test/dynamo/test_modules.py::OptimizedModuleTest::test_inline_inbuilt_nn_modules, test/dynamo/test_modules.py::OptimizedModuleTest::test_mark_static_nn_module_tensor, test/dynamo/test_modules.py::OptimizedModuleTest::test_mark_static_previously_seen_tensor, test/dynamo/test_modules.py::OptimizedModuleTest::test_mark_static_with_freezing, test/dynamo/test_modules.py::OptimizedModuleTest::test_module_dict_iter_keys, 
test/dynamo/test_modules.py::OptimizedModuleTest::test_module_dict_iter_name, test/dynamo/test_modules.py::OptimizedModuleTest::test_module_dict_iter_values, test/dynamo/test_modules.py::OptimizedModuleTest::test_module_order, test/dynamo/test_modules.py::OptimizedModuleTest::test_module_patch, test/dynamo/test_modules.py::OptimizedModuleTest::test_module_setattr, test/dynamo/test_modules.py::OptimizedModuleTest::test_monkeypatching_forward, test/dynamo/test_modules.py::OptimizedModuleTest::test_nn_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_no_op_assignment, test/dynamo/test_modules.py::OptimizedModuleTest::test_no_recompile_on_nn_guarded_modules, test/dynamo/test_modules.py::OptimizedModuleTest::test_overridden_call, test/dynamo/test_modules.py::OptimizedModuleTest::test_param_order, test/dynamo/test_modules.py::OptimizedModuleTest::test_param_requires_grad, test/dynamo/test_modules.py::OptimizedModuleTest::test_patch_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_recompile_limit_on_freed_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_recompile_limit_on_guarded_nn_modules, test/dynamo/test_modules.py::OptimizedModuleTest::test_recursion, test/dynamo/test_modules.py::OptimizedModuleTest::test_save_and_load_all_backends, test/dynamo/test_modules.py::OptimizedModuleTest::test_save_and_load_inductor, test/dynamo/test_modules.py::OptimizedModuleTest::test_setattr_on_compiled_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_specialized_module___iter__, test/dynamo/test_modules.py::OptimizedModuleTest::test_to, test/dynamo/test_modules.py::OptimizedModuleTest::test_trace_delattr, test/dynamo/test_modules.py::OptimizedModuleTest::test_udo_instance_method_as_hook, test/dynamo/test_modules.py::OptimizedModuleTest::test_unhashable_nn_submodule, test/dynamo/test_modules.py::OptimizedModuleTest::test_unspec_non_inlinable_module, test/dynamo/test_modules.py::OptimizedModuleTest::test_unspecialized_seq, 
test/dynamo/test_modules.py::OptimizedModuleTest::test_user_defined_nn_module_dynamic, test/dynamo/test_modules.py::NNModuleTestsDeviceCUDA::test_lazy_module3_cuda 2025-12-04T15:54:47.9457809Z 2025-12-04T15:54:47.9458174Z Finished dynamo/test_modules 1/1 ... [2025-12-04 15:54:47.936073][24116.318967515], took 0.62min 2025-12-04T15:54:47.9636166Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_modules/dynamo.test_modules-f3674dc870090d50.xml 2025-12-04T15:54:48.0598055Z Running dynamo/test_recompiles 1/1 ... [2025-12-04 15:54:48.059482][24116.442376281] 2025-12-04T15:54:48.0598620Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:54:48.0602424Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_recompiles.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:54:48.059950] 2025-12-04T15:54:59.4413261Z 2025-12-04T15:54:59.4414827Z dynamo/test_recompiles 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_recompiles_1.1_781d5b3da7b99916_.log 2025-12-04T15:54:59.4425082Z Running 18 items in this shard: test/dynamo/test_recompiles.py::RecompileTests::test_aliasing_guard_failures, test/dynamo/test_recompiles.py::RecompileTests::test_aliasing_guard_failures_with_globals, test/dynamo/test_recompiles.py::RecompileTests::test_ambient_autocast_recompile, test/dynamo/test_recompiles.py::RecompileTests::test_autocast_constant_fold, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_on_closed_ints, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_reduce_recompiles, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_shapes_mark_as_oblivious, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_shapes_mark_as_oblivious_fail_counterfactual, 
test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_shapes_mark_as_unbacked, test/dynamo/test_recompiles.py::RecompileTests::test_automatic_dynamic_tensor_scalar_change, test/dynamo/test_recompiles.py::RecompileTests::test_dunder_call_recompile, test/dynamo/test_recompiles.py::RecompileTests::test_dynamic_shape_parameter_recompile, test/dynamo/test_recompiles.py::RecompileTests::test_inline_inbuilt_nn_modules_candidate, test/dynamo/test_recompiles.py::RecompileTests::test_no_recompile_over_unused_objects, test/dynamo/test_recompiles.py::RecompileTests::test_no_recursive_compile_after_cache_limit_hit, test/dynamo/test_recompiles.py::RecompileTests::test_recompiles_true_false_flop, test/dynamo/test_recompiles.py::RecompileTests::test_run_mode_after_cache_limit_hit, test/dynamo/test_recompiles.py::RecompileTests::test_simple_module_recompile 2025-12-04T15:54:59.4433593Z 2025-12-04T15:54:59.4433940Z Finished dynamo/test_recompiles 1/1 ... [2025-12-04 15:54:59.441102][24127.823998208], took 0.19min 2025-12-04T15:54:59.4688190Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_recompiles/dynamo.test_recompiles-755ec9793479e2dd.xml 2025-12-04T15:54:59.5494716Z Running export/test_tree_utils 1/1 ... [2025-12-04 15:54:59.549127][24127.932021217] 2025-12-04T15:54:59.5495311Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:54:59.5498593Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_tree_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:54:59.549578] 2025-12-04T15:55:05.0223104Z 2025-12-04T15:55:05.0224098Z export/test_tree_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_tree_utils_1.1_01fdd9412c3dc291_.log 2025-12-04T15:55:05.0225690Z Running 2 items in this shard: test/export/test_tree_utils.py::TestTreeUtils::test_equivalence_check, test/export/test_tree_utils.py::TestTreeUtils::test_reorder_kwargs 2025-12-04T15:55:05.0226558Z 2025-12-04T15:55:05.0227114Z Finished export/test_tree_utils 1/1 ... [2025-12-04 15:55:05.022113][24133.405008916], took 0.09min 2025-12-04T15:55:05.0496619Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_tree_utils/export.test_tree_utils-4b33de82582b2e92.xml 2025-12-04T15:55:05.0829389Z Running inductor/test_triton_wrapper 1/1 ... [2025-12-04 15:55:05.082620][24133.465515595] 2025-12-04T15:55:05.0830338Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:55:05.0834467Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_wrapper.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:55:05.083077] 2025-12-04T15:55:34.5922201Z 2025-12-04T15:55:34.5923309Z inductor/test_triton_wrapper 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_wrapper_1.1_aad0f3987661a0f9_.log 2025-12-04T15:55:34.5924719Z Running 1 items in this shard: test/inductor/test_triton_wrapper.py::TestTritonWrapper::test_wrapper_using_gpu_seed 2025-12-04T15:55:34.5925333Z 2025-12-04T15:55:34.5925734Z Finished inductor/test_triton_wrapper 1/1 ... 
[2025-12-04 15:55:34.591979][24162.974875934], took 0.49min 2025-12-04T15:55:34.6201234Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_triton_wrapper/inductor.test_triton_wrapper-7697274370716365.xml 2025-12-04T15:55:34.6939152Z Running inductor/test_static_cuda_launcher 1/1 ... [2025-12-04 15:55:34.693602][24163.076496954] 2025-12-04T15:55:34.6939801Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:55:34.6943429Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_static_cuda_launcher.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:55:34.694088] 2025-12-04T15:55:59.9475061Z 2025-12-04T15:55:59.9476225Z inductor/test_static_cuda_launcher 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_static_cuda_launcher_1.1_aa705837cbb50573_.log 2025-12-04T15:55:59.9485229Z Running 17 items in this shard: test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_basic, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_basic_1arg, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_constexpr, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_high_shared_mem, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_implied_constant, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_kernel_empty_tensor, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_kernel_many_args, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_kernel_no_args, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_signed_integers, test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_too_high_shared_mem, 
test/inductor/test_static_cuda_launcher.py::TestStaticCudaLauncher::test_unsigned_integers, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_any, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_basic_compile, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_disable_static_cuda_launcher, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_empty_tensor, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_incompatible_code, test/inductor/test_static_cuda_launcher.py::TestStaticTritonCompileResult::test_static_launch_user_defined_triton_kernels 2025-12-04T15:55:59.9493892Z 2025-12-04T15:55:59.9494299Z Finished inductor/test_static_cuda_launcher 1/1 ... [2025-12-04 15:55:59.947292][24188.330187662], took 0.42min 2025-12-04T15:55:59.9752049Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_static_cuda_launcher/inductor.test_static_cuda_launcher-96effba66b878950.xml 2025-12-04T15:56:00.0525918Z Running export/test_dynamic_shapes 1/1 ... [2025-12-04 15:56:00.052211][24188.435104223] 2025-12-04T15:56:00.0526570Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:56:00.0529653Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_dynamic_shapes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:56:00.052682] 2025-12-04T15:56:05.5251923Z 2025-12-04T15:56:05.5253001Z export/test_dynamic_shapes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_dynamic_shapes_1.1_fa1beed2f0eed81a_.log 2025-12-04T15:56:05.5254640Z Running 2 items in this shard: test/export/test_dynamic_shapes.py::TestDimHint::test_dimhint_factory, test/export/test_dynamic_shapes.py::TestDimHint::test_dimhint_repr 2025-12-04T15:56:05.5255595Z 2025-12-04T15:56:05.5256058Z Finished export/test_dynamic_shapes 1/1 ... [2025-12-04 15:56:05.525009][24193.90790566], took 0.09min 2025-12-04T15:56:05.5531056Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_dynamic_shapes/export.test_dynamic_shapes-6f817f896f94c83c.xml 2025-12-04T15:56:05.5842852Z Running dynamo/test_sdpa 1/1 ... [2025-12-04 15:56:05.584037][24193.966932613] 2025-12-04T15:56:05.5843477Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:56:05.5847018Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_sdpa.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:56:05.584436] 2025-12-04T15:56:14.9630348Z 2025-12-04T15:56:14.9631426Z dynamo/test_sdpa 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_sdpa_1.1_5570cc8ef25d14ab_.log 2025-12-04T15:56:14.9634631Z Running 6 items in this shard: test/dynamo/test_sdpa.py::TestSDPA::test_graph_break_SDPAParams, test/dynamo/test_sdpa.py::TestSDPA::test_input_SDPAParams, test/dynamo/test_sdpa.py::TestSDPA::test_intermediate_attr_access_SDPAParams, test/dynamo/test_sdpa.py::TestSDPA::test_returns_SDPAParams, test/dynamo/test_sdpa.py::TestSDPA::test_sdpa_c_functions_no_graph_break, test/dynamo/test_sdpa.py::TestSDPA::test_sdpa_kernel_decorator_with_compile 2025-12-04T15:56:14.9636873Z 2025-12-04T15:56:14.9637452Z Finished dynamo/test_sdpa 1/1 ... [2025-12-04 15:56:14.962803][24203.345699621], took 0.16min 2025-12-04T15:56:14.9920710Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_sdpa/dynamo.test_sdpa-3e0149796a415876.xml 2025-12-04T15:56:15.0747078Z Running dynamo/test_utils 1/1 ... [2025-12-04 15:56:15.074372][24203.457266368] 2025-12-04T15:56:15.0747672Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:56:15.0750394Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:56:15.074784] 2025-12-04T15:56:57.5034992Z 2025-12-04T15:56:57.5036046Z dynamo/test_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_utils_1.1_31a21332cf86ab83_.log 2025-12-04T15:56:57.5043172Z Running 17 items in this shard: test/dynamo/test_utils.py::TestUtils::test_graph_break_counting, test/dynamo/test_utils.py::TestUtils::test_larger_multiplier_for_even_smaller_tensor, test/dynamo/test_utils.py::TestUtils::test_larger_multiplier_for_smaller_tensor, test/dynamo/test_utils.py::TestUtils::test_nan, test/dynamo/test_utils.py::TestUtils::test_traced_code_query, test/dynamo/test_utils.py::TestDynamoTimed::test_compiler_config, test/dynamo/test_utils.py::TestDynamoTimed::test_dynamic_shape_feature_use, test/dynamo/test_utils.py::TestDynamoTimed::test_dynamo_timed, test/dynamo/test_utils.py::TestDynamoTimed::test_exception_stack_trace, test/dynamo/test_utils.py::TestDynamoTimed::test_graph_node_shapes, test/dynamo/test_utils.py::TestDynamoTimed::test_inductor_provenance, test/dynamo/test_utils.py::TestDynamoTimed::test_ir_count, test/dynamo/test_utils.py::TestDynamoTimed::test_log_dynamo_start, test/dynamo/test_utils.py::TestDynamoTimed::test_num_params, test/dynamo/test_utils.py::TestDynamoTimed::test_stack_trace, test/dynamo/test_utils.py::TestInductorConfigParsingForLogging::test_inductor_config_jsonify, test/dynamo/test_utils.py::TestInductorConfigParsingForLogging::test_inductor_config_parsing_non_conforming_items 2025-12-04T15:56:57.5049552Z 2025-12-04T15:56:57.5049876Z Finished dynamo/test_utils 1/1 ... [2025-12-04 15:56:57.503322][24245.886219167], took 0.71min 2025-12-04T15:56:57.5314974Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_utils/dynamo.test_utils-e6d94f5c34c685f8.xml 2025-12-04T15:56:57.6151387Z Running inductor/test_codegen_triton 1/1 ... 
[2025-12-04 15:56:57.614788][24245.997682791] 2025-12-04T15:56:57.6151998Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:56:57.6154614Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_codegen_triton.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:56:57.615208] 2025-12-04T15:57:08.0948490Z 2025-12-04T15:57:08.0949644Z inductor/test_codegen_triton 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_codegen_triton_1.1_8e8a3c1b0bc12db7_.log 2025-12-04T15:57:08.0951097Z Running 1 items in this shard: test/inductor/test_codegen_triton.py::TestCodegenTriton::test_config_of_sizearg 2025-12-04T15:57:08.0951689Z 2025-12-04T15:57:08.0952099Z Finished inductor/test_codegen_triton 1/1 ... [2025-12-04 15:57:08.094655][24256.477548352], took 0.17min 2025-12-04T15:57:08.1233956Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_codegen_triton/inductor.test_codegen_triton-f741c3b21cf28e3b.xml 2025-12-04T15:57:08.2150497Z Running dynamo/test_frame_init 1/1 ... [2025-12-04 15:57:08.214716][24256.597610845] 2025-12-04T15:57:08.2151075Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:57:08.2154505Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_frame_init.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:57:08.215180] 2025-12-04T15:57:13.7876024Z 2025-12-04T15:57:13.7877013Z dynamo/test_frame_init 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_frame_init_1.1_2f60459938295159_.log 2025-12-04T15:57:13.7878224Z Running 1 items in this shard: test/dynamo/test_frame_init.py::FrameInitTests::test_frame_init 2025-12-04T15:57:13.7878744Z 2025-12-04T15:57:13.7879089Z Finished dynamo/test_frame_init 1/1 ... [2025-12-04 15:57:13.787404][24262.170299591], took 0.09min 2025-12-04T15:57:13.8158733Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_frame_init/dynamo.test_frame_init-c2e1024fb8a07387.xml 2025-12-04T15:57:13.8434943Z Running inductor/test_device_assert 1/1 ... [2025-12-04 15:57:13.843190][24262.22608477] 2025-12-04T15:57:13.8435543Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:57:13.8438897Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_device_assert.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:57:13.843630] 2025-12-04T15:57:35.9420709Z 2025-12-04T15:57:35.9422055Z inductor/test_device_assert 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_device_assert_1.1_d916ba60ad9d20e5_.log 2025-12-04T15:57:35.9427395Z Running 8 items in this shard: test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_fusion, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_should_not_throw_backend_aot_eager, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_should_not_throw_backend_eager, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_should_not_throw_backend_inductor, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_should_throw_backend_aot_eager, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_should_throw_backend_eager, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_assert_should_throw_backend_inductor, test/inductor/test_device_assert.py::TestTorchDeviceAssertTrigger::test_run_assert_triton 2025-12-04T15:57:35.9431907Z 2025-12-04T15:57:35.9432279Z Finished inductor/test_device_assert 1/1 ... [2025-12-04 15:57:35.941838][24284.324735187], took 0.37min 2025-12-04T15:57:35.9706978Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_device_assert/inductor.test_device_assert-451c7142fcd9d62b.xml 2025-12-04T15:57:36.0458623Z Running dynamo/test_skip_non_tensor 1/1 ... [2025-12-04 15:57:36.045543][24284.428437593] 2025-12-04T15:57:36.0459220Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:57:36.0462378Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_skip_non_tensor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:57:36.045990] 2025-12-04T15:57:44.6731091Z 2025-12-04T15:57:44.6732791Z dynamo/test_skip_non_tensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_skip_non_tensor_1.1_5109354b2e4bf091_.log 2025-12-04T15:57:44.6740447Z Running 8 items in this shard: test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_add_skip, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_add_tensor1, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_add_tensor2, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_add_tensor_dict, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_add_tensor_list, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_custom_list, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_do_not_skip_side_effects, test/dynamo/test_skip_non_tensor.py::SkipNonTensorTests::test_recursive_list 2025-12-04T15:57:44.6746971Z 2025-12-04T15:57:44.6747698Z Finished dynamo/test_skip_non_tensor 1/1 ... [2025-12-04 15:57:44.672850][24293.055746303], took 0.14min 2025-12-04T15:57:44.7028054Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_skip_non_tensor/dynamo.test_skip_non_tensor-f190ace25428cb94.xml 2025-12-04T15:57:44.7834040Z Running dynamo/test_skip_guard_eval_unsafe 1/1 ... [2025-12-04 15:57:44.782979][24293.165873571] 2025-12-04T15:57:44.7835059Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:57:44.7838228Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_skip_guard_eval_unsafe.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:57:44.783471] 2025-12-04T15:58:01.9734193Z 2025-12-04T15:58:01.9735316Z dynamo/test_skip_guard_eval_unsafe 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_skip_guard_eval_unsafe_1.1_b141b115e14ff53c_.log 2025-12-04T15:58:01.9738797Z Running 5 items in this shard: test/dynamo/test_skip_guard_eval_unsafe.py::RunDiffGuardTests::test_bool_recompile, test/dynamo/test_skip_guard_eval_unsafe.py::RunDiffGuardTests::test_cache_line_pickup, test/dynamo/test_skip_guard_eval_unsafe.py::RunDiffGuardTests::test_fail_on_tensor_shape_change, test/dynamo/test_skip_guard_eval_unsafe.py::RunDiffGuardTests::test_post_recompile, test/dynamo/test_skip_guard_eval_unsafe.py::RunDiffGuardTests::test_tensor_recompile 2025-12-04T15:58:01.9741200Z 2025-12-04T15:58:01.9741598Z Finished dynamo/test_skip_guard_eval_unsafe 1/1 ... [2025-12-04 15:58:01.973235][24310.356131087], took 0.29min 2025-12-04T15:58:02.0023158Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_skip_guard_eval_unsafe/dynamo.test_skip_guard_eval_unsafe-aa1ded9d0a4e400e.xml 2025-12-04T15:58:02.0771120Z Running inductor/test_control_deps 1/1 ... [2025-12-04 15:58:02.076802][24310.459696197] 2025-12-04T15:58:02.0771730Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:58:02.0775257Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_control_deps.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:58:02.077262] 2025-12-04T15:58:21.4708829Z 2025-12-04T15:58:21.4710024Z inductor/test_control_deps 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_control_deps_1.1_e3804afa5ea10bb1_.log 2025-12-04T15:58:21.4711382Z Running 1 items in this shard: test/inductor/test_control_deps.py::TestControlDeps::test_control_deps_prevents_fusion 2025-12-04T15:58:21.4711991Z 2025-12-04T15:58:21.4712385Z Finished inductor/test_control_deps 1/1 ... [2025-12-04 15:58:21.470610][24329.853502291], took 0.32min 2025-12-04T15:58:21.5001476Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_control_deps/inductor.test_control_deps-23047419ffe03376.xml 2025-12-04T15:58:21.5727269Z Running inductor/test_benchmarking 1/1 ... [2025-12-04 15:58:21.572380][24329.955275117] 2025-12-04T15:58:21.5727927Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:58:21.5731550Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_benchmarking.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:58:21.572862] 2025-12-04T15:58:34.9074345Z 2025-12-04T15:58:34.9075429Z inductor/test_benchmarking 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_benchmarking_1.1_f947c0362e7ea45b_.log 2025-12-04T15:58:34.9082991Z Running 12 items in this shard: test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_cpu_smoke_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_cpu_smoke_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_gpu_smoke_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_gpu_smoke_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_many_devices_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_many_devices_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_no_devices_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_no_devices_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls0_device_cpu, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls0_device_cuda, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls1_device_cpu, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls1_device_cuda 2025-12-04T15:58:34.9089740Z 2025-12-04T15:58:34.9090203Z Finished inductor/test_benchmarking 1/1 ... 
[2025-12-04 15:58:34.907244][24343.290137105], took 0.22min 2025-12-04T15:58:34.9368879Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_benchmarking/inductor.test_benchmarking-53f04a03954d2058.xml 2025-12-04T15:58:35.0052182Z Running inductor/test_helion_kernels 1/1 ... [2025-12-04 15:58:35.004844][24343.387738827] 2025-12-04T15:58:35.0052806Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:58:35.0055339Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_helion_kernels.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:58:35.005279] 2025-12-04T15:58:45.3354409Z 2025-12-04T15:58:45.3355641Z inductor/test_helion_kernels 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_helion_kernels_1.1_7576dd76567d0db5_.log 2025-12-04T15:58:45.3357314Z Running 2 items in this shard: test/inductor/test_helion_kernels.py::HelionTests::test_add_kernel, test/inductor/test_helion_kernels.py::HelionTests::test_softmax_view_reshape 2025-12-04T15:58:45.3358218Z 2025-12-04T15:58:45.3358592Z Finished inductor/test_helion_kernels 1/1 ... [2025-12-04 15:58:45.335202][24353.718099338], took 0.17min 2025-12-04T15:58:45.3650334Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_helion_kernels/inductor.test_helion_kernels-0df86f8cd24ea26a.xml 2025-12-04T15:58:45.4421994Z Running inductor/test_quantization 1/1 ... 
[2025-12-04 15:58:45.441847][24353.824740991] 2025-12-04T15:58:45.4422645Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:58:45.4425122Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_quantization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:58:45.442262] 2025-12-04T15:59:05.8376479Z 2025-12-04T15:59:05.8377617Z inductor/test_quantization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_quantization_1.1_84a522d95ca6c1ae_.log 2025-12-04T15:59:05.8379876Z Running 2 items in this shard: test/inductor/test_quantization.py::TestQuantization::test_activation_quantization_aten_with_scaling, test/inductor/test_quantization.py::TestQuantization::test_activation_quantization_aten_without_scaling 2025-12-04T15:59:05.8381216Z 2025-12-04T15:59:05.8381660Z Finished inductor/test_quantization 1/1 ... [2025-12-04 15:59:05.837412][24374.220307718], took 0.34min 2025-12-04T15:59:05.8675446Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_quantization/inductor.test_quantization-951156711359c867.xml 2025-12-04T15:59:05.9464568Z Running export/test_tools 1/1 ... [2025-12-04 15:59:05.946121][24374.329015529] 2025-12-04T15:59:05.9465174Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:59:05.9467969Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_tools.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:59:05.946544] 2025-12-04T15:59:14.1730817Z 2025-12-04T15:59:14.1731754Z export/test_tools 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_tools_1.1_7b301d5abd4a995c_.log 2025-12-04T15:59:14.1733371Z Running 2 items in this shard: test/export/test_tools.py::TestExportTools::test_report_exportability_basic, test/export/test_tools.py::TestExportTools::test_report_exportability_with_issues 2025-12-04T15:59:14.1734377Z 2025-12-04T15:59:14.1734695Z Finished export/test_tools 1/1 ... [2025-12-04 15:59:14.172847][24382.555743637], took 0.14min 2025-12-04T15:59:14.2031072Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_tools/export.test_tools-c033e9415dabe65c.xml 2025-12-04T15:59:14.2826336Z Running inductor/test_compiled_optimizers 1/3 ... [2025-12-04 15:59:14.282326][24382.665219411] 2025-12-04T15:59:14.2827089Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:59:14.2830136Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compiled_optimizers.py', '--shard-id=1', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:59:14.282743] 2025-12-04T16:09:22.3028958Z 2025-12-04T16:09:22.3030432Z inductor/test_compiled_optimizers 1/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compiled_optimizers_1.3_8b95325a31b7233d_.log 2025-12-04T16:09:22.3188328Z Running 248 items in this shard: test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_polynomiallr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_weight_decay_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_initial_accumulator_value_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_lr_decay_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_cycliclr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealinglr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_multiplicativelr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_amsgrad_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_amsgrad_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_maximize_foreach_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_weight_decay_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_cosineannealingwarmrestarts, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_multiplicativelr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_multisteplr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_amsgrad_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_maximize_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_t0_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_reducelronplateau, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_weight_decay_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_weight_decay_maximize_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_closure_graph_break, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_foreach_map_adam, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_foreach_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_momentum_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_maximize_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_momentum_decay_decoupled_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_momentum_decay_decoupled_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_foreach_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_decoupled_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_decoupled_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_maximize_foreach_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_maximize_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_centered_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_maximize_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_maximize_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_step_sizes_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_cycliclr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_weight_decay_foreach_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_recompile_single, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_onecyclelr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_ASGD_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adafactor_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adafactor_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adagrad_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adam_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adamax_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_LBFGS_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Muon_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_NAdam_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_RAdam_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_RMSprop_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Rprop_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_SparseAdam_use_closure_False_cuda_float32 2025-12-04T16:09:22.3343532Z 2025-12-04T16:09:22.3343955Z 
Finished inductor/test_compiled_optimizers 1/3 ... [2025-12-04 16:09:22.303207][24990.686101481], took 10.13min 2025-12-04T16:09:22.3345406Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compiled_optimizers/inductor.test_compiled_optimizers-1745c9b9e5fc7ed3.xml 2025-12-04T16:09:23.7534319Z Uploading artifacts took 1.34 seconds 2025-12-04T16:09:23.7538675Z Running inductor/test_aot_inductor_utils 1/1 ... [2025-12-04 16:09:23.753659][24992.13655299] 2025-12-04T16:09:23.7539273Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:09:23.7543902Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:09:23.754148] 2025-12-04T16:09:34.1690039Z 2025-12-04T16:09:34.1691279Z inductor/test_aot_inductor_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_utils_1.1_6e3c972b94953db6_.log 2025-12-04T16:09:34.1692302Z Running 0 items in this shard: 2025-12-04T16:09:34.1692541Z 2025-12-04T16:09:34.1693178Z Finished inductor/test_aot_inductor_utils 1/1 ... [2025-12-04 16:09:34.168797][25002.551692764], took 0.17min 2025-12-04T16:09:34.1994633Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_utils/inductor.test_aot_inductor_utils-e7355f16ccb52d23.xml 2025-12-04T16:09:34.2735811Z Running inductor/test_control_flow 3/4 ... 
[2025-12-04 16:09:34.273251][25002.656145406] 2025-12-04T16:09:34.2736473Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:09:34.2739591Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_control_flow.py', '--shard-id=3', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:09:34.273714] 2025-12-04T16:24:52.4248050Z 2025-12-04T16:24:52.4249183Z inductor/test_control_flow 3/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_control_flow_3.4_41808f1ad591b77f_.log 2025-12-04T16:24:52.4467092Z Running 183 items in this shard: test/inductor/test_control_flow.py::CondTests::test_cond_advanced_dynamic_shapes_device_cuda, test/inductor/test_control_flow.py::CondTests::test_cond_aliasing_outputs, test/inductor/test_control_flow.py::CondTests::test_cond_decompose_ops_in_subgraph_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_functional_call_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_inductor_fx_passes_recursively_applied, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_select_with_input_idx_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_subgraphs_with_parameters_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_to_outer_device_cpu, 
test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_to_outer_device_cuda, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_outer_to_inner_device_cuda, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cpu_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cuda_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cuda_dynamic_False_autograd_False, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_zero_loop_device_cpu_dynamic_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_False_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_False_scan_length_1_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_False_scan_length_1_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_False_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_3_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_3_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_0_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_False_dim_2_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_0_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_True_dim_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_False_dim_2_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_0_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_False_autograd_True, 
test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cuda_dynamic_True_autograd_True 2025-12-04T16:24:52.4641975Z 2025-12-04T16:24:52.4642418Z Finished inductor/test_control_flow 3/4 ... [2025-12-04 16:24:52.464020][25920.846908759], took 15.30min 2025-12-04T16:24:52.4947299Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-0b2081966a192cef.xml 2025-12-04T16:24:52.5944887Z Running inductor/test_minifier_isolate 1/1 ... [2025-12-04 16:24:52.594149][25920.977041386] 2025-12-04T16:24:52.5945516Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:24:52.5949306Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_minifier_isolate.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:24:52.594636] 2025-12-04T16:26:53.5379150Z 2025-12-04T16:26:53.5380289Z inductor/test_minifier_isolate 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_minifier_isolate_1.1_057329f0cdaf132f_.log 2025-12-04T16:26:53.5382182Z Running 2 items in this shard: test/inductor/test_minifier_isolate.py::MinifierIsolateTests::test_after_aot_cpu_runtime_error, test/inductor/test_minifier_isolate.py::MinifierIsolateTests::test_after_aot_gpu_runtime_error 2025-12-04T16:26:53.5383329Z 2025-12-04T16:26:53.5383966Z Finished inductor/test_minifier_isolate 1/1 ... 
[2025-12-04 16:26:53.537671][26041.92056799], took 2.02min 2025-12-04T16:26:53.5688849Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_minifier_isolate/inductor.test_minifier_isolate-f50615d1a1981661.xml 2025-12-04T16:26:53.6489372Z Running dynamo/test_error_messages 1/1 ... [2025-12-04 16:26:53.648631][26042.031524968] 2025-12-04T16:26:53.6489988Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:26:53.6493694Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_error_messages.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:26:53.649100] 2025-12-04T16:27:13.9945573Z 2025-12-04T16:27:13.9946952Z dynamo/test_error_messages 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_error_messages_1.1_69ccbdbb7b8c4f0d_.log 2025-12-04T16:27:13.9974031Z Running 51 items in this shard: test/dynamo/test_error_messages.py::ErrorMessagesTest::test_assert_failure_in_generic_ctx_mgr, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_backend_fake_tensor_exc, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_class_property, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_cpp_extension_recommends_custom_ops, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_data_dependent_branching_fullgraph, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_data_dependent_branching_gb, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_data_dependent_operator2, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_dict_items_input, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_disable_message, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_dynamic_shape_operator_no_meta_kernel, 
test/dynamo/test_error_messages.py::ErrorMessagesTest::test_dynamo_graph_break_fn, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_dynamo_graph_break_fn_with_msg, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_faketensor_nyi, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_generic_ctx_mgr_graph_break, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_graph_break_in_buggy_resume_prologue, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_graph_break_traceback_above_dynamo_shows_user_code, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_graph_break_traceback_collapsed_resume_frames, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_internal_compiler_stacktrace_verbose, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_latest_bytecode_to_graph_break, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_latest_bytecode_to_graph_break_fullgraph, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_latest_bytecode_to_graph_break_python_versioning, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_load_build_class, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_lru_cache_warning_logs_nested_call, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_lru_cache_warning_logs_user_stack_trace, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_nested_compile_user_frames, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_no_internal_compiler_stacktrace, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_observed_exception, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_optree_graph_break_message, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_reconstruction_failure, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_reconstruction_failure_gb, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_skip_frame_empty_function_message, 
test/dynamo/test_error_messages.py::ErrorMessagesTest::test_skip_frame_in_loop_message, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_skipfile_call, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_skipfile_dynamo_call, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_skipfile_inline, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_skipped_frame_with_verbose_traceback, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_sort_with_nonconstant_keys, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_step_graph_break, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_store_attr_graph_break, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_super_call_function, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_super_call_method, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_uninitialized_module, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_unsupported_builtin, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_unsupported_bytecode, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_unsupported_context, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_variable_tracker_source_attribution, test/dynamo/test_error_messages.py::ErrorMessagesTest::test_warnings, test/dynamo/test_error_messages.py::NestedGraphBreakLoggingTests::test_nested_graph_break_different_call_sites_not_suppressed, test/dynamo/test_error_messages.py::NestedGraphBreakLoggingTests::test_skip_frame_in_loop_message_nested, test/dynamo/test_error_messages.py::NestedGraphBreakLoggingTests::test_skipped_frame_with_verbose_traceback_nested, test/dynamo/test_error_messages.py::NestedGraphBreakLoggingTests::test_try_block_with_graph_break_suppression 2025-12-04T16:27:13.9998687Z 2025-12-04T16:27:13.9999051Z Finished dynamo/test_error_messages 1/1 ... 
[2025-12-04 16:27:13.994343][26062.377238637], took 0.34min 2025-12-04T16:27:14.0254593Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_error_messages/dynamo.test_error_messages-36d8e363c2770c16.xml 2025-12-04T16:27:14.1594359Z Running dynamo/test_fake_distributed 1/1 ... [2025-12-04 16:27:14.159064][26062.541957658] 2025-12-04T16:27:14.1594995Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:27:14.1598229Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_fake_distributed.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:27:14.159543] 2025-12-04T16:27:23.4880384Z 2025-12-04T16:27:23.4881535Z dynamo/test_fake_distributed 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_fake_distributed_1.1_14aa9693a6d04f2f_.log 2025-12-04T16:27:23.4884036Z Running 3 items in this shard: test/dynamo/test_fake_distributed.py::TestFakeDistributed::test_all_to_all_single_autograd, test/dynamo/test_fake_distributed.py::TestFakeDistributed::test_device_mesh_flatten, test/dynamo/test_fake_distributed.py::TestFakeDistributed::test_device_mesh_get_local_rank 2025-12-04T16:27:23.4885667Z 2025-12-04T16:27:23.4886073Z Finished dynamo/test_fake_distributed 1/1 ... [2025-12-04 16:27:23.487798][26071.870694527], took 0.16min 2025-12-04T16:27:23.5188454Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_fake_distributed/dynamo.test_fake_distributed-b0f5d6fe6c345e8f.xml 2025-12-04T16:27:23.6234583Z Running dynamo/test_tree_map 1/1 ... 
[2025-12-04 16:27:23.623106][26072.006000616] 2025-12-04T16:27:23.6235255Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:27:23.6238404Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_tree_map.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:27:23.623563] 2025-12-04T16:27:33.1521205Z 2025-12-04T16:27:33.1522297Z dynamo/test_tree_map 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_tree_map_1.1_63649f1aa127b381_.log 2025-12-04T16:27:33.1544254Z Running 31 items in this shard: test/dynamo/test_tree_map.py::TreeMapCompileTests::test_constantvariable_handles_none_is_leaf_kwarg, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_constantvariable_handles_python_and_dtype_leaves, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_is_leaf_handles_tensor_nodes, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_is_leaf_non_constant_fallback, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_none_nodes_default_behavior_tree_map_name_optree_tree_map_impl0, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_none_nodes_default_behavior_tree_map_name_pytree_cxx_tree_map_impl2, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_none_nodes_default_behavior_tree_map_name_pytree_python_tree_map_impl1, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_none_nodes_reject_mismatched_siblings, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_only_applies_to_tensor_nodes, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_only_handles_multiple_types, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_only_multiple_trees_falls_back, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_only_predicate_selector_skips_fastpath, 
test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_rejects_mismatched_container_types, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_optree_tree_map_impl0_kwargs_name_default_kwargs0_allowed_impls0, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_optree_tree_map_impl0_kwargs_name_is_leaf_kwargs2_allowed_impls2, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_optree_tree_map_impl0_kwargs_name_namespace_and_none_is_leaf_kwargs4_allowed_impls4, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_optree_tree_map_impl0_kwargs_name_namespace_kwargs3_allowed_impls3, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_optree_tree_map_impl0_kwargs_name_namespace_none_is_leaf_predicate_kwargs5_allowed_impls5, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_optree_tree_map_impl0_kwargs_name_none_is_leaf_kwargs1_allowed_impls1, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_cxx_tree_map_impl2_kwargs_name_default_kwargs0_allowed_impls0, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_cxx_tree_map_impl2_kwargs_name_is_leaf_kwargs2_allowed_impls2, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_cxx_tree_map_impl2_kwargs_name_namespace_and_none_is_leaf_kwargs4_allowed_impls4, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_cxx_tree_map_impl2_kwargs_name_namespace_kwargs3_allowed_impls3, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_cxx_tree_map_impl2_kwargs_name_namespace_none_is_leaf_predicate_kwargs5_allowed_impls5, 
test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_cxx_tree_map_impl2_kwargs_name_none_is_leaf_kwargs1_allowed_impls1, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_python_tree_map_impl1_kwargs_name_default_kwargs0_allowed_impls0, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_python_tree_map_impl1_kwargs_name_is_leaf_kwargs2_allowed_impls2, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_python_tree_map_impl1_kwargs_name_namespace_and_none_is_leaf_kwargs4_allowed_impls4, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_python_tree_map_impl1_kwargs_name_namespace_kwargs3_allowed_impls3, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_python_tree_map_impl1_kwargs_name_namespace_none_is_leaf_predicate_kwargs5_allowed_impls5, test/dynamo/test_tree_map.py::TreeMapCompileTests::test_tree_map_variants_tree_map_name_pytree_python_tree_map_impl1_kwargs_name_none_is_leaf_kwargs1_allowed_impls1 2025-12-04T16:27:33.1565118Z 2025-12-04T16:27:33.1565444Z Finished dynamo/test_tree_map 1/1 ... [2025-12-04 16:27:33.151927][26081.534824081], took 0.16min 2025-12-04T16:27:33.1831687Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_tree_map/dynamo.test_tree_map-39d9c68e899fe910.xml 2025-12-04T16:27:33.2621561Z Running dynamo/test_minifier 1/1 ... [2025-12-04 16:27:33.261825][26081.644719054] 2025-12-04T16:27:33.2622128Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:27:33.2625415Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_minifier.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:27:33.262279] 2025-12-04T16:27:44.4432712Z 2025-12-04T16:27:44.4433748Z dynamo/test_minifier 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_minifier_1.1_70592d9088ca13b1_.log 2025-12-04T16:27:44.4441803Z Running 15 items in this shard: test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cpu_accuracy_backend_passes_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cpu_accuracy_error_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cpu_compile_backend_passes_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cpu_compile_error_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cpu_runtime_backend_passes_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cpu_runtime_error_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cuda_accuracy_backend_passes_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cuda_accuracy_error_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cuda_compile_backend_passes_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cuda_compile_error_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cuda_runtime_backend_passes_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_cuda_runtime_error_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_after_dynamo_non_leaf_compile_error_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_cpu_cuda_module_after_dynamo_cuda, test/dynamo/test_minifier.py::MinifierTestsCUDA::test_if_graph_minified_cuda 2025-12-04T16:27:44.4449220Z 2025-12-04T16:27:44.4449546Z Finished dynamo/test_minifier 1/1 ... 
[2025-12-04 16:27:44.443259][26092.82615348], took 0.19min 2025-12-04T16:27:44.4753744Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_minifier/dynamo.test_minifier-9124cc51e1c5e7b6.xml 2025-12-04T16:27:44.5607593Z Running dynamo/test_guard_manager 1/1 ... [2025-12-04 16:27:44.560410][26092.943303797] 2025-12-04T16:27:44.5608180Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:27:44.5611560Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_guard_manager.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:27:44.560883] 2025-12-04T16:27:53.4890072Z 2025-12-04T16:27:53.4891392Z dynamo/test_guard_manager 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_guard_manager_1.1_bfbfec93ec272b46_.log 2025-12-04T16:27:53.4907260Z Running 38 items in this shard: test/dynamo/test_guard_manager.py::GuardManagerTests::test_attr_guard_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_call_function_no_args_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_clone, test/dynamo/test_guard_manager.py::GuardManagerTests::test_default_device_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_dict_contains_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_dict_getitem_accessor, test/dynamo/test_guard_manager.py::GuardManagerTests::test_dict_guard_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_dict_version_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_diff_guard_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_dynamic_indices_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_equals_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_framelocals_accessor, 
test/dynamo/test_guard_manager.py::GuardManagerTests::test_framelocals_guard_e2e, test/dynamo/test_guard_manager.py::GuardManagerTests::test_global_state_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_global_state_reason, test/dynamo/test_guard_manager.py::GuardManagerTests::test_global_weakref, test/dynamo/test_guard_manager.py::GuardManagerTests::test_globals, test/dynamo/test_guard_manager.py::GuardManagerTests::test_guard_manager_leaf_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_id_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_item_guard_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_lambda_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_length_check_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_no_hasattr_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_no_tensor_aliasing_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_python_lambda_leaf_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_tensor_aliasing_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_tensor_match_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_tuple_iterator_getitem, test/dynamo/test_guard_manager.py::GuardManagerTests::test_type_guard, test/dynamo/test_guard_manager.py::GuardManagerTests::test_type_manager, test/dynamo/test_guard_manager.py::GuardManagerTests::test_weakref_alive_guard, test/dynamo/test_guard_manager.py::TypePropagationTests::test_basic_types, test/dynamo/test_guard_manager.py::DuplicateGuardTest::test_duplicate_guard, test/dynamo/test_guard_manager.py::TagSafetyChecks::test_dict_tag_safe, test/dynamo/test_guard_manager.py::TagSafetyChecks::test_immutable_tag_safe, test/dynamo/test_guard_manager.py::TagSafetyChecks::test_nn_module_tag_overridden_getattr_safe, test/dynamo/test_guard_manager.py::TagSafetyChecks::test_nn_module_tag_safe, 
test/dynamo/test_guard_manager.py::RecursiveDictGuardTests::test_disabling 2025-12-04T16:27:53.4922411Z 2025-12-04T16:27:53.4922782Z Finished dynamo/test_guard_manager 1/1 ... [2025-12-04 16:27:53.488809][26101.871705616], took 0.15min 2025-12-04T16:27:53.5206664Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_guard_manager/dynamo.test_guard_manager-f0dd8a549f18516b.xml 2025-12-04T16:27:53.6119885Z Running export/test_schema 1/1 ... [2025-12-04 16:27:53.611657][26101.994550249] 2025-12-04T16:27:53.6120494Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:27:53.6123428Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_schema.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:27:53.612092] 2025-12-04T16:27:59.8361692Z 2025-12-04T16:27:59.8362700Z export/test_schema 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_schema_1.1_81eb22b4e3e11516_.log 2025-12-04T16:27:59.8365051Z Running 5 items in this shard: test/export/test_schema.py::TestSchema::test_schema_check, test/export/test_schema.py::TestSchema::test_schema_comparison, test/export/test_schema.py::TestSchema::test_schema_compatibility, test/export/test_schema.py::TestSchema::test_schema_diff, test/export/test_schema.py::TestSchema::test_thrift_schema_unchanged 2025-12-04T16:27:59.8367021Z 2025-12-04T16:27:59.8367355Z Finished export/test_schema 1/1 ... [2025-12-04 16:27:59.835954][26108.218850992], took 0.10min 2025-12-04T16:27:59.8677661Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_schema/export.test_schema-98e7fce7714746ab.xml 2025-12-04T16:27:59.9620452Z Running export/test_pass_infra 1/1 ... 
[2025-12-04 16:27:59.961725][26108.3446195] 2025-12-04T16:27:59.9621015Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:27:59.9624082Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_pass_infra.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:27:59.962165] 2025-12-04T16:28:08.7896350Z 2025-12-04T16:28:08.7897411Z export/test_pass_infra 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_pass_infra_1.1_d5838225a9a8bb31_.log 2025-12-04T16:28:08.7900395Z Running 5 items in this shard: test/export/test_pass_infra.py::TestPassInfra::test_cond, test/export/test_pass_infra.py::TestPassInfra::test_export_pass_base, test/export/test_pass_infra.py::TestPassInfra::test_graph_signature_updated_after_transformation, test/export/test_pass_infra.py::TestPassInfra::test_node_name_stability, test/export/test_pass_infra.py::TestPassInfra::test_replace_hook_basic 2025-12-04T16:28:08.7902455Z 2025-12-04T16:28:08.7902796Z Finished export/test_pass_infra 1/1 ... [2025-12-04 16:28:08.789420][26117.172317106], took 0.15min 2025-12-04T16:28:08.8212369Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_pass_infra/export.test_pass_infra-0489f34d1d482c78.xml 2025-12-04T16:28:08.9201997Z Running dynamo/test_recompile_ux 1/1 ... [2025-12-04 16:28:08.919844][26117.302737117] 2025-12-04T16:28:08.9202633Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:28:08.9205246Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_recompile_ux.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:28:08.920280] 2025-12-04T16:28:18.5494126Z 2025-12-04T16:28:18.5495775Z dynamo/test_recompile_ux 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_recompile_ux_1.1_ac1d0051161f3db2_.log 2025-12-04T16:28:18.5505054Z Running 10 items in this shard: test/dynamo/test_recompile_ux.py::RecompileUxTests::test_drop_cache_on_skip, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_dynamic_input, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_fail_on_recompile_limit_hit, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_loop_torture, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_mismatched_type, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_multiple_guard_fails, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_multiple_guard_fails_report_all, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_nvfuser_guards, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_recompile_child_run_only, test/dynamo/test_recompile_ux.py::RecompileUxTests::test_verbose_tensor_check 2025-12-04T16:28:18.5513001Z 2025-12-04T16:28:18.5513693Z Finished dynamo/test_recompile_ux 1/1 ... [2025-12-04 16:28:18.549134][26126.932030107], took 0.16min 2025-12-04T16:28:18.5822713Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_recompile_ux/dynamo.test_recompile_ux-5436245cbc75fddd.xml 2025-12-04T16:28:18.6545742Z Running export/test_experimental 1/1 ... [2025-12-04 16:28:18.654172][26127.037066215] 2025-12-04T16:28:18.6546675Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:28:18.6550144Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_experimental.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:28:18.654673] 2025-12-04T16:28:29.8360553Z 2025-12-04T16:28:29.8361601Z export/test_experimental 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_experimental_1.1_01776c650d6c59b4_.log 2025-12-04T16:28:29.8372074Z Running 22 items in this shard: test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_closure, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_ctx_return, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_custom_pytree_type, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_default_args, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_dict_keys_getitem, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_full_tracing_context, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_fx_graph_annotate_overlap_pass, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_side_effects, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_with_call_override, test/export/test_experimental.py::TestExperiment::test_dynamo_graph_capture_with_tensor_constant, test/export/test_experimental.py::TestExperiment::test_export_add_in_out_info, test/export/test_experimental.py::TestExperiment::test_export_leaf, test/export/test_experimental.py::TestExperiment::test_joint_basic, test/export/test_experimental.py::TestExperiment::test_joint_buffer_input_mutations, test/export/test_experimental.py::TestExperiment::test_joint_cifar10_backwards, test/export/test_experimental.py::TestExperiment::test_joint_dynamic, test/export/test_experimental.py::TestExperiment::test_joint_loss_index, test/export/test_experimental.py::TestExperiment::test_side_effect, test/export/test_experimental.py::TestExperiment::test_sticky_export, 
test/export/test_experimental.py::TestExperiment::test_sticky_export_dynamic, test/export/test_experimental.py::TestExperiment::test_sticky_export_nested_inp 2025-12-04T16:28:29.8381720Z 2025-12-04T16:28:29.8382082Z Finished export/test_experimental 1/1 ... [2025-12-04 16:28:29.835842][26138.2187343], took 0.19min 2025-12-04T16:28:29.8675886Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_experimental/export.test_experimental-4743e9a7200af635.xml 2025-12-04T16:28:29.9421936Z Running export/test_converter 1/1 ... [2025-12-04 16:28:29.941832][26138.324726122] 2025-12-04T16:28:29.9422528Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:28:29.9425567Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_converter.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:28:29.942291] 2025-12-04T16:28:55.2947619Z 2025-12-04T16:28:55.2950505Z export/test_converter 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_converter_1.1_96408107873dd104_.log 2025-12-04T16:28:55.2969041Z Running 48 items in this shard: test/export/test_converter.py::TestConverter::test_aten___getitem___dict, test/export/test_converter.py::TestConverter::test_aten___getitem___list, test/export/test_converter.py::TestConverter::test_aten___is__, test/export/test_converter.py::TestConverter::test_aten___isnot__, test/export/test_converter.py::TestConverter::test_aten___not__, test/export/test_converter.py::TestConverter::test_aten_add_t, test/export/test_converter.py::TestConverter::test_aten_append_t, test/export/test_converter.py::TestConverter::test_aten_dim, test/export/test_converter.py::TestConverter::test_aten_floordiv, test/export/test_converter.py::TestConverter::test_aten_len, 
test/export/test_converter.py::TestConverter::test_aten_tensor_dtype_int, test/export/test_converter.py::TestConverter::test_aten_tensor_dynamic, test/export/test_converter.py::TestConverter::test_aten_tensor_prim_dtype, test/export/test_converter.py::TestConverter::test_aten_to_dtype_with_mutating_storage, test/export/test_converter.py::TestConverter::test_context_manager, test/export/test_converter.py::TestConverter::test_convert_func_without_param, test/export/test_converter.py::TestConverter::test_convert_if_basic, test/export/test_converter.py::TestConverter::test_convert_if_duplicate_attr_names, test/export/test_converter.py::TestConverter::test_convert_if_multiple_out, test/export/test_converter.py::TestConverter::test_convert_if_tuple_out, test/export/test_converter.py::TestConverter::test_convert_nn_module_with_nested_buffer, test/export/test_converter.py::TestConverter::test_convert_nn_module_with_nested_if_and_buffer, test/export/test_converter.py::TestConverter::test_convert_nn_module_with_nested_if_and_param, test/export/test_converter.py::TestConverter::test_convert_nn_module_with_nested_param, test/export/test_converter.py::TestConverter::test_convert_retrace_nested_scripted_modules, test/export/test_converter.py::TestConverter::test_convert_script_object, test/export/test_converter.py::TestConverter::test_get_tensor_constants, test/export/test_converter.py::TestConverter::test_hidden_input_name, test/export/test_converter.py::TestConverter::test_implicit_constant_to_tensor_handling, test/export/test_converter.py::TestConverter::test_prim_SetAttr, test/export/test_converter.py::TestConverter::test_prim_device, test/export/test_converter.py::TestConverter::test_prim_device_cuda, test/export/test_converter.py::TestConverter::test_prim_dtype, test/export/test_converter.py::TestConverter::test_prim_max, test/export/test_converter.py::TestConverter::test_prim_min, test/export/test_converter.py::TestConverter::test_prim_tolist, 
test/export/test_converter.py::TestConverter::test_profiler__record_function, test/export/test_converter.py::TestConverter::test_raise_exception, test/export/test_converter.py::TestConverter::test_ts2ep_convert_quantized_model1, test/export/test_converter.py::TestConverter::test_ts2ep_convert_quantized_model_with_opcontext, test/export/test_converter.py::TestConverter::test_ts2ep_convert_quantized_model_with_opcontext_and_constant, test/export/test_converter.py::TestConverter::test_ts2ep_converter_basic, test/export/test_converter.py::TestConverter::test_ts2ep_converter_container_output, test/export/test_converter.py::TestConverter::test_ts2ep_converter_contains, test/export/test_converter.py::TestConverter::test_ts2ep_converter_custom_op, test/export/test_converter.py::TestConverter::test_ts2ep_converter_unpack, test/export/test_converter.py::TestConverter::test_ts2ep_multi_outputs_on_call_ops, test/export/test_converter.py::TestConverter::test_ts2ep_with_loop 2025-12-04T16:28:55.2987173Z 2025-12-04T16:28:55.2987539Z Finished export/test_converter 1/1 ... [2025-12-04 16:28:55.294603][26163.677498699], took 0.42min 2025-12-04T16:28:55.3274191Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_converter/export.test_converter-a6e4e9ebcfaea6df.xml 2025-12-04T16:28:55.3990035Z Running dynamo/test_reorder_logs 1/1 ... [2025-12-04 16:28:55.398682][26163.78157649] 2025-12-04T16:28:55.3990647Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:28:55.3993777Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_reorder_logs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:28:55.399135] 2025-12-04T16:29:04.7776431Z 2025-12-04T16:29:04.7777455Z dynamo/test_reorder_logs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_reorder_logs_1.1_c9bc43c050335e8d_.log 2025-12-04T16:29:04.7785263Z Running 14 items in this shard: test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method0_fn0_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method1_fn1_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method2_fn2_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method3_fn3_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method4_fn4_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method5_fn5_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method6_fn6_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method7_fn7_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_constant_mutation, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_dont_reorder_print, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_custom_log_fn, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_print, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_print_graph_break, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_warnings 2025-12-04T16:29:04.7792498Z 2025-12-04T16:29:04.7792928Z Finished dynamo/test_reorder_logs 1/1 ... 
[2025-12-04 16:29:04.777447][26173.16034271], took 0.16min 2025-12-04T16:29:04.8093836Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_reorder_logs/dynamo.test_reorder_logs-d530254831fe0a21.xml 2025-12-04T16:29:04.8895297Z Running dynamo/test_subclasses 1/1 ... [2025-12-04 16:29:04.889193][26173.272087508] 2025-12-04T16:29:04.8895897Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:29:04.8898943Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_subclasses.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:29:04.889648] 2025-12-04T16:29:55.0805756Z 2025-12-04T16:29:55.0809604Z dynamo/test_subclasses 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_subclasses_1.1_2bde93c2c59c5c84_.log 2025-12-04T16:29:55.0923001Z Running 126 items in this shard: test/dynamo/test_subclasses.py::SubclassTests::test_as_subclass_attr_mutation, test/dynamo/test_subclasses.py::SubclassTests::test_base_torch_function_tracing, test/dynamo/test_subclasses.py::SubclassTests::test_compile_higher_order_with_functionalization, test/dynamo/test_subclasses.py::SubclassTests::test_compile_with_fake_tensor_automatic_dynamic, test/dynamo/test_subclasses.py::SubclassTests::test_compile_with_fake_tensor_dynamic_dim, test/dynamo/test_subclasses.py::SubclassTests::test_compile_with_functionalization, test/dynamo/test_subclasses.py::SubclassTests::test_disable_all_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_disable_all_torch_function_restore_values, test/dynamo/test_subclasses.py::SubclassTests::test_disable_all_torch_function_restore_values_graph_break, test/dynamo/test_subclasses.py::SubclassTests::test_has_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_make_subclass, 
test/dynamo/test_subclasses.py::SubclassTests::test_mark_static_with_subclass_desugaring_dynamic_False, test/dynamo/test_subclasses.py::SubclassTests::test_mark_static_with_subclass_desugaring_dynamic_True, test/dynamo/test_subclasses.py::SubclassTests::test_newly_constructed_tensor_subclass_attr_mutation, test/dynamo/test_subclasses.py::SubclassTests::test_njt_subclass_from_buffer, test/dynamo/test_subclasses.py::SubclassTests::test_njt_subclass_from_cat, test/dynamo/test_subclasses.py::SubclassTests::test_njt_subclass_simple, test/dynamo/test_subclasses.py::SubclassTests::test_no_call_to_new, test/dynamo/test_subclasses.py::SubclassTests::test_no_torch_function_on_size_bytecode, test/dynamo/test_subclasses.py::SubclassTests::test_no_torch_function_recompiles, test/dynamo/test_subclasses.py::SubclassTests::test_nontraceable_tensor_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_overridden_method_guarding, test/dynamo/test_subclasses.py::SubclassTests::test_parameter_subclass_custom_torch_func_and_dynamic_attr, test/dynamo/test_subclasses.py::SubclassTests::test_parameter_subclass_with_old_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_recompile_with_symbool_inputs, test/dynamo/test_subclasses.py::SubclassTests::test_recompiles_with_optional_inner_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_return_as_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_return_local_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_return_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_TwoTensor_TwoTensor_TwoTensor, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_TwoTensor_nested_diff_sizes, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_constructor_proxying, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_dont_invoke_torch_function_on_overridden_attr, 
test/dynamo/test_subclasses.py::SubclassTests::test_subclass_dont_invoke_torch_function_on_overridden_method, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_override_shape_and_to, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_parameters_are_static_under_training, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_views_dynamic_False, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_views_dynamic_True, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_with_disabled_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_support_bases, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_automatic_dynamic_shapes, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_clone_view, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_different_shape, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_mark_dynamic_shapes, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_mul, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_nested, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_return_multiple, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_return_shape, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_return_tensor_and_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_simple, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_view, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_view_mul, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_attr_codegen_tos, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_custom_guards_error_arg_num, 
test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_custom_guards_error_not_classmethod, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_custom_guards_override, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_guards, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_recursive_guards, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_custom_attr, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_with_non_classmethod_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_torch_dispatch_subclass_guard_recompile, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_call_on_attr, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_call_on_method, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_call_on_method_arg, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_list_args, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_graph_break, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_guards, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_nested, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_tracing, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_subclass_survives_into_aot_autograd, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_wrapper_class, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_wrapper_class_with_kwargs, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_equality_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_equality_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_identity_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_identity_tensor, 
test/dynamo/test_subclasses.py::SubclassTests::test_type_check_isinstance_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_isinstance_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_user_overridden_attr_unsupported, test/dynamo/test_subclasses.py::SubclassTests::test_user_overridden_method_unsupported, test/dynamo/test_subclasses.py::SubclassTests::test_user_overridden_property_unsupported, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_dynamo_attribute_access_on_intermediate, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_guards_on_inner_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_with_differently_sized_inner_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_with_same_sized_inner_tensor, test/dynamo/test_subclasses.py::TestNestedTensor::test_basic_autograd, test/dynamo/test_subclasses.py::TestNestedTensor::test_basic_autograd_inductor, test/dynamo/test_subclasses.py::TestNestedTensor::test_binary_does_not_recompile, test/dynamo/test_subclasses.py::TestNestedTensor::test_binary_recompiles, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_2, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_4, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_5, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_6, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_2, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_3, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_4, 
test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_5, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_mixed, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_mixed_2, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_mixed_3, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_is_nested_call, test/dynamo/test_subclasses.py::TestNestedTensor::test_inference_tensor, test/dynamo/test_subclasses.py::TestNestedTensor::test_inline_nested_tensor_from_jagged, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_basic, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_False_False, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_False_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_True_False, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_True_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_obscure, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_basic, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_False_False, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_False_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_True_False, 
test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_True_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_obscure, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_dense_subclass_dense_subclass, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_subclass_dense, test/dynamo/test_subclasses.py::TestNestedTensor::test_param_subclass_isinstance_input, test/dynamo/test_subclasses.py::TestNestedTensor::test_return_shape, test/dynamo/test_subclasses.py::TestNestedTensor::test_subclass_dense_subclass_dense_view, test/dynamo/test_subclasses.py::TestNestedTensor::test_subclass_gives_static_shapes_when_dynamic_false, test/dynamo/test_subclasses.py::TestNestedTensor::test_subclass_with_mutation_in_graph, test/dynamo/test_subclasses.py::TestNestedTensor::test_unary_does_not_recompile, test/dynamo/test_subclasses.py::TestNestedTensor::test_unbind 2025-12-04T16:29:55.1033744Z 2025-12-04T16:29:55.1034448Z Finished dynamo/test_subclasses 1/1 ... [2025-12-04 16:29:55.080632][26223.463523815], took 0.84min 2025-12-04T16:29:55.1214437Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_subclasses/dynamo.test_subclasses-90ae20717b7fd572.xml 2025-12-04T16:29:56.6525350Z Uploading artifacts took 1.44 seconds 2025-12-04T16:29:56.6530668Z Running dynamo/test_python_autograd 1/1 ... [2025-12-04 16:29:56.652819][26225.035712755] 2025-12-04T16:29:56.6531588Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:29:56.6537773Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_python_autograd.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:29:56.653380] 2025-12-04T16:30:04.8300583Z 2025-12-04T16:30:04.8301654Z dynamo/test_python_autograd 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_python_autograd_1.1_3d66bfb1c1737055_.log 2025-12-04T16:30:04.8304434Z Running 5 items in this shard: test/dynamo/test_python_autograd.py::TestPythonAutograd::test_backwards1, test/dynamo/test_python_autograd.py::TestPythonAutograd::test_backwards2, test/dynamo/test_python_autograd.py::TestPythonAutograd::test_forwards1, test/dynamo/test_python_autograd.py::TestPythonAutograd::test_forwards2, test/dynamo/test_python_autograd.py::TestPythonAutograd::test_split 2025-12-04T16:30:04.8306465Z 2025-12-04T16:30:04.8306835Z Finished dynamo/test_python_autograd 1/1 ... [2025-12-04 16:30:04.829851][26233.212747616], took 0.14min 2025-12-04T16:30:04.8624242Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_python_autograd/dynamo.test_python_autograd-b76a60537c2ba691.xml 2025-12-04T16:30:04.9559978Z Running export/test_draft_export 1/1 ... [2025-12-04 16:30:04.955636][26233.338531574] 2025-12-04T16:30:04.9560568Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:30:04.9563496Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_draft_export.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:30:04.956109] 2025-12-04T16:30:25.0034861Z 2025-12-04T16:30:25.0035916Z export/test_draft_export 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_draft_export_1.1_dc9e6c5dfafe9a68_.log 2025-12-04T16:30:25.0046132Z Running 21 items in this shard: test/export/test_draft_export.py::TestDraftExport::test_complex_data_dependent_expr, test/export/test_draft_export.py::TestDraftExport::test_constantify_unbacked_symbol, test/export/test_draft_export.py::TestDraftExport::test_cuda_memory_usage, test/export/test_draft_export.py::TestDraftExport::test_data_dependent_failure, test/export/test_draft_export.py::TestDraftExport::test_dedup_data_dependent_failure, test/export/test_draft_export.py::TestDraftExport::test_fake_infer_dense_in_memory_check, test/export/test_draft_export.py::TestDraftExport::test_masked_linear, test/export/test_draft_export.py::TestDraftExport::test_missing_meta_kernel_custom_op_basic, test/export/test_draft_export.py::TestDraftExport::test_missing_meta_kernel_custom_op_multiple_profiles, test/export/test_draft_export.py::TestDraftExport::test_missing_meta_kernel_custom_op_update_profile, test/export/test_draft_export.py::TestDraftExport::test_missing_meta_kernel_guard, test/export/test_draft_export.py::TestDraftExport::test_missing_meta_kernel_impl, test/export/test_draft_export.py::TestDraftExport::test_offsets, test/export/test_draft_export.py::TestDraftExport::test_override_incorrectly_aliasing_kernel, test/export/test_draft_export.py::TestDraftExport::test_override_mismatched_fake_kernel_with_unbacked_symbols, test/export/test_draft_export.py::TestDraftExport::test_override_size_and_dtype_mismatched_fake_kernels, test/export/test_draft_export.py::TestDraftExport::test_shape_failure, test/export/test_draft_export.py::TestDraftExport::test_side_effect1, test/export/test_draft_export.py::TestDraftExport::test_side_effect_inps, 
test/export/test_draft_export.py::TestDraftExport::test_torchbind, test/export/test_draft_export.py::TestDraftExport::test_unbacked_div_mod_replacement 2025-12-04T16:30:25.0055266Z 2025-12-04T16:30:25.0055634Z Finished export/test_draft_export 1/1 ... [2025-12-04 16:30:25.003540][26253.386434637], took 0.33min 2025-12-04T16:30:25.0358503Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_draft_export/export.test_draft_export-0c8a812115433a7d.xml 2025-12-04T16:30:25.1197111Z Running test_package 1/1 ... [2025-12-04 16:30:25.119382][26253.50227625] 2025-12-04T16:30:25.1197670Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:30:25.1200610Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_package.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:30:25.119822] 2025-12-04T16:30:33.0964061Z 2025-12-04T16:30:33.0965415Z test_package 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_package_1.1_34eeddca63aecf34_.log 2025-12-04T16:30:33.1008535Z Running 137 items in this shard: test/test_package.py::TestAnalyze::test_trace_dependencies, test/test_package.py::TestDependencyAPI::test_allow_empty_with_error, test/test_package.py::TestDependencyAPI::test_broken_dependency, test/test_package.py::TestDependencyAPI::test_deny, test/test_package.py::TestDependencyAPI::test_deny_glob, test/test_package.py::TestDependencyAPI::test_extern, test/test_package.py::TestDependencyAPI::test_extern_glob, test/test_package.py::TestDependencyAPI::test_extern_glob_allow_empty, test/test_package.py::TestDependencyAPI::test_externing_c_extension, test/test_package.py::TestDependencyAPI::test_implicit_intern, test/test_package.py::TestDependencyAPI::test_intern_error, test/test_package.py::TestDependencyAPI::test_invalid_import, 
test/test_package.py::TestDependencyAPI::test_mock, test/test_package.py::TestDependencyAPI::test_mock_glob, test/test_package.py::TestDependencyAPI::test_mock_glob_allow_empty, test/test_package.py::TestDependencyAPI::test_pickle_mocked, test/test_package.py::TestDependencyAPI::test_pickle_mocked_all, test/test_package.py::TestDependencyAPI::test_repackage_mocked_module, test/test_package.py::TestDependencyHooks::test_extern_and_mock_hook, test/test_package.py::TestDependencyHooks::test_multiple_extern_hooks, test/test_package.py::TestDependencyHooks::test_multiple_mock_hooks, test/test_package.py::TestDependencyHooks::test_remove_hooks, test/test_package.py::TestDependencyHooks::test_single_hook, test/test_package.py::TestDiGraph::test_all_paths, test/test_package.py::TestDiGraph::test_contains, test/test_package.py::TestDiGraph::test_contains_non_hashable, test/test_package.py::TestDiGraph::test_edges, test/test_package.py::TestDiGraph::test_forward_closure, test/test_package.py::TestDiGraph::test_iter, test/test_package.py::TestDiGraph::test_node_attr_update, test/test_package.py::TestDiGraph::test_node_attrs, test/test_package.py::TestDiGraph::test_predecessor_not_in_graph, test/test_package.py::TestDiGraph::test_predecessors, test/test_package.py::TestDiGraph::test_successor_not_in_graph, test/test_package.py::TestDiGraph::test_successors, test/test_package.py::DirectoryReaderTest::test_importer_access, test/test_package.py::DirectoryReaderTest::test_loading_has_record, test/test_package.py::DirectoryReaderTest::test_loading_module, test/test_package.py::DirectoryReaderTest::test_loading_pickle, test/test_package.py::DirectoryReaderTest::test_package_resource_access, test/test_package.py::DirectoryReaderTest::test_resource_access_by_path, test/test_package.py::DirectoryReaderTest::test_resource_reader, test/test_package.py::DirectoryReaderTest::test_scriptobject_failure_message, test/test_package.py::TestGlobGroup::test_exclude, 
test/test_package.py::TestGlobGroup::test_exclude_from_all, test/test_package.py::TestGlobGroup::test_invalid_raw, test/test_package.py::TestGlobGroup::test_list_include_exclude, test/test_package.py::TestGlobGroup::test_one_star, test/test_package.py::TestGlobGroup::test_one_star_middle, test/test_package.py::TestGlobGroup::test_one_star_multiple_in_component, test/test_package.py::TestGlobGroup::test_one_star_partial, test/test_package.py::TestGlobGroup::test_one_star_partial_extension, test/test_package.py::TestGlobGroup::test_raw_two_star, test/test_package.py::TestGlobGroup::test_two_star, test/test_package.py::TestGlobGroup::test_two_star_end, test/test_package.py::TestGlobGroup::test_two_star_middle, test/test_package.py::TestGlobGroup::test_two_star_multiple, test/test_package.py::TestImporter::test_ordered_importer_basic, test/test_package.py::TestImporter::test_ordered_importer_whichmodule, test/test_package.py::TestImporter::test_package_importer_whichmodule_no_dunder_module, test/test_package.py::TestImporter::test_single_ordered_importer, test/test_package.py::TestImporter::test_sys_importer, test/test_package.py::TestImporter::test_sys_importer_roundtrip, test/test_package.py::TestLoadBCPackages::test_load_bc_packages_fx_module, test/test_package.py::TestLoadBCPackages::test_load_bc_packages_nn_module, test/test_package.py::TestLoadBCPackages::test_load_bc_packages_torchscript_module, test/test_package.py::TestMangling::test_demangle_base, test/test_package.py::TestMangling::test_demangler_multiple_manglers, test/test_package.py::TestMangling::test_is_mangled, test/test_package.py::TestMangling::test_mangle_empty_errors, test/test_package.py::TestMangling::test_mangle_prefix, test/test_package.py::TestMangling::test_mangler_is_consistent, test/test_package.py::TestMangling::test_package_mangler, test/test_package.py::TestMangling::test_roundtrip_mangling, test/test_package.py::TestMangling::test_unique_manglers, 
test/test_package.py::TestMangling::test_unique_module_names, test/test_package.py::TestMisc::test_dunder_package_present, test/test_package.py::TestMisc::test_dunder_package_works_from_package, test/test_package.py::TestMisc::test_exporter_content_lists, test/test_package.py::TestMisc::test_file_structure, test/test_package.py::TestMisc::test_file_structure_has_file, test/test_package.py::TestMisc::test_inspect_class, test/test_package.py::TestMisc::test_is_from_package, test/test_package.py::TestMisc::test_load_python_version_from_package, test/test_package.py::TestMisc::test_loaders_that_remap_files_work_ok, test/test_package.py::TestMisc::test_python_version, test/test_package.py::TestMisc::test_std_lib_sys_hackery_checks, test/test_package.py::ModelTest::test_model_save, test/test_package.py::ModelTest::test_resnet, test/test_package.py::ModelTest::test_script_resnet, test/test_package.py::TestPackageFX::test_package_fx_custom_tracer, test/test_package.py::TestPackageFX::test_package_fx_package, test/test_package.py::TestPackageFX::test_package_fx_simple, test/test_package.py::TestPackageFX::test_package_fx_with_imports, test/test_package.py::TestPackageFX::test_package_fx_wrap, test/test_package.py::TestPackageFX::test_package_gm_preserve_stack_trace, test/test_package.py::TestPackageFX::test_package_then_fx, test/test_package.py::TestPackageScript::test_different_package_interface, test/test_package.py::TestPackageScript::test_different_package_script_class, test/test_package.py::TestPackageScript::test_load_shared_scriptmodules, test/test_package.py::TestPackageScript::test_load_shared_tensors, test/test_package.py::TestPackageScript::test_load_shared_tensors_repackaged, test/test_package.py::TestPackageScript::test_mixing_packaged_and_inline_modules, test/test_package.py::TestPackageScript::test_mixing_packaged_and_inline_modules_shared_code, test/test_package.py::TestPackageScript::test_package_interface, 
test/test_package.py::TestPackageScript::test_package_script_class, test/test_package.py::TestPackageScript::test_package_script_class_referencing_self, test/test_package.py::TestPackageScript::test_save_eager_mods_sharing_scriptmodule, test/test_package.py::TestPackageScript::test_save_independent_scriptmodules, test/test_package.py::TestPackageScript::test_save_repeat_scriptmodules, test/test_package.py::TestPackageScript::test_save_scriptmodule, test/test_package.py::TestPackageScript::test_save_scriptmodule_file, test/test_package.py::TestPackageScript::test_save_scriptmodule_only_necessary_code, test/test_package.py::TestPackageScript::test_save_scriptmodule_with_submods, test/test_package.py::TestPackageScript::test_save_scriptmodules_in_container, test/test_package.py::TestPackageScript::test_save_scriptmodules_submod_redefinition, test/test_package.py::TestPackageScript::test_save_shared_tensors, test/test_package.py::TestPackageScript::test_saving_and_scripting_packaged_mod, test/test_package.py::TestPackageScript::test_scriptmodules_repeat_save, test/test_package.py::TestPackageScript::test_tensor_sharing_pickle, test/test_package.py::TestRepackage::test_repackage_import_indirectly_via_parent_module, test/test_package.py::TestResources::test_importer_access, test/test_package.py::TestResources::test_package_resource_access, test/test_package.py::TestResources::test_resource_access_by_path, test/test_package.py::TestResources::test_resource_reader, test/test_package.py::TestSaveLoad::test_bad_dunder_imports, test/test_package.py::TestSaveLoad::test_dunder_imports, test/test_package.py::TestSaveLoad::test_exporting_mismatched_code, test/test_package.py::TestSaveLoad::test_pickle, test/test_package.py::TestSaveLoad::test_pickle_long_name_with_protocol_4, test/test_package.py::TestSaveLoad::test_save_imported_module, test/test_package.py::TestSaveLoad::test_save_imported_module_using_package_importer, test/test_package.py::TestSaveLoad::test_save_load_fp8, 
test/test_package.py::TestSaveLoad::test_save_module, test/test_package.py::TestSaveLoad::test_save_module_binary, test/test_package.py::TestSaveLoad::test_saving_source, test/test_package.py::TestSaveLoad::test_saving_string 2025-12-04T16:30:33.1050504Z 2025-12-04T16:30:33.1050792Z Finished test_package 1/1 ... [2025-12-04 16:30:33.096408][26261.479301358], took 0.13min 2025-12-04T16:30:33.1291835Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_package/test_package-523d81f0792170f1.xml 2025-12-04T16:30:33.2170806Z Running test_mkl_verbose 1/1 ... [2025-12-04 16:30:33.216794][26261.599688784] 2025-12-04T16:30:33.2171628Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:30:33.2175333Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mkl_verbose.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:30:33.217300] 2025-12-04T16:30:43.2487760Z 2025-12-04T16:30:43.2488707Z test_mkl_verbose 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mkl_verbose_1.1_8df5a0c4f0a0ed8d_.log 2025-12-04T16:30:43.2490081Z Running 2 items in this shard: test/test_mkl_verbose.py::TestMKLVerbose::test_verbose_off, test/test_mkl_verbose.py::TestMKLVerbose::test_verbose_on 2025-12-04T16:30:43.2490843Z 2025-12-04T16:30:43.2491160Z Finished test_mkl_verbose 1/1 ... [2025-12-04 16:30:43.248501][26271.631396535], took 0.17min 2025-12-04T16:30:43.2818624Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-874cbf06946f8b3e.xml 2025-12-04T16:30:43.3444016Z Running test_comparison_utils 1/1 ... 
[2025-12-04 16:30:43.344052][26271.726946881] 2025-12-04T16:30:43.3444719Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:30:43.3447498Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_comparison_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:30:43.344494] 2025-12-04T16:30:48.8169404Z 2025-12-04T16:30:48.8170375Z test_comparison_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_comparison_utils_1.1_bef8586b0834f006_.log 2025-12-04T16:30:48.8173979Z Running 7 items in this shard: test/test_comparison_utils.py::TestComparisonUtils::test_all_equal_no_assert, test/test_comparison_utils.py::TestComparisonUtils::test_all_equal_no_assert_nones, test/test_comparison_utils.py::TestComparisonUtils::test_assert_device, test/test_comparison_utils.py::TestComparisonUtils::test_assert_dtype, test/test_comparison_utils.py::TestComparisonUtils::test_assert_layout, test/test_comparison_utils.py::TestComparisonUtils::test_assert_sizes, test/test_comparison_utils.py::TestComparisonUtils::test_assert_strides 2025-12-04T16:30:48.8176743Z 2025-12-04T16:30:48.8177084Z Finished test_comparison_utils 1/1 ... [2025-12-04 16:30:48.816752][26277.199649922], took 0.09min 2025-12-04T16:30:48.8502194Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_comparison_utils/test_comparison_utils-ce770324779d51b3.xml 2025-12-04T16:30:48.8839286Z Running functorch/test_ac_logging 1/1 ... 
[2025-12-04 16:30:48.883684][26277.266578608] 2025-12-04T16:30:48.8839864Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:30:48.8843425Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ac_logging.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:30:48.884101] 2025-12-04T16:30:54.4066782Z 2025-12-04T16:30:54.4068054Z functorch/test_ac_logging 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ac_logging_1.1_7064fc1f81d9dc21_.log 2025-12-04T16:30:54.4070835Z Running 4 items in this shard: test/functorch/test_ac_logging.py::TestAcLogging::test_create_activation_checkpointing_logging_structure_payload, test/functorch/test_ac_logging.py::TestAcLogging::test_create_joint_graph_edges, test/functorch/test_ac_logging.py::TestAcLogging::test_create_joint_graph_node_information, test/functorch/test_ac_logging.py::TestAcLogging::test_create_structured_trace_for_min_cut_info 2025-12-04T16:30:54.4073321Z 2025-12-04T16:30:54.4073678Z Finished functorch/test_ac_logging 1/1 ... [2025-12-04 16:30:54.406457][26282.789353096], took 0.09min 2025-12-04T16:30:54.4399043Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ac_logging/functorch.test_ac_logging-f1c79a1c8c74be66.xml 2025-12-04T16:30:54.4758980Z Running test_mkldnn_verbose 1/1 ... [2025-12-04 16:30:54.475629][26282.858522976] 2025-12-04T16:30:54.4759534Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:30:54.4763053Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mkldnn_verbose.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:30:54.476075] 2025-12-04T16:31:03.5038173Z 2025-12-04T16:31:03.5039119Z test_mkldnn_verbose 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mkldnn_verbose_1.1_7178d5eae573783e_.log 2025-12-04T16:31:03.5040629Z Running 2 items in this shard: test/test_mkldnn_verbose.py::TestMKLDNNVerbose::test_verbose_off, test/test_mkldnn_verbose.py::TestMKLDNNVerbose::test_verbose_on 2025-12-04T16:31:03.5041451Z 2025-12-04T16:31:03.5042003Z Finished test_mkldnn_verbose 1/1 ... [2025-12-04 16:31:03.503614][26291.886505987], took 0.15min 2025-12-04T16:31:03.5369219Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_mkldnn_verbose/test_mkldnn_verbose-e983273d29ed8e1e.xml 2025-12-04T16:31:03.6170601Z Running test_cpp_api_parity 1/1 ... [2025-12-04 16:31:03.616768][26291.999662952] 2025-12-04T16:31:03.6171312Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:31:03.6174719Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_api_parity.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:31:03.617246] 2025-12-04T16:31:30.8251011Z 2025-12-04T16:31:30.8254323Z test_cpp_api_parity 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_api_parity_1.1_286b24be771dc4b7_.log 2025-12-04T16:31:30.8476728Z Running 488 items in this shard: test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_dilated_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1size1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1size1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2size1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2size1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_reflect_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_reflect_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_stride, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_stride_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_padded, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_padded_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_with_multiplier, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_with_multiplier_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_thnn, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_thnn_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_padding, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_padding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_reflect_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_reflect_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_1x1x1_no_bias, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_1x1x1_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_padding, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_padding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CrossMapLRN2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CrossMapLRN2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_discontiguous, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_discontiguous_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sparse, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sparse_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_discontiguous, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_discontiguous_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_sparse, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_sparse_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_mean, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_LayerNorm_3d_no_affine_large_feature, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_LayerNorm_3d_no_affine_large_feature_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_mean_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_lhs, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_lhs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_rhs, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_rhs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_with_non_default_args, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_with_non_default_args_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelShuffle, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelShuffle_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelUnshuffle, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelUnshuffle_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_complex, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_complex_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_has_parity, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_has_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_no_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_no_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_gelu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_gelu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_relu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_relu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_gelu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_gelu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_relu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_relu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Transformer_multilayer_coder, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Transformer_multilayer_coder_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_mean, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unflatten_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unflatten_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_legacy_enum, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_legacy_enum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_margin_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_margin_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HuberLoss_delta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HuberLoss_delta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_log_target, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_log_target_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_log_target, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_log_target_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_log_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_log_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_complex, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_complex_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_0d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_0d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_1d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_1d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_index_neg, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_index_neg_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_1d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_1d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_margin_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_margin_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_p_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_p_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_neg, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_neg_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_PoissonNLLLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_PoissonNLLLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_beta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_beta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_zero_beta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_zero_beta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SoftMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SoftMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_2d, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_shared_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_shared_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_2d_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_shared_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_shared_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_tuple_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_tuple_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_launch_configs, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_launch_configs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_zero_dim, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim0, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim0_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim3, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim3_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_lastdim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_lastdim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_special, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_special_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_multimarginloss_1d_input_0d_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_multimarginloss_1d_input_0d_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_has_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_has_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_no_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_no_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim0, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim0_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim3, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim3_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_dtype, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_dtype_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_dtype, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_dtype_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_special, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_special_cuda
2025-12-04T16:31:30.8697310Z
2025-12-04T16:31:30.8697643Z Finished test_cpp_api_parity 1/1 ... [2025-12-04 16:31:30.825704][26319.208598972], took 0.45min
2025-12-04T16:31:30.8698842Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cpp_api_parity/test_cpp_api_parity-c6b7300fef8db168.xml
2025-12-04T16:31:30.9679641Z Running test_autoload 1/1 ... [2025-12-04 16:31:30.967659][26319.350552084]
2025-12-04T16:31:30.9680191Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:31:30.9683187Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autoload.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:31:30.968091]
2025-12-04T16:31:36.4909482Z
2025-12-04T16:31:36.4910417Z test_autoload 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autoload_1.1_4b58ab9cd8e50318_.log
2025-12-04T16:31:36.4911549Z Running 1 items in this shard: test/test_autoload.py::TestDeviceBackendAutoload::test_autoload
2025-12-04T16:31:36.4912050Z
2025-12-04T16:31:36.4912340Z Finished test_autoload 1/1 ... [2025-12-04 16:31:36.490709][26324.873605686], took 0.09min
2025-12-04T16:31:36.5248479Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_autoload/test_autoload-21f1eacf8f4a4d28.xml
2025-12-04T16:31:36.5513658Z Running nn/attention/test_open_registry 1/1 ... [2025-12-04 16:31:36.551028][26324.933922166]
2025-12-04T16:31:36.5514454Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:31:36.5516884Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/attention/test_open_registry.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:31:36.551442]
2025-12-04T16:31:42.1242598Z
2025-12-04T16:31:42.1243687Z nn/attention/test_open_registry 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.attention.test_open_registry_1.1_52b8c107579dfb04_.log
2025-12-04T16:31:42.1245742Z Running 2 items in this shard: test/nn/attention/test_open_registry.py::TestFlashAttentionRegistry::test_activate_unknown_impl_errors, test/nn/attention/test_open_registry.py::TestFlashAttentionRegistry::test_register_and_activate_impl
2025-12-04T16:31:42.1246992Z
2025-12-04T16:31:42.1247399Z Finished nn/attention/test_open_registry 1/1 ... [2025-12-04 16:31:42.124046][26330.506940553], took 0.09min
2025-12-04T16:31:42.1588792Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.attention.test_open_registry/nn.attention.test_open_registry-bacfee0084c93992.xml
2025-12-04T16:31:42.1910976Z Running test_as_strided 1/1 ... [2025-12-04 16:31:42.190809][26330.573703645]
2025-12-04T16:31:42.1911530Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:31:42.1914558Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_as_strided.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:31:42.191218]
2025-12-04T16:31:47.8140116Z
2025-12-04T16:31:47.8140962Z test_as_strided 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_as_strided_1.1_915ecc12abd3e105_.log
2025-12-04T16:31:47.8142378Z Running 2 items in this shard: test/test_as_strided.py::TestAsStrided::test_size_10_exhaustive, test/test_as_strided.py::TestAsStrided::test_subset_property
2025-12-04T16:31:47.8143171Z
2025-12-04T16:31:47.8143462Z Finished test_as_strided 1/1 ... [2025-12-04 16:31:47.813815][26336.196711787], took 0.09min
2025-12-04T16:31:47.8480051Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_as_strided/test_as_strided-4555079064233d7d.xml
2025-12-04T16:31:47.8814938Z Running test_foreach 1/1 ... [2025-12-04 16:31:47.881198][26336.264093037]
2025-12-04T16:31:47.8815720Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:31:47.8818541Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_foreach.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:31:47.881612]
2025-12-04T16:42:05.6629315Z
2025-12-04T16:42:05.6630313Z test_foreach 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_foreach_1.1_754d93a1205d9df5_.log
2025-12-04T16:42:05.8367177Z Running 3577 items in this shard: test/test_foreach.py::TestForeachCUDA::test_0dim_tensor_overload_cpu_ok_cuda, test/test_foreach.py::TestForeachCUDA::test_0dim_tensor_overload_exception_cuda, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_add_cuda_float32,
test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_atan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_frac_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_lerp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_norm_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sigmoid_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_zero_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_False_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_div_reciprocal_cuda, test/test_foreach.py::TestForeachCUDA::test_foreach_check_stride_ignore_dims_of_one_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes_large_input_cuda, test/test_foreach.py::TestForeachCUDA::test_foreach_l2_large_value_input__foreach_norm_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_l2_large_value_input__foreach_norm_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_atan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_ceil_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_frac_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_lerp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log2_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_tanh_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_zero_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_atan_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_frac_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_lerp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_maximum_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_float64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_bool, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int32, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcdiv_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcdiv_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcmul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_bfloat16, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_complex128, 
test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_tensors_grouping_cuda, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_uint8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_float32, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int8, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_float16, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_complex64, 
test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_uint8 2025-12-04T16:42:06.0692010Z 2025-12-04T16:42:06.0692365Z Finished test_foreach 1/1 ... [2025-12-04 16:42:05.668435][26954.051326727], took 10.30min 2025-12-04T16:42:06.0693476Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_foreach/test_foreach-aa4419a4e7b6d381.xml 2025-12-04T16:42:06.0694487Z Running xpu/test_gemm 1/1 ... [2025-12-04 16:42:05.855809][26954.23870292] 2025-12-04T16:42:06.0695285Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:42:06.0696550Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'xpu/test_gemm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:42:05.856264] 2025-12-04T16:42:11.7427547Z 2025-12-04T16:42:11.7428756Z xpu/test_gemm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/xpu.test_gemm_1.1_f9c98ad78a8f930f_.log 2025-12-04T16:42:11.7429612Z Running 0 items in this shard: 2025-12-04T16:42:11.7429834Z 2025-12-04T16:42:11.7430122Z Finished xpu/test_gemm 1/1 ... 
[2025-12-04 16:42:11.742540][26960.125437229], took 0.10min 2025-12-04T16:42:11.7774086Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/xpu.test_gemm/xpu.test_gemm-2cb9cf39de6aa2cf.xml 2025-12-04T16:42:11.8063767Z Running test_numpy_interop 1/1 ... [2025-12-04 16:42:11.806095][26960.188989215] 2025-12-04T16:42:11.8064621Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:42:11.8068158Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_numpy_interop.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:42:11.806521] 2025-12-04T16:42:19.8327170Z 2025-12-04T16:42:19.8328468Z test_numpy_interop 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_numpy_interop_1.1_0cfaaa8b9ef10506_.log 2025-12-04T16:42:19.8347483Z Running 46 items in this shard: test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_bool, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_complex128, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_complex64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_float16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_float32, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_float64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int32, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_int8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test___eq___cuda_uint8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_copy_mode_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ctor_with_invalid_numpy_array_sequence_cuda, 
test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ctor_with_numpy_scalar_ctor_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_empty_tensors_interop_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_list_of_ndarray_warning_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_numpy_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_numpy_no_leak_on_invalid_dtype_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_from_numpy_zero_element_type_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_has_storage_numpy_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_multiplication_numpy_scalar_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ndarray_astype_object_graph_break_2_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_ndarray_astype_object_graph_break_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_array_interface_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_index_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_index_multi_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_non_writeable_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_bfloat16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_bool, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_complex128, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_complex64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_float16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_float32, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_float64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int16, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int32, 
test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int64, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_int8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_scalar_cmp_cuda_uint8, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_numpy_unresizable_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_parse_numpy_int_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_parse_numpy_int_overflow_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_to_numpy_bool_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_to_numpy_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_to_numpy_force_argument_cuda, test/test_numpy_interop.py::TestNumPyInteropCUDA::test_to_numpy_zero_tensor_cuda 2025-12-04T16:42:19.8365969Z 2025-12-04T16:42:19.8366347Z Finished test_numpy_interop 1/1 ... [2025-12-04 16:42:19.832534][26968.215430871], took 0.13min 2025-12-04T16:42:19.8670086Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_numpy_interop/test_numpy_interop-660870d95235d56d.xml 2025-12-04T16:42:19.9367942Z Running profiler/test_cpp_thread 1/1 ... [2025-12-04 16:42:19.936509][26968.319403393] 2025-12-04T16:42:19.9368509Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:42:19.9372173Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_cpp_thread.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:42:19.936951] 2025-12-04T16:42:30.7174079Z 2025-12-04T16:42:30.7175359Z profiler/test_cpp_thread 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_cpp_thread_1.1_6bc17e34ef07b5a0_.log 2025-12-04T16:42:30.7179166Z Running 6 items in this shard: test/profiler/test_cpp_thread.py::CppThreadTestCUDA::test_profile_memory_cuda, test/profiler/test_cpp_thread.py::CppThreadTestCUDA::test_with_enable_profiler_in_child_thread_cuda, test/profiler/test_cpp_thread.py::CppThreadTestCUDA::test_without_enable_profiler_in_child_thread_cuda, test/profiler/test_cpp_thread.py::CppThreadTestXPU::test_profile_memory_xpu, test/profiler/test_cpp_thread.py::CppThreadTestXPU::test_with_enable_profiler_in_child_thread_xpu, test/profiler/test_cpp_thread.py::CppThreadTestXPU::test_without_enable_profiler_in_child_thread_xpu 2025-12-04T16:42:30.7182112Z 2025-12-04T16:42:30.7182485Z Finished profiler/test_cpp_thread 1/1 ... [2025-12-04 16:42:30.717263][26979.100158034], took 0.18min 2025-12-04T16:42:30.7555963Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_cpp_thread/profiler.test_cpp_thread-31559e2ba96f64a3.xml 2025-12-04T16:42:30.8345064Z Running test_hub 1/1 ... [2025-12-04 16:42:30.834201][26979.21709463] 2025-12-04T16:42:30.8345601Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:42:30.8348712Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_hub.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:42:30.834637] 2025-12-04T16:42:46.2721822Z 2025-12-04T16:42:46.2722703Z test_hub 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_hub_1.1_af317e8677316cdb_.log 2025-12-04T16:42:46.2728139Z Running 20 items in this shard: test/test_hub.py::TestHub::test_download_url_to_file, test/test_hub.py::TestHub::test_get_set_dir, test/test_hub.py::TestHub::test_hub_parse_repo_info, test/test_hub.py::TestHub::test_list_entrypoints, test/test_hub.py::TestHub::test_load_commit_from_forked_repo, test/test_hub.py::TestHub::test_load_from_branch, test/test_hub.py::TestHub::test_load_from_github, test/test_hub.py::TestHub::test_load_from_local_dir, test/test_hub.py::TestHub::test_load_legacy_zip_checkpoint, test/test_hub.py::TestHub::test_load_state_dict_from_url, test/test_hub.py::TestHub::test_load_zip_1_6_checkpoint, test/test_hub.py::TestHub::test_trust_repo_builtin_trusted_owners, test/test_hub.py::TestHub::test_trust_repo_check_no, test/test_hub.py::TestHub::test_trust_repo_check_yes, test/test_hub.py::TestHub::test_trust_repo_false_emptystring, test/test_hub.py::TestHub::test_trust_repo_false_no, test/test_hub.py::TestHub::test_trust_repo_legacy, test/test_hub.py::TestHub::test_trust_repo_none, test/test_hub.py::TestHub::test_trust_repo_true, test/test_hub.py::TestHub::test_trusted_repo_false_yes 2025-12-04T16:42:46.2733574Z 2025-12-04T16:42:46.2733837Z Finished test_hub 1/1 ... [2025-12-04 16:42:46.271922][26994.654819168], took 0.26min 2025-12-04T16:42:46.3066338Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_hub/test_hub-33a47573ff45c77e.xml 2025-12-04T16:42:46.3867393Z Running test_segment_reductions 1/1 ... 
[2025-12-04 16:42:46.386407][26994.769301074] 2025-12-04T16:42:46.3868057Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:42:46.3871273Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_segment_reductions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:42:46.386854] 2025-12-04T16:42:56.7164090Z 2025-12-04T16:42:56.7165402Z test_segment_reductions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_segment_reductions_1.1_c6d7e787931576c3_.log 2025-12-04T16:42:56.7207285Z Running 74 items in this shard: test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float32_int32, 
test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_multi_d_simple_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_max_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float32_int32, 
test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_mean_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_min_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float32_int32, 
test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_prod_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_pytorch_scatter_test_cases_reduce_sum_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float32_int64, 
test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_1d_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_bfloat16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_bfloat16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float16_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float16_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float32_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float32_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float64_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_simple_zero_length_cuda_float64_int64, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_unsafe_flag_cuda_int32, test/test_segment_reductions.py::TestSegmentReductionsCUDA::test_unsafe_flag_cuda_int64 2025-12-04T16:42:56.7248030Z 2025-12-04T16:42:56.7248377Z Finished test_segment_reductions 1/1 ... [2025-12-04 16:42:56.716270][27005.099164802], took 0.17min 2025-12-04T16:42:56.7518242Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_segment_reductions/test_segment_reductions-ad616dd6940e0de0.xml 2025-12-04T16:42:56.8307107Z Running test_autograd_fallback 1/1 ... 
[2025-12-04 16:42:56.830382][27005.213274943] 2025-12-04T16:42:56.8307694Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:42:56.8310564Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autograd_fallback.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:42:56.830825] 2025-12-04T16:43:02.6538942Z 2025-12-04T16:43:02.6539920Z test_autograd_fallback 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autograd_fallback_1.1_60e7b253f9787096_.log 2025-12-04T16:43:02.6554505Z Running 28 items in this shard: test/test_autograd_fallback.py::TestAutogradFallback::test_autograd_function_registered_to_cpu_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_autograd_function_registered_to_cpu_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_base_does_not_require_grad_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_base_does_not_require_grad_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_composite_registered_to_cpu_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_composite_registered_to_cpu_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_cpu_return_self_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_cpu_return_self_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_autograd_function_registered_to_cpu_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_autograd_function_registered_to_cpu_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_on_tensor_that_does_not_require_grad_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_inplace_on_tensor_that_does_not_require_grad_mode_warn, 
test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_inplace_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_inplace_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_no_autograd_kernel_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_no_grad_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_no_grad_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_leaf_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_leaf_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_mix_of_requires_grad_tensors_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_post_autograd_returns_mix_of_requires_grad_tensors_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_supports_tensor_lists_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_supports_tensor_lists_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_grads_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_grads_mode_warn, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_inputs_outputs_mode_nothing, test/test_autograd_fallback.py::TestAutogradFallback::test_undefined_inputs_outputs_mode_warn 2025-12-04T16:43:02.6568482Z 2025-12-04T16:43:02.6568831Z Finished test_autograd_fallback 1/1 ... [2025-12-04 16:43:02.653714][27011.03661036], took 0.10min 2025-12-04T16:43:02.6889678Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_autograd_fallback/test_autograd_fallback-e1a7bbd98afc63dc.xml 2025-12-04T16:43:02.7247780Z Running test_type_hints 1/1 ... 
[2025-12-04 16:43:02.724484][27011.107378218] 2025-12-04T16:43:02.7248347Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:43:02.7251603Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_type_hints.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:43:02.724908] 2025-12-04T16:43:08.3477918Z 2025-12-04T16:43:08.3478835Z test_type_hints 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_type_hints_1.1_d9336b501fe8992b_.log 2025-12-04T16:43:08.3479934Z Running 1 items in this shard: test/test_type_hints.py::TestTypeHints::test_doc_examples 2025-12-04T16:43:08.3480416Z 2025-12-04T16:43:08.3480708Z Finished test_type_hints 1/1 ... [2025-12-04 16:43:08.347558][27016.730455185], took 0.09min 2025-12-04T16:43:08.3824812Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_type_hints/test_type_hints-d14fd0906e097d86.xml 2025-12-04T16:43:08.4278776Z Running functorch/test_aot_joint_with_descriptors 1/1 ... [2025-12-04 16:43:08.427573][27016.810466783] 2025-12-04T16:43:08.4279453Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:43:08.4282853Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_aot_joint_with_descriptors.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:43:08.428014] 2025-12-04T16:43:26.5693116Z 2025-12-04T16:43:26.5694284Z functorch/test_aot_joint_with_descriptors 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_aot_joint_with_descriptors_1.1_948ec5a85f7c1f8f_.log 2025-12-04T16:43:26.5705337Z Running 17 items in this shard: test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_conv_bn_module, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_custom_op_stack_trace, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_export_and_compile, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_fx_utils_conv_bn_module, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_fx_utils_multiple_outputs, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_fx_utils_node_consistency, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_fx_utils_simple_linear, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_in_out_specs, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_module_with_kwargs, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_multiple_outputs_module, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_no_annotation_on_gradient_acc_nodes, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_preserve_annotate_flex_attention, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_preserve_annotate_function, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_preserve_annotate_replay_view, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_preserve_annotate_simple, 
test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_simple_linear_module, test/functorch/test_aot_joint_with_descriptors.py::TestAOTJointWithDescriptors::test_static_input_indices 2025-12-04T16:43:26.5715274Z 2025-12-04T16:43:26.5715822Z Finished functorch/test_aot_joint_with_descriptors 1/1 ... [2025-12-04 16:43:26.569070][27034.951967153], took 0.30min 2025-12-04T16:43:26.6048932Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_aot_joint_with_descriptors/functorch.test_aot_joint_with_descriptors-79fd9b229bc0c00b.xml 2025-12-04T16:43:26.6787286Z Running test_fx_reinplace_pass 1/1 ... [2025-12-04 16:43:26.678362][27035.061255298] 2025-12-04T16:43:26.6787896Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:43:26.6790297Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fx_reinplace_pass.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:43:26.678792] 2025-12-04T16:43:32.8580596Z 2025-12-04T16:43:32.8581590Z test_fx_reinplace_pass 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_reinplace_pass_1.1_8f7033a49b0aaa2e_.log 2025-12-04T16:43:32.8587523Z Running 12 items in this shard: test/test_fx_reinplace_pass.py::TestReinplacePass::test_out_node_updated, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_basic, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_different_metadata, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_index_mutation, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_overlapping_memory, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_op, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice_with_different_view_op_invalid, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice_with_different_view_op_invalid2, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice_with_different_view_op_valid, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_sym_input, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_with_view 2025-12-04T16:43:32.8593191Z 2025-12-04T16:43:32.8593524Z Finished test_fx_reinplace_pass 1/1 ... [2025-12-04 16:43:32.857851][27041.240747659], took 0.10min 2025-12-04T16:43:32.8930572Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_fx_reinplace_pass/test_fx_reinplace_pass-047146b9ff22e4f6.xml 2025-12-04T16:43:32.9781035Z Running functorch/test_control_flow 2/2 ... 
[2025-12-04 16:43:32.977781][27041.360674385] 2025-12-04T16:43:32.9781638Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:43:32.9784654Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_control_flow.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:43:32.978208] 2025-12-04T17:04:34.4615330Z 2025-12-04T17:04:34.4616379Z functorch/test_control_flow 2/2 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_control_flow_2.2_2e5432104edc7835_.log 2025-12-04T17:04:34.5414256Z Running 950 items in this shard: test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_different_pytree_output, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_grad_through_cond, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_nested, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_pytree_not_all_inputs_used, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_simple, test/functorch/test_control_flow.py::TestControlFlow::test_cond_gpu, test/functorch/test_control_flow.py::TestControlFlow::test_cond_in_forloop, test/functorch/test_control_flow.py::TestControlFlow::test_cond_no_trace, test/functorch/test_control_flow.py::TestControlFlow::test_map_autograd_no_grad_output, test/functorch/test_control_flow.py::TestControlFlow::test_map_autograd_simple_partial_grad, test/functorch/test_control_flow.py::TestControlFlow::test_map_gpu, test/functorch/test_control_flow.py::TestControlFlow::test_map_list_in_out, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_none_cpu_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_carry_output_alias, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_compile_mode_eager_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_compile_mode_none_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_additional_inputs_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_complex_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_random_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_xs_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_additional_inputs_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_complex_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_random_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_xs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_complex_cpu, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_init_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_xs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_additional_inputs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_complex_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_complex_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_random_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_none_cpu_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_none_cpu_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_none_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_none_cuda_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_eager_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_cnt_reverse_False_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_cnt_reverse_True_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_cnt_reverse_True_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_eager_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cpu_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cpu_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cuda_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cpu_float16, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cpu_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cpu_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cuda_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cpu_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cpu_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cpu_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cuda_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cpu_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cpu_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cpu_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cpu_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cuda_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_none_cpu_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_eager_cuda_autograd_True, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_scanned_0, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_carry_shape, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_complex_reverse_False_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_complex_reverse_True_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_init_shorter_carry, test/functorch/test_control_flow.py::TestControlFlow::test_scan_input_carry_alias, test/functorch/test_control_flow.py::TestControlFlow::test_scan_input_mutation, test/functorch/test_control_flow.py::TestControlFlow::test_scan_input_output_alias, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_1_device_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_2_device_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_3_device_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_one_return, test/functorch/test_control_flow.py::TestControlFlow::test_scan_simple_graph, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_none_cuda_autograd_False, 
test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_while_loop_gpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_generic_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_generic_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_pointwise_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_pointwise_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_generic_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_eager_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_eager_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_none_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_none_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_reverse_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_eager_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_none_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_combine_mode_pointwise_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_none_combine_mode_pointwise_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_none_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_pointwise_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_eager_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_eager_combine_mode_pointwise_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_eager_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_eager_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_none_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_False_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_eager_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_True_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_none_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_none_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_eager_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_none_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_none_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_False_same_direction_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_False_same_direction_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_False_same_direction_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_False_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_True_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_True_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_False_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_False_same_direction_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_True_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_True_same_direction_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_False_same_direction_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_False_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_True_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_True_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_False_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_True_same_direction_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_True_same_direction_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_True_same_direction_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_True_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_False_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_False_same_direction_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_False_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_True_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_True_same_direction_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_True_same_direction_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_eager_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_eager_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_False_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_False_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_False_cpu_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_False_cuda_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_False_cuda_combine_mode_pointwise_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cpu_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cuda_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cuda_combine_mode_pointwise_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_False_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_False_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_False_cuda_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cpu_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cpu_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cuda_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cuda_combine_mode_pointwise_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_True_cuda_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cpu_combine_mode_pointwise_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cpu_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cuda_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cuda_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_True_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_False_cpu_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_False_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_False_cuda_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cpu_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cpu_combine_mode_pointwise_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cuda_combine_mode_pointwise_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cuda_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_dynamic_shape_loop_type_for_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_dynamic_shape_loop_type_for_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_dynamic_shape_loop_type_for_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_dynamic_shape_loop_type_for_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_eager_loop_type_for_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_eager_loop_type_for_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_eager_loop_type_for_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_eager_loop_type_for_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_map_in_combine_fn, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_True_cuda_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_compile_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_compile_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_compile_dynamic_shape_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_eager_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_compile_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_compile_dynamic_shape_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_output_output_alias, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_compile_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_eager_reverse_False_cpu, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_eager_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_none_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_none_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_eager_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_eager_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_none_reverse_False_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_eager_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_reverse_True_cuda, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_eager_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_eager_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_none_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_True, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_eager_reverse_False_cpu_autograd_False, 
test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_eager_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_wrong_pytree, test/functorch/test_control_flow.py::TestControlFlowTraced::test_compile_while_loop_stack_output_dynamic_False_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_compile_while_loop_stack_output_dynamic_False_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_compile_while_loop_stack_output_dynamic_True_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_autograd_backward, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_eager_run_with_item, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_data_dependent_pred, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_input_aliasing_with_aot_func, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_input_mutation_on_false_branch, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_input_mutation_on_true_branch, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_nested_input_mutation_with_aot_func, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_output_alias_input, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_mismatched_branch_output_dynamic_False_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_mismatched_branch_output_dynamic_False_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_mismatched_branch_strided_output_dynamic_True_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_traced_fake_tensor, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_traced_other_inputs, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_traced_other_inputs_fake_tensor, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_with_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_simple_with_linear_compile_check_graph, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_trace_set__and_mutate_intermediate, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_traced_not_nested_fake_tensor, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_2, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_0_nClosure_0_nesting_2, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_function_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_0, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_0, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_multiple_args_with_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_multiple_outputs_nClosure_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_multiple_outputs_nClosure_1, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_module_nOperands_2_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_module_nOperands_2_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_object_nOperands_1_nClosure_1_nesting_0, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_object_nOperands_2_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_object_nOperands_2_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_single_input_with_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_consecutive_make_fx_symbolic, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_module_param_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_tensor_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_input_input_alias, test/functorch/test_control_flow.py::TestControlFlowTraced::test_input_mutation_inference_mode_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_map_functionalized_aot_func, test/functorch/test_control_flow.py::TestControlFlowTraced::test_map_functionalized_elem_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_merge_output, test/functorch/test_control_flow.py::TestControlFlowTraced::test_raise_error_on_mismatch_type_size, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_in_vmap_mixed_batch_dims, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_in_vmap_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_in_vmap_unbatched_init_error, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_in_vmap_unbatched_x, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_vmap_scan_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_autograd_aot_functionalized, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_real, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_symbolic_list, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_symbolic_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_vmap_vmap_boolcond_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_autograd_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_const_and_symint_output, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_int_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_pytree_int_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_const_and_symint_output, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_int_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_nested2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_nested_with_linear, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_simple_with_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_nested2, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_simple_with_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_functorch_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_functorch_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_functorch_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_no_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_no_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_python_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_python_while_loop_test_nested2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_python_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_nested2_traced, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_nested_traced, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_export_strict_False_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_export_strict_False_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_export_strict_True_dynamic_False, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_export_strict_True_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_int_carry_export_strict_False_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_int_carry_export_strict_True_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_False_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_False_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_True_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_True_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_export_strict_False_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_export_strict_True_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_export_strict_True_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_simple_functionalize_check_graph_func_type_cpp, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_simple_functionalize_check_graph_func_type_functorch, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_nested_with_linear, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_simple, 
test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_simple_with_linear, test/functorch/test_control_flow.py::TestHopSchema::test_associative_scan_gen_schema_multiple_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_associative_scan_gen_schema_with_additional_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_cond_gen_schema_symbool_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_cond_gen_schema_tensor_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_ScriptObj, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_float, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_int, test/functorch/test_control_flow.py::TestHopSchema::test_schema_tree_spec, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_GraphModule, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_ScriptObj, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_SymBool, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_SymInt, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_Tensor, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_float, test/functorch/test_control_flow.py::TestHopSchema::test_while_loop_gen_schema_with_additional_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_while_loop_gen_schema_with_int_carries 2025-12-04T17:04:34.6142547Z 2025-12-04T17:04:34.6142968Z Finished functorch/test_control_flow 2/2 ... 
[2025-12-04 17:04:34.484435][28302.86732389], took 21.03min 2025-12-04T17:04:34.6144317Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_control_flow/functorch.test_control_flow-922a9914156e0312.xml 2025-12-04T17:04:36.0860652Z Uploading artifacts took 1.47 seconds 2025-12-04T17:04:36.0864625Z Running test_subclass 1/1 ... [2025-12-04 17:04:36.086263][28304.469158108] 2025-12-04T17:04:36.0865223Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:04:36.0869673Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_subclass.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:04:36.086734] 2025-12-04T17:04:42.0600278Z 2025-12-04T17:04:42.0601322Z test_subclass 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_subclass_1.1_b65d4f741f14f053_.log 2025-12-04T17:04:42.0645441Z Running 100 items in this shard: test/test_subclass.py::TestSubclass::test_deepcopy_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_diag_tensor_below_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_logging_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_sparse_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_sizes_as_param_False, 
test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_strides_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_strides_as_param_True, test/test_subclass.py::TestSubclass::test_lazy_module_base_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_diag_tensor_below, test/test_subclass.py::TestSubclass::test_lazy_module_logging_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_non_wrapper_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_sparse_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_wrapper_with_custom_sizes, test/test_subclass.py::TestSubclass::test_lazy_module_wrapper_with_custom_strides, test/test_subclass.py::TestSubclass::test_module_optimization_base_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_diag_tensor_below, test/test_subclass.py::TestSubclass::test_module_optimization_logging_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_non_wrapper_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_sparse_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_wrapper_with_custom_sizes, test/test_subclass.py::TestSubclass::test_module_optimization_wrapper_with_custom_strides, test/test_subclass.py::TestSubclass::test_non_rewrapping_torch_dispatch_subclass_as_parameter_throws_for_detach, test/test_subclass.py::TestSubclass::test_param_invariants_base_tensor_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_base_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_diag_tensor_below_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_diag_tensor_below_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_logging_tensor_tensor_requires_grad_False, 
test/test_subclass.py::TestSubclass::test_param_invariants_logging_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_non_wrapper_tensor_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_non_wrapper_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_sparse_tensor_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_sparse_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_sizes_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_sizes_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_strides_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_strides_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_parametrization_base_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_base_tensor_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_diag_tensor_below_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_diag_tensor_below_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_logging_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_logging_tensor_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_non_wrapper_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_non_wrapper_tensor_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_sparse_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_sparse_tensor_leave_parametrized_True, 
test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_sizes_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_sizes_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_strides_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_strides_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_repr_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_repr_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_repr_diag_tensor_below_as_param_True, test/test_subclass.py::TestSubclass::test_repr_logging_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_repr_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_repr_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_repr_sparse_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_sizes_as_param_False, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_strides_as_param_False, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_strides_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_diag_tensor_below_as_param_True, 
test/test_subclass.py::TestSubclass::test_serialization_logging_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_sparse_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_sizes_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_strides_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_strides_as_param_True, test/test_subclass.py::TestSubclass::test_tensor_subclass_storage_data_accesses_throw, test/test_subclass.py::TestSubclass::test_type_propagation_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_diag_tensor_below_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_logging_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_sparse_tensor_as_param_True, 
test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_sizes_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_strides_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_strides_as_param_True 2025-12-04T17:04:42.0689007Z 2025-12-04T17:04:42.0689299Z Finished test_subclass 1/1 ... [2025-12-04 17:04:42.059996][28310.442891746], took 0.10min 2025-12-04T17:04:42.0959595Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_subclass/test_subclass-68565895e4fc66ea.xml 2025-12-04T17:04:42.1234970Z Running functorch/test_vmap_registrations 1/1 ... [2025-12-04 17:04:42.123242][28310.506136608] 2025-12-04T17:04:42.1235601Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:04:42.1239024Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_vmap_registrations.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:04:42.123655] 2025-12-04T17:04:52.2530499Z 2025-12-04T17:04:52.2531627Z functorch/test_vmap_registrations 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_vmap_registrations_1.1_8a0424ce5b3ca65e_.log 2025-12-04T17:04:52.3812424Z Running 1723 items in this shard: test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[_test::cat], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[_test::get_first], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[_test::leaky_relu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__and__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__and__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__iand__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__iand__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__ior__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__ior__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__ixor__.Scalar], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__ixor__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__or__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__or__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__xor__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::__xor__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_add_batch_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_autocast_to_full_precision], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_autocast_to_reduced_precision], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_batch_norm_impl_index], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_batch_norm_impl_index_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Byte], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Char], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Double], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Float], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Half], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Long], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cast_Short], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_choose_qparams_per_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_convolution.deprecated], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_convolution_double_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_convolution_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cufft_clear_plan_cache], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cufft_get_plan_cache_max_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cufft_get_plan_cache_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_cufft_set_plan_cache_max_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_debug_has_internal_overlap], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_dim_arange], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_embedding_bag_sparse_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_fused_rms_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_gather_sparse_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_grid_sampler_2d_cpu_fallback_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_has_compatible_shallow_copy_type], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_is_zerotensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_lu_with_info], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_nnpack_available], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_pack_padded_sequence_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_pad_circular], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_pad_enum], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_pad_packed_sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_propagate_xla_data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_remove_batch_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_reshape_from_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_rowwise_prune], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_saturate_weight_to_fp16], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_scaled_dot_product_attention_math], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_shape_as_tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sobol_engine_draw], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sobol_engine_ff_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sobol_engine_initialize_state_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sobol_engine_scramble_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_bsc_tensor_unsafe], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_bsr_tensor_unsafe], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_compressed_tensor_unsafe], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_coo_tensor_unsafe], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_csc_tensor_unsafe], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_csr_tensor_unsafe], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_log_softmax.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_log_softmax.int], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_mm.reduce], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_mm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_softmax.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_softmax.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_sum.dim_dtype], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_sum.dtype], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_sparse_sum], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_test_ambiguous_defaults.a], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_test_ambiguous_defaults.b], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_test_autograd_multiple_dispatch.ntonly], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_test_check_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_test_serialization_subcmul], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_test_string_default], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_thnn_differentiable_gru_cell_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_thnn_differentiable_lstm_cell_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_thnn_fused_lstm_cell_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_to_cpu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_unpack_dual], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_upsample_bicubic2d_aa.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_upsample_bilinear2d_aa.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_upsample_nearest_exact1d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_upsample_nearest_exact2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_upsample_nearest_exact3d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_use_cudnn_rnn_flatten_weight], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_validate_sparse_bsc_tensor_args], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_validate_sparse_bsr_tensor_args], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_validate_sparse_compressed_tensor_args], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_validate_sparse_coo_tensor_args], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_validate_sparse_csc_tensor_args], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_validate_sparse_csr_tensor_args], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_version], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_weight_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_weight_norm_differentiable_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_wrapped_linear_prepack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::_wrapped_quantized_linear_prepacked], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::absolute.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::absolute], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::absolute_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::adaptive_avg_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::adaptive_avg_pool2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::adaptive_avg_pool3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::adaptive_max_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::adjoint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::affine_grid_generator_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::align_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::align_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::align_to.ellipsis_idx], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::align_to], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::all.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::all.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::alpha_dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::alpha_dropout_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::any.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::any.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arccos.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arccos], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arccos_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arccosh.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arccosh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arccosh_], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arcsin.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arcsin], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arcsin_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arcsinh.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arcsinh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arcsinh_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctan.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctan2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctan2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctan2_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctan], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctan_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctanh.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctanh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::arctanh_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::argsort.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::argsort.stable], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::argsort.stable_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::argsort], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::argwhere], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::atleast_1d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::atleast_1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::atleast_2d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::atleast_2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::atleast_3d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::atleast_3d], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::avg_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::batch_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::bilinear], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::broadcast_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::broadcast_to], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::can_cast], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cartesian_prod], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cat.names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cat.names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cdist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::chain_matmul.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::chain_matmul], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::chalf], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::choose_qparams_optimized], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::chunk], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::clip.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::clip.Tensor_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::clip.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::clip], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::clip_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::clip_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::coalesce], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::column_stack.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::column_stack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::combinations], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concat.names], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concat.names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concat.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concat], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concatenate.names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concatenate.names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concatenate.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::concatenate], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conj], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conj_physical], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::contiguous], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv1d.padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv2d.padding], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv3d.padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv_tbc_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv_transpose1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv_transpose2d.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::conv_transpose3d.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::corrcoef], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cosine_embedding_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cosine_similarity], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cov], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cross.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cross], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cross_entropy_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ctc_loss.IntList], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ctc_loss.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cudnn_is_acceptable], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cummax.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cummax.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cummaxmin_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cummin.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cummin.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumprod.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumprod.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumprod_.dimname], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumprod_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumsum.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumsum.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumsum_.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumulative_trapezoid.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::cumulative_trapezoid.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::det], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::diag.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::diag], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::diagflat], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::diagonal.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::diff.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::diff], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide.Scalar_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide.Tensor_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide.out_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide_.Scalar_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::divide_.Tensor_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::dropout_], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::dsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::dsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::dstack.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::dstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::einsum], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::embedding_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::embedding_bag.padding_idx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::embedding_bag], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::embedding_sparse_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::empty.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::expand_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fake_quantize_per_channel_affine], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fake_quantize_per_channel_affine_cachemask_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fake_quantize_per_tensor_affine.tensor_qparams], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fake_quantize_per_tensor_affine], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fake_quantize_per_tensor_affine_cachemask_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_linear_fp16_weight.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_linear_fp16_weight], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_linear_fp16_weight_fp32_activation.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_linear_fp16_weight_fp32_activation], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_linear_int8_weight], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_linear_int8_weight_fp32_activation], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_linear_quantize_weight], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_pack_gemm_matrix_fp16], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_pack_quantized_matrix.KN], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fbgemm_pack_quantized_matrix], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::feature_alpha_dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::feature_alpha_dropout_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::feature_dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::feature_dropout_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_fft.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_fft2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_fft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_fft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_fftn.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_fftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_fftshift], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_hfft.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_hfft2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_hfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_hfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_hfftn.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_hfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ifft.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ifft2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ifft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ifft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ifftn.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ifftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ifftshift], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ihfft.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ihfft2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ihfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ihfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ihfftn.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_ihfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_irfft.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_irfft2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_irfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_irfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_irfftn.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_irfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_rfft.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_rfft2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_rfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_rfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_rfftn.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fft_rfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fill_diagonal_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fix.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fix], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fix_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::flatten.DimnameList], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::flatten.named_out_dim], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::flatten.using_ints], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::flatten.using_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::flatten_dense_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fliplr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::flipud], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power.Scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power.Tensor_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power.Tensor_Scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power.Tensor_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power.Tensor_Tensor_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power_.Scalar], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::float_power_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::frobenius_norm.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::frobenius_norm.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::fused_moving_avg_obs_fake_quant], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gather.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gather.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gather_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ger.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ger], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::get_gradients], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gradient.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gradient.scalararray], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gradient.scalarint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gradient.scalarrayarray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gradient.scalarrayint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gradient.tensorarray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gradient.tensorarrayint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater.Scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater.Tensor_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_equal.Scalar], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_equal.Scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_equal.Tensor_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_equal_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::greater_equal_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::grid_sampler], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::group_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gru.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gru.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::gru_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::hinge_embedding_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::histogramdd.TensorList_bins], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::histogramdd.int_bins], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::histogramdd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::hsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::hsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::hstack.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::hstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::imag], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_add.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_copy.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_copy_.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_fill.Dimname_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_fill.Dimname_Tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_fill_.Dimname_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_fill_.Dimname_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_select.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_select.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::index_select_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::infinitely_differentiable_gelu_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::inner.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::inner], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::instance_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::inverse.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::inverse], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_complex], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_conj], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_distributed], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_floating_point], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_inference], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_leaf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_neg], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_nonzero], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_signed], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::is_vulkan_available], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::isclose], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::isfinite], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::isreal], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::istft], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::item], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::kl_div], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::kron.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::kron], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::kthvalue.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::kthvalue.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::l1_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::layer_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ldexp.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ldexp.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ldexp_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less.Scalar_out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less.Tensor_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_equal.Scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_equal.Tensor_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_equal_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::less_equal_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_cholesky.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_cholesky], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_cond.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_cond.p_str], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_cond.p_str_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_cond], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_det.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_det], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_diagonal], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_eigh.eigvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_eigh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_eigvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_eigvalsh.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_eigvalsh], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_inv.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_inv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_ldl_factor.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_ldl_factor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_lu_factor.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_lu_factor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matmul.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matmul], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_norm.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_norm.str_ord], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_norm.str_ord_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_norm], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_power.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_power], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank.atol_rtol_float], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank.atol_rtol_float_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank.atol_rtol_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank.atol_rtol_tensor_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank.out_tol_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank.tol_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_matrix_rank], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_multi_dot.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_multi_dot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_norm.ord_str], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_norm.ord_str_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_norm.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_pinv.atol_rtol_float], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_pinv.atol_rtol_float_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_pinv.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_pinv.out_rcond_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_pinv.rcond_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_pinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_slogdet.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_slogdet], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_solve.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_solve], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_solve_ex.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_solve_ex], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_svd.U], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_svd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_svdvals.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_svdvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_tensorinv.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_tensorinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_tensorsolve.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_tensorsolve], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_vander], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_vecdot.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linalg_vecdot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::linear], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::log_sigmoid.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::log_sigmoid], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::log_softmax.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::log_softmax.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::logcumsumexp.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::logcumsumexp.dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::logdet], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::logsumexp.names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::logsumexp.names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::lstm.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::lstm.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::lstm_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::lu_solve.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::lu_solve], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::mH], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::mT], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::margin_ranking_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::masked_select_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::matmul.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::matmul], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::matrix_H], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::matrix_exp], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::matrix_exp_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::matrix_power.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::matrix_power], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max.names_dim_max], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max_pool1d_with_indices], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max_pool2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::max_pool3d], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::mean.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::mean.names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::median.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::median.names_dim_values], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::meshgrid.indexing], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::meshgrid], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::min.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::min.names_dim_min], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::min.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::min.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::mish_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::mode.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::mode.dimname_out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::moveaxis.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::moveaxis.intlist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::movedim.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::movedim.intlist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::msort.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::msort], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::multilabel_margin_loss.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::multilabel_margin_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::multiply.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::multiply.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::multiply.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::multiply_.Scalar], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::multiply_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanmean.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanmean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanmedian.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanmedian.names_dim_values], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanquantile.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanquantile.scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanquantile.scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nanquantile], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::narrow.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::narrow], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::native_channel_shuffle], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::negative.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::negative], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::negative_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nested_to_padded_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nll_loss.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nll_loss2d.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nll_loss2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nll_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nll_loss_nd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nonzero_numpy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::norm.names_ScalarOpt_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::norm.names_ScalarOpt_dim_dtype], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::norm.names_dtype_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::norm.names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::norm_except_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::not_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::not_equal.Scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::not_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::not_equal.Tensor_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::not_equal_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::not_equal_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nuclear_norm.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nuclear_norm.dim_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nuclear_norm.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::nuclear_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::numpy_T], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::one_hot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::orgqr.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::orgqr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::outer.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::outer], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::output_nr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::pad], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::pad_sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::pairwise_distance], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::pdist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::pin_memory], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::pinverse], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::poisson_nll_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::positive], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::prelu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::prod.Dimname_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::prod.dim_Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::promote_types], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::qr.Q], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::qr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantile.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantile.scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantile.scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantile], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantized_gru_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantized_lstm_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantized_rnn_relu_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::quantized_rnn_tanh_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rand.generator_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::randn.generator_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::randn.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::ravel], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::real], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::refine_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::relu6], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::relu6_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rename], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rename_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::repeat_interleave.self_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::repeat_interleave.self_int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::requires_grad_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::reshape], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::reshape_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::resolve_conj], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::resolve_neg], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::result_type.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::result_type.Scalar_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::result_type.Scalar_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::result_type.Tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::retain_grad], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::retains_grad], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rms_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rnn_relu.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rnn_relu.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rnn_relu_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rnn_tanh.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rnn_tanh.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rnn_tanh_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::row_stack.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::row_stack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rrelu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::rrelu_], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::scaled_dot_product_attention], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::scatter.dimname_src], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::scatter.dimname_value], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::scatter_add.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::select.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::selu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::selu_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::set_.source_Tensor_storage_offset], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::set_data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::silu_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::size.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::size.int], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::slogdet.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::slogdet], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::slow_conv3d.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::slow_conv3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::smm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::softmax.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::softmax.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sort.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sort.dimname_stable], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sort.dimname_values], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sort.dimname_values_stable], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_bsc_tensor.ccol_row_value], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_bsc_tensor.ccol_row_value_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_bsr_tensor.crow_col_value], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_bsr_tensor.crow_col_value_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_coo_tensor.indices], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_coo_tensor.indices_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_csc_tensor.ccol_row_value], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_csc_tensor.ccol_row_value_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_csr_tensor.crow_col_value], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sparse_csr_tensor.crow_col_value_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_digamma.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_digamma], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_erf.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_erf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_erfc.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_erfc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_erfinv.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_erfinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_exp2.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_exp2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_expit.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_expit], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_expm1.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_expm1], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_gammainc.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_gammainc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_gammaincc.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_gammaincc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_gammaln.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_gammaln], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_i0.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_i0], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_log1p.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_log1p], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_log_softmax], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_logit.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_logit], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_logsumexp.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_logsumexp], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_multigammaln.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_multigammaln], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_ndtr.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_ndtr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_polygamma.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_polygamma], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_psi.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_psi], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_round.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_round], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_sinc.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_sinc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_softmax], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_xlogy.other_scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_xlogy.other_scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_xlogy.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_xlogy.self_scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_xlogy.self_scalar_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::special_xlogy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::split.sizes], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::square.out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::square], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::square_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::squeeze.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::squeeze_.dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sspaddmm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std.correction_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std.correction_names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std.names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std_mean.correction_names], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std_mean.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std_mean.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::std_mean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::stft.center], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::stft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::stride.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::stride.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::subtract.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::subtract.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::subtract.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::subtract_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::subtract_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sum.DimnameList_out], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sum.dim_DimnameList], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sum_to_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::svd.U], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::svd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::swapaxes], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::swapaxes_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::swapdims], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::swapdims_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sym_is_contiguous], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sym_numel], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sym_size.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sym_storage_offset], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::sym_stride.int], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::take_along_dim.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::take_along_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::tensor_split.indices], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::tensor_split.sections], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::tensor_split.tensor_indices_or_sections], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::tensordot.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::tensordot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::thnn_conv2d.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::thnn_conv2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::tile], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to.device], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to.dtype], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to.dtype_layout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_dense], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_dense_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_mkldnn_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_sparse.sparse_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_sparse], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_sparse_bsc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_sparse_bsr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_sparse_csc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::to_sparse_csr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::trace_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::transpose.Dimname], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::trapezoid.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::trapezoid.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::trapz.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::trapz.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::triplet_margin_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::true_divide.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::true_divide.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::true_divide.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::true_divide_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::true_divide_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::type_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::unbind.Dimname], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::unflatten.Dimname], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::unflatten.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::unflatten_dense_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::unsafe_chunk], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::upsample_bicubic2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::upsample_bilinear2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::upsample_linear1d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::upsample_nearest1d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::upsample_nearest2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::upsample_nearest3d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::upsample_trilinear3d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::value_selecting_reduction_backward], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::vander], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var.correction_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var.correction_names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var.names_out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var_mean.correction_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var_mean.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var_mean.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::var_mean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::view_as], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::vsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::vsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::vstack.out], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::vstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::where.ScalarOther], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::where.ScalarSelf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::where.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[aten::where], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::all_gather_into_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::all_gather_into_tensor_coalesced], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::all_reduce], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::all_reduce_coalesced], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::all_to_all_single], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::broadcast], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::reduce_scatter_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::reduce_scatter_tensor_coalesced], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[c10d_functional::wait_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[inductor::_alloc_from_pool], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[mkldnn::_is_mkldnn_acl_supported], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[mkldnn::_is_mkldnn_bf16_supported], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[mkldnn::_is_mkldnn_fp16_supported], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[prepacked::unpack_prepacked_sizes_conv2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[prepacked::unpack_prepacked_sizes_linear], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[profiler::_record_function_enter], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[profiler::_record_function_enter_new], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[profiler::_record_function_exit._RecordFunction], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[profiler::_record_function_exit], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv1d_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_dilation], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_groups], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_output_padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_stride], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_transpose], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_unpack], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv2d_unpack_sizes], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv3d_dilation], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv3d_groups], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv3d_output_padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv3d_padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv3d_stride], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv3d_transpose], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv3d_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose1d_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose2d_dilation], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose2d_groups], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose2d_output_padding], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose2d_padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose2d_stride], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose2d_transpose], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose2d_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose3d_dilation], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose3d_groups], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose3d_output_padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose3d_padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose3d_stride], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose3d_transpose], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_transpose3d_unpack], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::conv_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::embedding_bag_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::linear_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::linear_unpack_fp16], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[quantized::make_quantized_cell_params_fp16], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_a_batching_rule_for_composite_implicit_autograd_[sparse::qlinear_unpack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__and__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__and__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__iand__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__iand__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__ior__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__ior__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__ixor__.Scalar], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__ixor__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__or__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__or__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__xor__.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::__xor__.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_batch_norm_impl_index], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_convolution_double_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_convolution_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_fused_rms_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_has_compatible_shallow_copy_type], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_lu_with_info], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_pad_circular], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_scaled_dot_product_attention_math], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_test_check_tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_upsample_bicubic2d_aa.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::_upsample_bilinear2d_aa.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::absolute], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::absolute_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::adaptive_avg_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::adaptive_avg_pool2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::adaptive_avg_pool3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::adaptive_max_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::adjoint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::alias_copy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arccos], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arccos_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arccosh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arccosh_], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arcsin], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arcsin_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arcsinh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arcsinh_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arctan2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arctan2_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arctan], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arctan_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arctanh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::arctanh_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::argsort.stable], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::argsort], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::as_strided_copy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::atleast_1d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::atleast_1d], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::atleast_2d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::atleast_2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::atleast_3d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::atleast_3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::avg_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::batch_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::broadcast_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::broadcast_to], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cartesian_prod], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cdist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::chunk], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::clip.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::clip], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::combinations], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::concat], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::concatenate], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conj_physical], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::contiguous], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv1d.padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv2d.padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv3d.padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv_transpose1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv_transpose2d.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::conv_transpose3d.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::corrcoef], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cosine_embedding_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cosine_similarity], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cov], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cross], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cross_entropy_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cumprod_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cumulative_trapezoid.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::cumulative_trapezoid.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::det], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::diag], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::diagonal_copy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::diff], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::divide.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::divide.Scalar_mode], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::divide.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::divide.Tensor_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::divide_.Scalar_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::divide_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::divide_.Tensor_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::dsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::dsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::dstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::einsum], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::embedding_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::expand_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_fft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_fft], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_fftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_fftshift], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_hfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_hfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_hfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_ifft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_ifft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_ifftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_ifftshift], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_ihfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_irfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_irfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_irfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_rfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_rfft], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fft_rfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fix], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::flatten.using_ints], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::fliplr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::flipud], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::float_power.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::float_power.Tensor_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::float_power.Tensor_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::frobenius_norm.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gather_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::ger], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gradient.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gradient.scalararray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gradient.scalarint], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gradient.scalarrayarray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gradient.scalarrayint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gradient.tensorarray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::gradient.tensorarrayint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::greater.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::greater.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::greater_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::greater_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::grid_sampler], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::group_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::hinge_embedding_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::hsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::hsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::hstack], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::imag], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::index_select_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::inner], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::instance_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::inverse], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::is_complex], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::is_same_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::isfinite], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::isreal], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::kron], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::l1_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::layer_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::ldexp.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::less.Scalar], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::less.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::less_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::less_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_cholesky], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_cond], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_det], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_diagonal], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_eigh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_eigvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_eigvalsh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_inv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_ldl_factor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_lu_factor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_matmul], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_matrix_norm.str_ord], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_matrix_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_matrix_power], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_matrix_rank.atol_rtol_float], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_matrix_rank.atol_rtol_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_multi_dot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_norm.ord_str], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_pinv.atol_rtol_float], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_pinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_solve], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_solve_ex], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_svd], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_svdvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_tensorinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_vander], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linalg_vecdot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::linear], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::log_sigmoid], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::log_softmax.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::logdet], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::mH], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::mT], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::matmul], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::matrix_H], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::matrix_exp], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::matrix_power], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::max.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::max_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::max_pool1d_with_indices], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::max_pool2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::max_pool3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::meshgrid.indexing], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::meshgrid], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::min.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::moveaxis.intlist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::movedim.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::movedim.intlist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::msort], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::multiply.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::multiply.Tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::multiply_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::multiply_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::nanmean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::narrow], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::negative], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::nll_loss2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::nll_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::nll_loss_nd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::not_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::not_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::nuclear_norm.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::nuclear_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::numpy_T], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::orgqr], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::outer], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::pad], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::pairwise_distance], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::pinverse], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::poisson_nll_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::positive], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::prelu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::qr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::ravel], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::real], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::relu6], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::relu6_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::repeat_interleave.self_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::repeat_interleave.self_int], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::reshape], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::reshape_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::resolve_conj], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::resolve_neg], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::result_type.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::result_type.Scalar_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::result_type.Scalar_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::result_type.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::rms_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::row_stack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::rrelu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::rrelu_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::scaled_dot_product_attention], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::selu], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::selu_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::size.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::slogdet], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::softmax.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_digamma], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_erf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_erfc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_erfinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_exp2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_expit], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_expm1], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_gammainc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_gammaincc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_gammaln], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_i0], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_log1p], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_log_softmax], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_logit], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_logsumexp], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_multigammaln], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_ndtr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_polygamma], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_psi], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_round], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_sinc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_softmax], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_xlogy.other_scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_xlogy.self_scalar], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::special_xlogy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::split.sizes], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::square], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::std.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::std], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::std_mean.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::std_mean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::subtract.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::sum_to_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::svd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::swapaxes], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::swapaxes_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::swapdims], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::swapdims_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::take_along_dim], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::tensor_split.indices], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::tensor_split.sections], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::tensordot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::tile], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::to.device], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::to.dtype], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::to.dtype_layout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::to.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::trapezoid.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::trapezoid.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::trapz.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::trapz.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::true_divide.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::true_divide.Tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::true_divide_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::true_divide_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::type_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::unflatten.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::unfold_copy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::unsafe_chunk], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::upsample_bicubic2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::upsample_bilinear2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::upsample_linear1d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::upsample_nearest1d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::upsample_nearest2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::upsample_nearest3d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::upsample_trilinear3d.vec], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::value_selecting_reduction_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::var.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::var], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::var_mean.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::var_mean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::view_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::vsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::vsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::vstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::where.ScalarOther], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::where.ScalarSelf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_register_functorch_batched_decomposition_[aten::where.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::absolute], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::absolute_], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::adaptive_avg_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::adaptive_avg_pool2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::adaptive_avg_pool3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::adaptive_max_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::adjoint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::affine_grid_generator_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::align_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::align_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::align_to.ellipsis_idx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::align_to], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::alpha_dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::alpha_dropout_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arccos], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arccos_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arccosh], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arccosh_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arcsin], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arcsin_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arcsinh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arcsinh_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arctan2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arctan2_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arctan], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arctan_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arctanh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::arctanh_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::argsort.stable], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::argsort], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::argwhere], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::atleast_1d.Sequence], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::atleast_1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::atleast_2d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::atleast_2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::atleast_3d.Sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::atleast_3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::avg_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::batch_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::bilinear], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::broadcast_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::broadcast_to], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::can_cast], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cartesian_prod], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cat.names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cdist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::chain_matmul], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::chalf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::choose_qparams_optimized], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::chunk], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::clip.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::clip], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::clip_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::clip_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::coalesce], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::column_stack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::combinations], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::concat.names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::concat], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::concatenate.names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::concatenate], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conj], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conj_physical], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::contiguous], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv1d.padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv2d.padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv3d.padding], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv_tbc_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv_transpose1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv_transpose2d.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::conv_transpose3d.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::corrcoef], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cosine_embedding_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cosine_similarity], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cov], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cross], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cross_entropy_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::ctc_loss.IntList], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::ctc_loss.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cudnn_is_acceptable], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cummaxmin_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cumprod_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cumulative_trapezoid.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::cumulative_trapezoid.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::det], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::diag], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::diagflat], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::diff], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide.Scalar_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide.Tensor_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide.out_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide_.Scalar_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::divide_.Tensor_mode], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::dropout_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::dsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::dsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::dstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::einsum], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::embedding_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::embedding_bag.padding_idx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::embedding_bag], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::expand_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::feature_alpha_dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::feature_alpha_dropout_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::feature_dropout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::feature_dropout_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_fft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_fft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_fftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_fftshift], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_hfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_hfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_hfftn], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_ifft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_ifft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_ifftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_ifftshift], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_ihfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_ihfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_ihfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_irfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_irfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_irfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_rfft2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_rfft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fft_rfftn], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fill_diagonal_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fix], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fix_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::flatten.named_out_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::flatten.using_ints], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::flatten.using_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::flatten_dense_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fliplr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::flipud], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::float_power.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::float_power.Tensor_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::float_power.Tensor_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::float_power_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::float_power_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::frobenius_norm.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::fused_moving_avg_obs_fake_quant], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gather_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::ger], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::get_gradients], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gradient.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gradient.scalararray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gradient.scalarint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gradient.scalarrayarray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gradient.scalarrayint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gradient.tensorarray], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gradient.tensorarrayint], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater_.Tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater_equal_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::greater_equal_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::grid_sampler], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::group_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gru.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gru.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::gru_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::hinge_embedding_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::histogramdd.TensorList_bins], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::histogramdd.int_bins], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::histogramdd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::hsplit.array], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::hsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::hstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::imag], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::index_select_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::infinitely_differentiable_gelu_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::inner], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::instance_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::inverse], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::isclose], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::isfinite], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::isreal], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::istft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::item], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::kl_div], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::kron], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::l1_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::layer_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::ldexp.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::ldexp_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less_equal_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::less_equal_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_cholesky], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_cond.p_str], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_cond], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_det], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_diagonal], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_eigh.eigvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_eigh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_eigvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_eigvalsh], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_inv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_ldl_factor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_lu_factor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matmul], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_norm.str_ord], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_power], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_rank.atol_rtol_float], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_rank.atol_rtol_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_rank.out_tol_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_rank.tol_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_matrix_rank], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_multi_dot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_norm.ord_str], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_pinv.atol_rtol_float], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_pinv.out_rcond_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_pinv.rcond_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_pinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_slogdet], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_solve], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_solve_ex], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_svd.U], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_svd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_svdvals], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_tensorinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_tensorsolve], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_vander], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linalg_vecdot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::linear], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::log_sigmoid], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::log_softmax.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::logdet], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::logsumexp.names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::lstm.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::lstm.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::lstm_cell], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::lu_solve], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::mH], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::mT], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::margin_ranking_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::masked_select_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::matmul], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::matrix_H], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::matrix_exp], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::matrix_exp_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::matrix_power], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::max.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::max.names_dim_max], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::max.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::max_pool1d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::max_pool1d_with_indices], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::max_pool2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::max_pool3d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::mean.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::median.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::median.names_dim_values], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::meshgrid.indexing], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::meshgrid], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::min.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::min.names_dim_min], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::min.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::mish_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::moveaxis.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::moveaxis.intlist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::movedim.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::movedim.intlist], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::msort], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::multilabel_margin_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::multiply.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::multiply.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::multiply_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::multiply_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nanmean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nanmedian.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nanmedian.names_dim_values], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nanquantile.scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nanquantile], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::narrow.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::narrow], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::native_channel_shuffle], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::negative], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::negative_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nested_to_padded_tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nll_loss2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nll_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nll_loss_nd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nonzero_numpy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::norm.names_ScalarOpt_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::norm.names_ScalarOpt_dim_dtype], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::norm_except_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::not_equal.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::not_equal.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::not_equal_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::not_equal_.Tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nuclear_norm.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::nuclear_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::numpy_T], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::one_hot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::orgqr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::outer], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::output_nr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::pad], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::pad_sequence], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::pairwise_distance], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::pdist], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::pin_memory], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::pinverse], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::poisson_nll_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::positive], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::prelu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::promote_types], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::qr.Q], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::qr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::quantile.scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::quantile], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::ravel], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::real], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::refine_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::relu6], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::relu6_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rename], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rename_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::repeat_interleave.self_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::repeat_interleave.self_int], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::requires_grad_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::reshape], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::reshape_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::resolve_conj], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::resolve_neg], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::result_type.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::result_type.Scalar_Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::result_type.Scalar_Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::result_type.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::retain_grad], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::retains_grad], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rms_norm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rnn_relu.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rnn_relu.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rnn_relu_cell], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rnn_tanh.data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rnn_tanh.input], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rnn_tanh_cell], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::row_stack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rrelu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::rrelu_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::scaled_dot_product_attention], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::selu], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::selu_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::set_.source_Tensor_storage_offset], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::set_data], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::silu_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::size.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::slogdet], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::slow_conv3d], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::smm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::softmax.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_digamma], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_erf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_erfc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_erfinv], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_exp2], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_expit], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_expm1], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_gammainc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_gammaincc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_gammaln], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_i0], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_log1p], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_log_softmax], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_logit], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_logsumexp], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_multigammaln], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_ndtr], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_polygamma], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_psi], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_round], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_sinc], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_softmax], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_xlogy.other_scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_xlogy.self_scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::special_xlogy], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::split.sizes], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::square], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::square_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::sspaddmm], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std.correction_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std_mean.correction_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std_mean.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std_mean.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::std_mean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::stft.center], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::stft], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::stride.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::subtract.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::subtract.Tensor], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::subtract_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::subtract_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::sum_to_size], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::svd.U], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::svd], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::swapaxes], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::swapaxes_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::swapdims], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::swapdims_], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::sym_is_contiguous], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::sym_numel], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::sym_size.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::sym_storage_offset], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::sym_stride.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::take_along_dim], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::tensor_split.indices], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::tensor_split.sections], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::tensor_split.tensor_indices_or_sections], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::tensordot], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::thnn_conv2d], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::tile], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::to.device], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::to.dtype], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::to.dtype_layout], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::to.other], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::to_dense], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::to_dense_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::to_mkldnn_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::trace_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::trapezoid.dx], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::trapezoid.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::trapz.dx], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::trapz.x], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::triplet_margin_loss], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::true_divide.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::true_divide.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::true_divide_.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::true_divide_.Tensor], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::type_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::unflatten.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::unflatten_dense_tensors], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::unsafe_chunk], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::upsample_bicubic2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::upsample_bilinear2d.vec], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::upsample_linear1d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::upsample_nearest1d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::upsample_nearest2d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::upsample_nearest3d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::upsample_trilinear3d.vec], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::value_selecting_reduction_backward], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::vander], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var.correction_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var.names_dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var_mean.correction_names], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var_mean.dim], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var_mean.names_dim], 
test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::var_mean], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::view_as], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::vsplit.array], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::vsplit.int], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::vstack], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::where.ScalarOther], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::where.ScalarSelf], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::where.Scalar], test/functorch/test_vmap_registrations.py::TestFunctorchDispatcher::test_unimplemented_batched_registrations_[aten::where] 2025-12-04T17:04:52.5062064Z 2025-12-04T17:04:52.5062564Z Finished functorch/test_vmap_registrations 1/1 ... [2025-12-04 17:04:52.256357][28320.639245214], took 0.17min 2025-12-04T17:04:52.5064050Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_vmap_registrations/functorch.test_vmap_registrations-40d5b566ee6986dc.xml 2025-12-04T17:04:52.5065358Z Running nn/test_parametrization 1/1 ... 
[2025-12-04 17:04:52.405089][28320.787982798] 2025-12-04T17:04:52.5065929Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:04:52.5067181Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_parametrization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:04:52.405560] 2025-12-04T17:05:01.8838286Z 2025-12-04T17:05:01.8839296Z nn/test_parametrization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_parametrization_1.1_0b836fe205c49662_.log 2025-12-04T17:05:01.8871946Z Running 58 items in this shard: test/nn/test_parametrization.py::TestNNParametrization::test_caching_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_caching_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_caching_parametrization_with_transfer_parametrizations_and_params_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_caching_parametrization_with_transfer_parametrizations_and_params_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_deepcopy_after_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_deepcopy_after_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_errors_parametrized_tensor_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_errors_parametrized_tensor_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_errors_unparametrized_tensor_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_errors_unparametrized_tensor_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_initialization_parametrization_swap_False, 
test/nn/test_parametrization.py::TestNNParametrization::test_initialization_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_multiple_inputs_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_multiple_inputs_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_dim_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_dim_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_forward_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_forward_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_load_state_dict_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_load_state_dict_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_value_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_new_spectral_norm_value_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_orthogonal_errors_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_orthogonal_errors_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_orthogonal_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_orthogonal_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_parametrization_same_training_mode_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_parametrization_same_training_mode_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_buffer_parametrization_swap_False, 
test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_buffer_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_nested_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_nested_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_register_and_remove_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_register_parametrization_no_grad, test/nn/test_parametrization.py::TestNNParametrization::test_serialization_parametrization_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_serialization_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_many_to_one_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_many_to_one_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_right_inverse_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_right_inverse_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_single_param_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_single_param_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_transfer_parametrizations_and_params_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_type_before_parametrizations_swap_False, 
test/nn/test_parametrization.py::TestNNParametrization::test_type_before_parametrizations_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_deepcopy_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_deepcopy_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_pickle_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_pickle_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_state_dict_compat_swap_False, test/nn/test_parametrization.py::TestNNParametrization::test_weight_norm_state_dict_compat_swap_True, test/nn/test_parametrization.py::TestNNParametrization::test_wrapper_subclass_parametrization_swap_True, test/nn/test_parametrization.py::TestNNParametrizationDeviceCUDA::test_weight_norm_parametrization_swap_False_cuda, test/nn/test_parametrization.py::TestNNParametrizationDeviceCUDA::test_weight_norm_parametrization_swap_True_cuda 2025-12-04T17:05:01.8903480Z 2025-12-04T17:05:01.8903839Z Finished nn/test_parametrization 1/1 ... [2025-12-04 17:05:01.883696][28330.266591922], took 0.16min 2025-12-04T17:05:01.9191101Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_parametrization/nn.test_parametrization-ed4e97080833ff92.xml 2025-12-04T17:05:01.9944797Z Running test_dynamic_shapes 1/1 ... [2025-12-04 17:05:01.994162][28330.377055699] 2025-12-04T17:05:01.9945352Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:05:01.9948527Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_dynamic_shapes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:05:01.994612] 2025-12-04T17:06:11.8184055Z 2025-12-04T17:06:11.8185234Z test_dynamic_shapes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_dynamic_shapes_1.1_f2bbcf4caeac0628_.log 2025-12-04T17:06:11.8363079Z Running 378 items in this shard: test/test_dynamic_shapes.py::TestPySymInt::test_arith_ops, test/test_dynamic_shapes.py::TestPySymInt::test_aten_ops, test/test_dynamic_shapes.py::TestPySymInt::test_avoid_unbacked_substitution, test/test_dynamic_shapes.py::TestPySymInt::test_backed_size_oblivious_01_spec, test/test_dynamic_shapes.py::TestPySymInt::test_baddbmm_symint, test/test_dynamic_shapes.py::TestPySymInt::test_binary, test/test_dynamic_shapes.py::TestPySymInt::test_data_dependent_guard, test/test_dynamic_shapes.py::TestPySymInt::test_data_dependent_guard_propagate_real_tensors, test/test_dynamic_shapes.py::TestPySymInt::test_debug_has_internal_overlap_unbacked, test/test_dynamic_shapes.py::TestPySymInt::test_deepcopy, test/test_dynamic_shapes.py::TestPySymInt::test_duck_shape, test/test_dynamic_shapes.py::TestPySymInt::test_ephemeral_source_simplification, test/test_dynamic_shapes.py::TestPySymInt::test_ephemeral_source_unified_with_non_ephemeral_source, test/test_dynamic_shapes.py::TestPySymInt::test_expect_true_basic, test/test_dynamic_shapes.py::TestPySymInt::test_expect_true_double_digits, test/test_dynamic_shapes.py::TestPySymInt::test_expect_true_prefer_later, test/test_dynamic_shapes.py::TestPySymInt::test_expect_true_refine_range, test/test_dynamic_shapes.py::TestPySymInt::test_expect_true_with_s0, test/test_dynamic_shapes.py::TestPySymInt::test_floor_clean_div_axioms, test/test_dynamic_shapes.py::TestPySymInt::test_floordiv_static, test/test_dynamic_shapes.py::TestPySymInt::test_fx_trace_intlist, test/test_dynamic_shapes.py::TestPySymInt::test_guard_int, test/test_dynamic_shapes.py::TestPySymInt::test_guard_refine_range, test/test_dynamic_shapes.py::TestPySymInt::test_hash_size, 
test/test_dynamic_shapes.py::TestPySymInt::test_int_bool, test/test_dynamic_shapes.py::TestPySymInt::test_int_conversion, test/test_dynamic_shapes.py::TestPySymInt::test_int_to_float, test/test_dynamic_shapes.py::TestPySymInt::test_max_of_unique_summation_opt, test/test_dynamic_shapes.py::TestPySymInt::test_meta_symint, test/test_dynamic_shapes.py::TestPySymInt::test_mul_int_oo_nan, test/test_dynamic_shapes.py::TestPySymInt::test_non_overlapping_and_dense_backed, test/test_dynamic_shapes.py::TestPySymInt::test_non_overlapping_and_dense_unbacked, test/test_dynamic_shapes.py::TestPySymInt::test_numel, test/test_dynamic_shapes.py::TestPySymInt::test_numpy_sym_max, test/test_dynamic_shapes.py::TestPySymInt::test_numpy_sym_min, test/test_dynamic_shapes.py::TestPySymInt::test_prefer_deferred_runtime_assertions_over_guards, test/test_dynamic_shapes.py::TestPySymInt::test_prims_non_overlapping_and_dense, test/test_dynamic_shapes.py::TestPySymInt::test_print_readable_with_symints, test/test_dynamic_shapes.py::TestPySymInt::test_reverse_arith_ops, test/test_dynamic_shapes.py::TestPySymInt::test_roundtrip, test/test_dynamic_shapes.py::TestPySymInt::test_size_expressions, test/test_dynamic_shapes.py::TestPySymInt::test_slice_backed_size_oblivious, test/test_dynamic_shapes.py::TestPySymInt::test_specialize_zero_one, test/test_dynamic_shapes.py::TestPySymInt::test_statically_known_false, test/test_dynamic_shapes.py::TestPySymInt::test_statically_known_true, test/test_dynamic_shapes.py::TestPySymInt::test_stride, test/test_dynamic_shapes.py::TestPySymInt::test_sym_ceil, test/test_dynamic_shapes.py::TestPySymInt::test_sym_floor, test/test_dynamic_shapes.py::TestPySymInt::test_sym_int, test/test_dynamic_shapes.py::TestPySymInt::test_sym_ite, test/test_dynamic_shapes.py::TestPySymInt::test_sym_log2, test/test_dynamic_shapes.py::TestPySymInt::test_sym_max_multi_max_simplify, test/test_dynamic_shapes.py::TestPySymInt::test_sym_sqrt, 
test/test_dynamic_shapes.py::TestPySymInt::test_sym_sum, test/test_dynamic_shapes.py::TestPySymInt::test_sym_trunc, test/test_dynamic_shapes.py::TestPySymInt::test_symint_args, test/test_dynamic_shapes.py::TestPySymInt::test_symint_as_scalar, test/test_dynamic_shapes.py::TestPySymInt::test_symint_bitwise_and, test/test_dynamic_shapes.py::TestPySymInt::test_symint_bitwise_or, test/test_dynamic_shapes.py::TestPySymInt::test_symint_bitwise_xor, test/test_dynamic_shapes.py::TestPySymInt::test_symint_vargs, test/test_dynamic_shapes.py::TestPySymInt::test_sympify_symint, test/test_dynamic_shapes.py::TestPySymInt::test_sympy_optimized_add, test/test_dynamic_shapes.py::TestPySymInt::test_sympy_optimized_add_binary_search, test/test_dynamic_shapes.py::TestPySymInt::test_tensor_factory_with_symint, test/test_dynamic_shapes.py::TestPySymInt::test_tracing_sym_ite, test/test_dynamic_shapes.py::TestPySymInt::test_unbacked_substitution, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_abs, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_add, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_and, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_bitwise_and, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_bitwise_or, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_bitwise_xor, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_ceil, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_eq, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_float_pow, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_float_truediv, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_floor, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_ge, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_gt, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_int_floordiv, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_int_truediv, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_is_integer, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_le, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_lshift, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_lt, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_mod, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_mul, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_ne, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_neg, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_or, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_pos, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_pow_by_natural, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_round, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_rshift, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sub, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_acos, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_asin, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_atan, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_cos, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_cosh, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_float, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_ite, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_log2, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_max, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_min, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_not, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_sin, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_sinh, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_sqrt, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_tan, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_sym_tanh, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_bool_method_fn_trunc, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_dynamic_int_basic_compile_backend_eager, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_dynamic_int_basic_compile_backend_inductor, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_dynamic_int_eager_usage, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_abs_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_abs_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_abs_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_abs_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_add_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_add_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_add_first_type_int_second_type_float, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_add_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_and_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_and_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_and_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_and_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_and_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_and_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_and_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_and_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_or_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_or_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_or_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_or_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_xor_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_xor_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_xor_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_bitwise_xor_first_type_int_second_type_int, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ceil_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ceil_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ceil_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ceil_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_eq_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_eq_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_eq_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_eq_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_pow_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_pow_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_pow_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_pow_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_truediv_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_truediv_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_truediv_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_float_truediv_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_floor_first_type_float_second_type_float, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_floor_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_floor_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_floor_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ge_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ge_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ge_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ge_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_gt_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_gt_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_gt_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_gt_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_floordiv_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_floordiv_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_floordiv_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_floordiv_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_truediv_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_truediv_first_type_float_second_type_int, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_truediv_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_int_truediv_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_is_integer_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_is_integer_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_is_integer_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_is_integer_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_le_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_le_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_le_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_le_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lshift_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lshift_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lshift_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lshift_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lt_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lt_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lt_first_type_int_second_type_float, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_lt_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mod_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mod_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mod_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mod_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mul_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mul_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mul_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_mul_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ne_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ne_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ne_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_ne_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_neg_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_neg_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_neg_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_neg_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_or_first_type_float_second_type_float, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_or_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_or_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_or_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pos_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pos_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pos_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pos_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pow_by_natural_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pow_by_natural_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pow_by_natural_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_pow_by_natural_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_round_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_round_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_round_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_round_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_rshift_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_rshift_first_type_float_second_type_int, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_rshift_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_rshift_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sub_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sub_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sub_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sub_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_acos_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_acos_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_acos_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_acos_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_asin_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_asin_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_asin_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_asin_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_atan_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_atan_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_atan_first_type_int_second_type_float, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_atan_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cos_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cos_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cos_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cos_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cosh_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cosh_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cosh_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_cosh_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_float_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_float_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_float_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_float_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_ite_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_ite_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_ite_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_ite_first_type_int_second_type_int, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_log2_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_log2_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_log2_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_log2_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_max_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_max_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_max_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_max_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_min_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_min_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_min_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_min_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_not_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_not_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_not_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_not_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sin_first_type_float_second_type_float, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sin_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sin_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sin_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sinh_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sinh_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sinh_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sinh_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sqrt_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sqrt_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sqrt_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_sqrt_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tan_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tan_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tan_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tan_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tanh_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tanh_first_type_float_second_type_int, 
test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tanh_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_sym_tanh_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_trunc_first_type_float_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_trunc_first_type_float_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_trunc_first_type_int_second_type_float, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_method_fn_trunc_first_type_int_second_type_int, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_non_symbolic_symnode, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_stride_symnode, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_symint_deepcopy, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_symint_hashing, test/test_dynamic_shapes.py::TestSymNumberMagicMethods::test_symnode_hashing, test/test_dynamic_shapes.py::TestFloorDiv::test_floordiv_assumptions, test/test_dynamic_shapes.py::TestFloorDiv::test_floordiv_div_by_one, test/test_dynamic_shapes.py::TestFloorDiv::test_floordiv_div_does_not_generate_non_int_rational, test/test_dynamic_shapes.py::TestFloorDiv::test_floordiv_float_int, test/test_dynamic_shapes.py::TestFloorDiv::test_floordiv_simplify, test/test_dynamic_shapes.py::TestDimConstraints::test_dim_constraints_reduce_congruences_simple, test/test_dynamic_shapes.py::TestDimConstraints::test_dim_constraints_reduce_inequalities_error, test/test_dynamic_shapes.py::TestDimConstraints::test_dim_constraints_reduce_inequalities_simple, test/test_dynamic_shapes.py::TestDimConstraints::test_dim_constraints_solve_full, test/test_dynamic_shapes.py::TestDimConstraints::test_simplify_max_1_0, test/test_dynamic_shapes.py::TestGuardsExpressions::test_guard_or_false, 
test/test_dynamic_shapes.py::TestGuardsExpressions::test_guard_or_true, test/test_dynamic_shapes.py::TestGuardsExpressions::test_guards_float_div, test/test_dynamic_shapes.py::TestGuardsExpressions::test_guards_float_print, test/test_dynamic_shapes.py::TestGuardsExpressions::test_guards_gt_lt, test/test_dynamic_shapes.py::TestGuardsExpressions::test_remove_symbols_without_guarding, test/test_dynamic_shapes.py::TestGuardsExpressions::test_size_comparison_no_recompile, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_neq_assert_backend_eager, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_neq_assert_backend_inductor, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_sym_eq_assert_backend_eager, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_sym_eq_assert_backend_inductor, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_sym_or_assert_backend_eager, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_sym_or_assert_backend_inductor, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_with_unbacked_input_backend_eager, test/test_dynamic_shapes.py::TestUnbacked::test_deferred_with_unbacked_input_backend_inductor, test/test_dynamic_shapes.py::TestUnbacked::test_div_unbacked_eq_globals, test/test_dynamic_shapes.py::TestUnbacked::test_div_unbacked_eq_input_ints, test/test_dynamic_shapes.py::TestUnbacked::test_div_unbacked_eq_input_tensors, test/test_dynamic_shapes.py::TestUnbacked::test_div_unbacked_eq_item, test/test_dynamic_shapes.py::TestUnbacked::test_do_not_guard_unbacked_inputs, test/test_dynamic_shapes.py::TestUnbacked::test_has_free_symbols, test/test_dynamic_shapes.py::TestUnbacked::test_post_specialize_runtime_assert1_backend_eager, test/test_dynamic_shapes.py::TestUnbacked::test_post_specialize_runtime_assert1_backend_inductor, test/test_dynamic_shapes.py::TestUnbacked::test_post_specialize_runtime_assert2_backend_eager, 
test/test_dynamic_shapes.py::TestUnbacked::test_post_specialize_runtime_assert2_backend_inductor, test/test_dynamic_shapes.py::TestUbackedOps::test_backed_size_oblivious_broadcast, test/test_dynamic_shapes.py::TestUbackedOps::test_backed_size_oblivious_expand, test/test_dynamic_shapes.py::TestUbackedOps::test_invalid_view_unbacked_view, test/test_dynamic_shapes.py::TestUbackedOps::test_narrow_unbacked_start, test/test_dynamic_shapes.py::TestUbackedOps::test_narrow_unbacked_start_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_narrow_with_tensor_start, test/test_dynamic_shapes.py::TestUbackedOps::test_nonzero_select, test/test_dynamic_shapes.py::TestUbackedOps::test_nonzero_select_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_nonzero_slice, test/test_dynamic_shapes.py::TestUbackedOps::test_nonzero_slice_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_padnd, test/test_dynamic_shapes.py::TestUbackedOps::test_select_scatter_unbacked_index, test/test_dynamic_shapes.py::TestUbackedOps::test_slice_with_tensor_indices, test/test_dynamic_shapes.py::TestUbackedOps::test_slice_with_tensor_indices_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_tensor_split, test/test_dynamic_shapes.py::TestUbackedOps::test_tensor_split_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_trunc_int_div_true, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_contiguous, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_item, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_item_set_item, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_item_set_item2, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_item_set_item3, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_non_contigious_reshape_failing, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_reshape1, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_reshape2, 
test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_reshape3, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_reshape_copy, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_select2, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_select_2, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_select_index, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_select_index_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_select_index_with_check, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_slice, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_slice_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_slice_with_step, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_slice_with_step_cpp_wrapper, test/test_dynamic_shapes.py::TestUbackedOps::test_unbacked_view_extra, test/test_dynamic_shapes.py::TestUbackedOps::test_unbind_not_dynamic 2025-12-04T17:06:11.8533297Z 2025-12-04T17:06:11.8533652Z Finished test_dynamic_shapes 1/1 ... [2025-12-04 17:06:11.818874][28400.201766344], took 1.16min 2025-12-04T17:06:11.8556534Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_dynamic_shapes/test_dynamic_shapes-07075f000d166d21.xml 2025-12-04T17:06:11.9585570Z Running test_dispatch 1/1 ... [2025-12-04 17:06:11.958266][28400.341160247] 2025-12-04T17:06:11.9586100Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:06:11.9589593Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_dispatch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:06:11.958714] 2025-12-04T17:06:43.9213019Z 2025-12-04T17:06:43.9213935Z test_dispatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_dispatch_1.1_a7d630610c114c46_.log 2025-12-04T17:06:43.9226100Z Running 32 items in this shard: test/test_dispatch.py::TestDispatch::test_all_invariants, test/test_dispatch.py::TestDispatch::test_computed_table, test/test_dispatch.py::TestDispatch::test_computed_table_with_ambiguous_autogradother, test/test_dispatch.py::TestDispatch::test_computed_table_with_autograd, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_autograd_defaultbackend, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_autograd_math, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_autograd_math_defaultbackend, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_defaultbackend, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_math, test/test_dispatch.py::TestDispatch::test_computed_table_with_cpu_math_autogradcpu_fallthrough, test/test_dispatch.py::TestDispatch::test_computed_table_with_math, test/test_dispatch.py::TestDispatch::test_def, test/test_dispatch.py::TestDispatch::test_def_impl_schema_mismatch, test/test_dispatch.py::TestDispatch::test_def_only, test/test_dispatch.py::TestDispatch::test_def_with_explicit_alias, test/test_dispatch.py::TestDispatch::test_def_with_inference, test/test_dispatch.py::TestDispatch::test_dispatch_print_registrations_for_dispatch_key_invalid, test/test_dispatch.py::TestDispatch::test_find_dangling_impls, test/test_dispatch.py::TestDispatch::test_find_dangling_impls_ext, test/test_dispatch.py::TestDispatch::test_impl_only, test/test_dispatch.py::TestDispatch::test_multiple_def_alias_defaulting, test/test_dispatch.py::TestDispatch::test_multiple_def_alias_mismatch, test/test_dispatch.py::TestDispatch::test_multiple_def_error, test/test_dispatch.py::TestDispatch::test_multiple_fallback, 
test/test_dispatch.py::TestDispatch::test_overwrite_math, test/test_dispatch.py::TestPythonDispatcher::test_autogradother, test/test_dispatch.py::TestPythonDispatcher::test_basic, test/test_dispatch.py::TestPythonDispatcher::test_defaultbackend_autogradcpu, test/test_dispatch.py::TestPythonDispatcher::test_defaultbackend_math, test/test_dispatch.py::TestPythonDispatcher::test_duplicate_registrations, test/test_dispatch.py::TestPythonDispatcher::test_math_autogradcpu, test/test_dispatch.py::TestPythonDispatcher::test_quantized_structured_not_implemented 2025-12-04T17:06:43.9237546Z 2025-12-04T17:06:43.9237844Z Finished test_dispatch 1/1 ... [2025-12-04 17:06:43.921094][28432.303986641], took 0.53min 2025-12-04T17:06:43.9572419Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_dispatch/test_dispatch-bf1fd68f7abb7228.xml 2025-12-04T17:06:44.0382376Z Running test_numba_integration 1/1 ... [2025-12-04 17:06:44.037893][28432.420786236] 2025-12-04T17:06:44.0382966Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:06:44.0386061Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_numba_integration.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:06:44.038345] 2025-12-04T17:06:51.4639057Z 2025-12-04T17:06:51.4640056Z test_numba_integration 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_numba_integration_1.1_4248037d4c172e88_.log 2025-12-04T17:06:51.4644305Z Running 8 items in this shard: test/test_numba_integration.py::TestNumbaIntegration::test_active_device, test/test_numba_integration.py::TestNumbaIntegration::test_array_adaptor, test/test_numba_integration.py::TestNumbaIntegration::test_conversion_errors, test/test_numba_integration.py::TestNumbaIntegration::test_cuda_array_interface, test/test_numba_integration.py::TestNumbaIntegration::test_from_cuda_array_interface, test/test_numba_integration.py::TestNumbaIntegration::test_from_cuda_array_interface_active_device, test/test_numba_integration.py::TestNumbaIntegration::test_from_cuda_array_interface_inferred_strides, test/test_numba_integration.py::TestNumbaIntegration::test_from_cuda_array_interface_lifetime 2025-12-04T17:06:51.4648169Z 2025-12-04T17:06:51.4648527Z Finished test_numba_integration 1/1 ... [2025-12-04 17:06:51.463709][28439.846601623], took 0.12min 2025-12-04T17:06:51.5000410Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_numba_integration/test_numba_integration-edcc49db775b9990.xml 2025-12-04T17:06:51.5795394Z Running test_functional_optim 1/1 ... [2025-12-04 17:06:51.579175][28439.962069932] 2025-12-04T17:06:51.5795971Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:06:51.5798700Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_functional_optim.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:06:51.579625] 2025-12-04T17:06:57.3524435Z 2025-12-04T17:06:57.3525437Z test_functional_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_functional_optim_1.1_82fdba90420e8f47_.log 2025-12-04T17:06:57.3528382Z Running 4 items in this shard: test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_parity_adam, test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_parity_adam_w, test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_parity_sgd, test/test_functional_optim.py::TestFunctionalOptimParity::test_functional_optim_registration 2025-12-04T17:06:57.3530572Z 2025-12-04T17:06:57.3530924Z Finished test_functional_optim 1/1 ... [2025-12-04 17:06:57.352257][28445.735151173], took 0.10min 2025-12-04T17:06:57.3886612Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_functional_optim/test_functional_optim-389cbc1bb3d61470.xml 2025-12-04T17:06:57.4235801Z Running test_maskedtensor 1/1 ... [2025-12-04 17:06:57.423281][28445.806174687] 2025-12-04T17:06:57.4236357Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:06:57.4239632Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_maskedtensor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:06:57.423729] 2025-12-04T17:09:12.8903389Z 2025-12-04T17:09:12.8904305Z test_maskedtensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_maskedtensor_1.1_4e0623e742dfe084_.log 2025-12-04T17:09:12.9298748Z Running 958 items in this shard: test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn0, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn1, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn10, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn11, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn12, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn13, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn14, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn15, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn16, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn17, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn18, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn19, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn2, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn20, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn21, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn22, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn23, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn24, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn25, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn26, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn27, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn28, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn29, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn3, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn30, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn31, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn32, 
test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn33, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn34, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn35, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn36, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn37, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn38, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn39, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn4, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn40, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn41, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn42, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn43, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn44, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn45, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn46, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn47, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn48, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn49, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn5, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn50, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn51, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn52, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn53, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn54, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn55, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn56, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn57, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn6, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn7, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn8, test/test_maskedtensor.py::TestUnary::test_inplace_unary_fn9, 
test/test_maskedtensor.py::TestUnary::test_unary_fn0, test/test_maskedtensor.py::TestUnary::test_unary_fn1, test/test_maskedtensor.py::TestUnary::test_unary_fn10, test/test_maskedtensor.py::TestUnary::test_unary_fn11, test/test_maskedtensor.py::TestUnary::test_unary_fn12, test/test_maskedtensor.py::TestUnary::test_unary_fn13, test/test_maskedtensor.py::TestUnary::test_unary_fn14, test/test_maskedtensor.py::TestUnary::test_unary_fn15, test/test_maskedtensor.py::TestUnary::test_unary_fn16, test/test_maskedtensor.py::TestUnary::test_unary_fn17, test/test_maskedtensor.py::TestUnary::test_unary_fn18, test/test_maskedtensor.py::TestUnary::test_unary_fn19, test/test_maskedtensor.py::TestUnary::test_unary_fn2, test/test_maskedtensor.py::TestUnary::test_unary_fn20, test/test_maskedtensor.py::TestUnary::test_unary_fn21, test/test_maskedtensor.py::TestUnary::test_unary_fn22, test/test_maskedtensor.py::TestUnary::test_unary_fn23, test/test_maskedtensor.py::TestUnary::test_unary_fn24, test/test_maskedtensor.py::TestUnary::test_unary_fn25, test/test_maskedtensor.py::TestUnary::test_unary_fn26, test/test_maskedtensor.py::TestUnary::test_unary_fn27, test/test_maskedtensor.py::TestUnary::test_unary_fn28, test/test_maskedtensor.py::TestUnary::test_unary_fn29, test/test_maskedtensor.py::TestUnary::test_unary_fn3, test/test_maskedtensor.py::TestUnary::test_unary_fn30, test/test_maskedtensor.py::TestUnary::test_unary_fn31, test/test_maskedtensor.py::TestUnary::test_unary_fn32, test/test_maskedtensor.py::TestUnary::test_unary_fn33, test/test_maskedtensor.py::TestUnary::test_unary_fn34, test/test_maskedtensor.py::TestUnary::test_unary_fn35, test/test_maskedtensor.py::TestUnary::test_unary_fn36, test/test_maskedtensor.py::TestUnary::test_unary_fn37, test/test_maskedtensor.py::TestUnary::test_unary_fn38, test/test_maskedtensor.py::TestUnary::test_unary_fn39, test/test_maskedtensor.py::TestUnary::test_unary_fn4, test/test_maskedtensor.py::TestUnary::test_unary_fn40, 
test/test_maskedtensor.py::TestUnary::test_unary_fn41, test/test_maskedtensor.py::TestUnary::test_unary_fn42, test/test_maskedtensor.py::TestUnary::test_unary_fn43, test/test_maskedtensor.py::TestUnary::test_unary_fn44, test/test_maskedtensor.py::TestUnary::test_unary_fn45, test/test_maskedtensor.py::TestUnary::test_unary_fn46, test/test_maskedtensor.py::TestUnary::test_unary_fn47, test/test_maskedtensor.py::TestUnary::test_unary_fn48, test/test_maskedtensor.py::TestUnary::test_unary_fn49, test/test_maskedtensor.py::TestUnary::test_unary_fn5, test/test_maskedtensor.py::TestUnary::test_unary_fn50, test/test_maskedtensor.py::TestUnary::test_unary_fn51, test/test_maskedtensor.py::TestUnary::test_unary_fn52, test/test_maskedtensor.py::TestUnary::test_unary_fn53, test/test_maskedtensor.py::TestUnary::test_unary_fn54, test/test_maskedtensor.py::TestUnary::test_unary_fn55, test/test_maskedtensor.py::TestUnary::test_unary_fn56, test/test_maskedtensor.py::TestUnary::test_unary_fn57, test/test_maskedtensor.py::TestUnary::test_unary_fn58, test/test_maskedtensor.py::TestUnary::test_unary_fn59, test/test_maskedtensor.py::TestUnary::test_unary_fn6, test/test_maskedtensor.py::TestUnary::test_unary_fn60, test/test_maskedtensor.py::TestUnary::test_unary_fn61, test/test_maskedtensor.py::TestUnary::test_unary_fn7, test/test_maskedtensor.py::TestUnary::test_unary_fn8, test/test_maskedtensor.py::TestUnary::test_unary_fn9, test/test_maskedtensor.py::TestBinary::test_binary_fn0, test/test_maskedtensor.py::TestBinary::test_binary_fn1, test/test_maskedtensor.py::TestBinary::test_binary_fn10, test/test_maskedtensor.py::TestBinary::test_binary_fn11, test/test_maskedtensor.py::TestBinary::test_binary_fn12, test/test_maskedtensor.py::TestBinary::test_binary_fn13, test/test_maskedtensor.py::TestBinary::test_binary_fn14, test/test_maskedtensor.py::TestBinary::test_binary_fn15, test/test_maskedtensor.py::TestBinary::test_binary_fn16, test/test_maskedtensor.py::TestBinary::test_binary_fn17, 
test/test_maskedtensor.py::TestBinary::test_binary_fn18, test/test_maskedtensor.py::TestBinary::test_binary_fn19, test/test_maskedtensor.py::TestBinary::test_binary_fn2, test/test_maskedtensor.py::TestBinary::test_binary_fn20, test/test_maskedtensor.py::TestBinary::test_binary_fn21, test/test_maskedtensor.py::TestBinary::test_binary_fn22, test/test_maskedtensor.py::TestBinary::test_binary_fn23, test/test_maskedtensor.py::TestBinary::test_binary_fn24, test/test_maskedtensor.py::TestBinary::test_binary_fn25, test/test_maskedtensor.py::TestBinary::test_binary_fn26, test/test_maskedtensor.py::TestBinary::test_binary_fn27, test/test_maskedtensor.py::TestBinary::test_binary_fn28, test/test_maskedtensor.py::TestBinary::test_binary_fn29, test/test_maskedtensor.py::TestBinary::test_binary_fn3, test/test_maskedtensor.py::TestBinary::test_binary_fn30, test/test_maskedtensor.py::TestBinary::test_binary_fn31, test/test_maskedtensor.py::TestBinary::test_binary_fn32, test/test_maskedtensor.py::TestBinary::test_binary_fn33, test/test_maskedtensor.py::TestBinary::test_binary_fn34, test/test_maskedtensor.py::TestBinary::test_binary_fn35, test/test_maskedtensor.py::TestBinary::test_binary_fn4, test/test_maskedtensor.py::TestBinary::test_binary_fn5, test/test_maskedtensor.py::TestBinary::test_binary_fn6, test/test_maskedtensor.py::TestBinary::test_binary_fn7, test/test_maskedtensor.py::TestBinary::test_binary_fn8, test/test_maskedtensor.py::TestBinary::test_binary_fn9, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn0, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn1, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn10, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn11, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn12, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn13, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn14, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn15, 
test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn16, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn17, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn18, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn19, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn2, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn20, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn21, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn22, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn23, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn24, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn25, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn26, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn27, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn28, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn29, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn3, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn4, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn5, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn6, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn7, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn8, test/test_maskedtensor.py::TestBinary::test_inplace_binary_fn9, test/test_maskedtensor.py::TestBinary::test_masks_match_fn_name_add, test/test_maskedtensor.py::TestBinary::test_masks_match_fn_name_add_, test/test_maskedtensor.py::TestReductions::test__is_any_true, test/test_maskedtensor.py::TestReductions::test__is_any_true_false, test/test_maskedtensor.py::TestReductions::test_all, test/test_maskedtensor.py::TestReductions::test_amax, test/test_maskedtensor.py::TestReductions::test_amax_grad, test/test_maskedtensor.py::TestReductions::test_amin, test/test_maskedtensor.py::TestReductions::test_amin_grad, 
test/test_maskedtensor.py::TestReductions::test_any_true_dtype, test/test_maskedtensor.py::TestReductions::test_backward, test/test_maskedtensor.py::TestReductions::test_grad_dtype, test/test_maskedtensor.py::TestReductions::test_max_not_implemented, test/test_maskedtensor.py::TestReductions::test_mean, test/test_maskedtensor.py::TestReductions::test_mean_dim_grad, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1a, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1b, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1c, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1d, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1e, test/test_maskedtensor.py::TestReductions::test_mean_grad_case_1f, test/test_maskedtensor.py::TestReductions::test_prod, test/test_maskedtensor.py::TestReductions::test_prod_grad, test/test_maskedtensor.py::TestReductions::test_sum, test/test_maskedtensor.py::TestReductions::test_sum_grad, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_add_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout0_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_atan2_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_floor_rounding_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout1_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_no_rounding_mode_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_div_trunc_rounding_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout1_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_eq_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_floor_divide_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmax_layout2_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmin_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_fmod_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout1_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ge_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_gt_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout2_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_le_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_logaddexp_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_lt_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout0_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_maximum_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_minimum_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout1_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_mul_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_ne_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout2_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_nextafter_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_remainder_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_sub_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout0_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_binary_core_true_divide_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amax_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout1_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_amin_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmax_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout2_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_argmin_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_prod_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_reduction_all_sum_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout0_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_abs_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acos_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout1_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_acosh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_angle_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asin_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout1_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_asinh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atan_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout2_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_atanh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_ceil_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_conj_physical_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout0_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cos_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_cosh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout1_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_deg2rad_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_digamma_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erf_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout0_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfc_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_erfinv_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout1_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp2_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_exp_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_expm1_layout2_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_floor_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_frac_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout1_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_i0_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_isnan_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout2_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_lgamma_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log10_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log1p_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout0_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log2_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_log_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout2_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_logit_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_nan_to_num_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_neg_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout0_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_positive_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rad2deg_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout1_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_reciprocal_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_0_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout1_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_3_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_decimals_neg_3_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout2_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_round_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_rsqrt_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sgn_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout0_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sigmoid_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sign_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout2_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_signbit_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sin_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinc_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout0_cuda_float32, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sinh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_sqrt_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout1_cuda_float64, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_square_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tan_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout0_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_tanh_layout2_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout0_cuda_float16, 
test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout0_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout0_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout1_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout1_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout1_cuda_float64, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout2_cuda_float16, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout2_cuda_float32, test/test_maskedtensor.py::TestOperatorsCUDA::test_unary_core_trunc_layout2_cuda_float64, test/test_maskedtensor.py::TestBasicsCUDA::test_add_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_contiguous_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_diff_dim_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_diff_layouts_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_diff_sizes_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_grad_warning_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_invalid_sparse_coo_values_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_invalid_sparse_csr_values_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_invalid_sparse_layout_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_invalid_tensor_inputs_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_nn_unfold_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_softmax_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_stack_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_dense_and_sparse_coo_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_dense_and_sparse_csr_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_dense_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_device_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_to_dtype_cuda, 
test/test_maskedtensor.py::TestBasicsCUDA::test_to_sparse_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_unfold_cuda, test/test_maskedtensor.py::TestBasicsCUDA::test_where_cuda 2025-12-04T17:09:12.9682864Z 2025-12-04T17:09:12.9683237Z Finished test_maskedtensor 1/1 ... [2025-12-04 17:09:12.891560][28581.274453082], took 2.26min 2025-12-04T17:09:12.9684680Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_maskedtensor/test_maskedtensor-1089c4e953521eec.xml 2025-12-04T17:09:13.0411969Z Running benchmark_utils/test_benchmark_utils 1/1 ... [2025-12-04 17:09:13.040837][28581.423729815] 2025-12-04T17:09:13.0412621Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:09:13.0415623Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'benchmark_utils/test_benchmark_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:09:13.041291] 2025-12-04T17:09:22.3696873Z 2025-12-04T17:09:22.3698045Z benchmark_utils/test_benchmark_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/benchmark_utils.test_benchmark_utils_1.1_63175fb80c7f9ea7_.log 2025-12-04T17:09:22.3702957Z Running 9 items in this shard: test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_adaptive_timer, test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_collect_callgrind, test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_collect_cpp_callgrind, test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_compare, test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_cpp_timer, test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_fuzzer, test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_manipulate_callgrind_stats, 
test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_timer, test/benchmark_utils/test_benchmark_utils.py::TestBenchmarkUtils::test_timer_tiny_fast_snippet 2025-12-04T17:09:22.3707071Z 2025-12-04T17:09:22.3707744Z Finished benchmark_utils/test_benchmark_utils 1/1 ... [2025-12-04 17:09:22.369471][28590.752366852], took 0.16min 2025-12-04T17:09:22.4059839Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/benchmark_utils.test_benchmark_utils/benchmark_utils.test_benchmark_utils-76c10c33afe299c4.xml 2025-12-04T17:09:22.4849813Z Running test_scaled_matmul_cuda 1/1 ... [2025-12-04 17:09:22.484682][28590.867576586] 2025-12-04T17:09:22.4850394Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:09:22.4853747Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_scaled_matmul_cuda.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:09:22.485110] 2025-12-04T17:09:31.7636167Z 2025-12-04T17:09:31.7637381Z test_scaled_matmul_cuda 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_scaled_matmul_cuda_1.1_751f5e87909cbd5d_.log 2025-12-04T17:09:31.8337474Z Running 893 items in this shard: test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_compile_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_error_messages_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_error_messages_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_240_272_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_65_96_112_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_128_128_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_240_272_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_128_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_256_256_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_256_512_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_512_128_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_224_272_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_512_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_512_128_256_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_224_272_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_65_96_112_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_240_272_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_65_96_112_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_128_128_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_256_256_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_256_256_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_256_512_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_512_128_256_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_240_272_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_512_128_256_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_224_272_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_512_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_512_128_256_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_224_272_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_512_128_256_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_240_272_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_65_96_112_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_compile_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_1023_64_48_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_1025_128_96_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_127_96_1024_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_128_128_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_128_256_512_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_256_256_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_256_512_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_2_1024_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_31_1024_64_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_45_96_1024_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_512_128_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_error_message_fp8_pre_sm89_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float32_output_errors_with_bias_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_basics_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_bias_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_bias_relu_edgecase_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_error_messages_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_rowwise_scaling_sanity_use_fast_accum_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_rowwise_scaling_sanity_use_fast_accum_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_scale_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_scale_fast_accum_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_honor_sm_carveout_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_16_M_2048_N_8192_K_16640_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_16_M_2048_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_16_M_2048_N_8192_K_16640_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_16_M_2049_N_8192_K_16640_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_16_M_2049_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_16_M_2049_N_8192_K_16640_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_1_M_2048_N_8192_K_16640_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_1_M_2048_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_1_M_2048_N_8192_K_16640_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_1_M_2049_N_8192_K_16640_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_1_M_2049_N_8192_K_16640_format_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_1_M_2049_N_8192_K_16640_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_4_M_2048_N_8192_K_16640_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_4_M_2048_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_4_M_2048_N_8192_K_16640_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_4_M_2049_N_8192_K_16640_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_4_M_2049_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_4_M_2049_N_8192_K_16640_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_16_M_16640_N_8192_K_4096_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_16_M_16640_N_8192_K_4096_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_16_M_16640_N_8192_K_4096_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_1_M_16640_N_8192_K_4096_format_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_1_M_16640_N_8192_K_4096_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_1_M_16640_N_8192_K_4096_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_4_M_16640_N_8192_K_4096_format_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_4_M_16640_N_8192_K_4096_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_4_M_16640_N_8192_K_4096_format_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_non_divisible_leading_dim_bias_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_non_divisible_leading_dim_bias_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_pack_uint4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_False_strided_False_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_False_strided_False_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_False_strided_True_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_False_strided_True_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_True_strided_False_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_True_strided_False_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_True_strided_True_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_True_strided_True_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_False_strided_False_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_False_strided_False_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_False_strided_True_wrap_v2_False_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_False_strided_True_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_True_strided_False_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_True_strided_False_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_True_strided_True_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_True_strided_True_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_2d_fast_accum_False_strided_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_2d_fast_accum_False_strided_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_2d_fast_accum_True_strided_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_2d_fast_accum_True_strided_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_3d_fast_accum_False_strided_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_3d_fast_accum_False_strided_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_3d_fast_accum_True_strided_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_3d_fast_accum_True_strided_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_change_stride_bfloat16_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_change_stride_float16_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_change_stride_float32_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_128_rhs_block_1_M_256_N_256_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_128_rhs_block_1_M_256_N_256_K_512_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_1_rhs_block_128_M_256_N_256_K_256_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_1_rhs_block_128_M_256_N_256_K_512_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_1_rhs_block_1_M_256_N_256_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_1_rhs_block_1_M_256_N_256_K_512_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_bfloat16_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_128_rhs_block_1_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_128_rhs_block_1_M_256_N_256_K_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_1_rhs_block_128_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_1_rhs_block_128_M_256_N_256_K_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_1_rhs_block_1_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_1_rhs_block_1_M_256_N_256_K_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_128_rhs_block_1_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_128_rhs_block_1_M_256_N_256_K_128_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_1_rhs_block_128_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_1_rhs_block_128_M_256_N_256_K_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_1_rhs_block_1_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_1_rhs_block_1_M_256_N_256_K_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_float16_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_float32_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_row_wise_bfloat16_shapes0_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_row_wise_float16_shapes0_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_row_wise_float32_shapes0_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_0_use_torch_compile_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_0_use_torch_compile_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_1_use_torch_compile_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_1_use_torch_compile_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_2_use_torch_compile_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_2_use_torch_compile_True_cuda 2025-12-04T17:09:31.9023384Z 2025-12-04T17:09:31.9024093Z Finished 
test_scaled_matmul_cuda 1/1 ... [2025-12-04 17:09:31.765677][28600.14856865], took 0.15min 2025-12-04T17:09:31.9025467Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_scaled_matmul_cuda/test_scaled_matmul_cuda-d1f8763e6c1869e6.xml 2025-12-04T17:09:31.9320250Z Running torch_np/numpy_tests/core/test_shape_base 1/1 ... [2025-12-04 17:09:31.931701][28600.314594588] 2025-12-04T17:09:31.9320899Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:09:31.9324151Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_shape_base.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:09:31.932154] 2025-12-04T17:09:38.1561817Z 2025-12-04T17:09:38.1563009Z torch_np/numpy_tests/core/test_shape_base 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_shape_base_1.1_0a0e6d68a930787e_.log 2025-12-04T17:09:38.1621540Z Running 119 items in this shard: test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast1d::test_0D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast1d::test_1D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast1d::test_2D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast1d::test_3D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast1d::test_r1array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast2d::test_0D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast2d::test_1D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast2d::test_2D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast2d::test_3D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast2d::test_r2array, 
test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast3d::test_0D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast3d::test_1D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast3d::test_2D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestAtleast3d::test_3D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_0D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_1D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_2D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_casting_and_dtype, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_casting_and_dtype_type_error, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_empty_input, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_generator, test/torch_np/numpy_tests/core/test_shape_base.py::TestHstack::test_non_iterable, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_0D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_1D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_2D_array, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_2D_array2, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_casting_and_dtype, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_casting_and_dtype_type_error, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_empty_input, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_generator, test/torch_np/numpy_tests/core/test_shape_base.py::TestVstack::test_non_iterable, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_bad_out_shape, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_concatenate, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_concatenate_axis_None, 
test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_exceptions, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_large_concatenate_axis_None, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_operator_concat, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_c8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_c8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_c8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_c8_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_c8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f4_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f4_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f4_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f4_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f4_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f8_casting_same_kind, 
test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_f8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_i8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_i8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_i8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_i8_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis0_out_dtype_i8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_c8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_c8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_c8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_c8_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_c8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f4_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f4_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f4_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f4_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f4_casting_unsafe, 
test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f8_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_f8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_i8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_i8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_i8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_i8_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_axis_0_out_dtype_i8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_out_and_dtype_simple, test/torch_np/numpy_tests/core/test_shape_base.py::TestConcatenate::test_returns_copy, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_c8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_c8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_c8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_c8_casting_same_kind, 
test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_c8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f4_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f4_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f4_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f4_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f4_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f8_casting_same_kind, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_f8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_i8_casting_equiv, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_i8_casting_no, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_i8_casting_safe, test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_i8_casting_same_kind, 
test/torch_np/numpy_tests/core/test_shape_base.py::TestStackMisc::test_stack_out_and_dtype_axis_0_out_dtype_i8_casting_unsafe, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_3d, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_complicated, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_memory_order, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_mixed_1d_and_2d, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_simple_column_wise, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_simple_row_wise, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_total_size_estimate, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_with_1d_arrays_column_wise, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_with_1d_arrays_multiple_rows, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_with_1d_arrays_row_wise, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_block_with_mismatched_shape, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_different_ndims, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_different_ndims_depths, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_empty_lists, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_invalid_nesting, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_nested, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_no_lists, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_returns_copy, test/torch_np/numpy_tests/core/test_shape_base.py::TestBlock::test_tuple 2025-12-04T17:09:38.1678836Z 2025-12-04T17:09:38.1679273Z Finished torch_np/numpy_tests/core/test_shape_base 1/1 ... 
[2025-12-04 17:09:38.156253][28606.539148309], took 0.10min 2025-12-04T17:09:38.1934191Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_shape_base/torch_np.numpy_tests.core.test_shape_base-b9eed7c143bc9bc3.xml 2025-12-04T17:09:38.2640860Z Running test_vulkan 1/1 ... [2025-12-04 17:09:38.263784][28606.646679071] 2025-12-04T17:09:38.2641366Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:09:38.2644936Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_vulkan.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:09:38.264227] 2025-12-04T17:09:43.2861253Z 2025-12-04T17:09:43.2862143Z test_vulkan 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_vulkan_1.1_2892328dc9a2ec74_.log 2025-12-04T17:09:43.2863169Z Running 1 items in this shard: test/test_vulkan.py::TestVulkanRewritePass::test_conv 2025-12-04T17:09:43.2863635Z 2025-12-04T17:09:43.2863906Z Finished test_vulkan 1/1 ... [2025-12-04 17:09:43.285916][28611.668812892], took 0.08min 2025-12-04T17:09:43.3234076Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_vulkan/test_vulkan-b25d187bf3baa78a.xml 2025-12-04T17:09:43.3507281Z Running lazy/test_generator 1/1 ... [2025-12-04 17:09:43.350427][28611.733322111] 2025-12-04T17:09:43.3507861Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:09:43.3511337Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_generator.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:09:43.350878] 2025-12-04T17:09:49.0240164Z 2025-12-04T17:09:49.0241729Z lazy/test_generator 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_generator_1.1_9fb7d5917fd83b83_.log 2025-12-04T17:09:49.0244756Z Running 2 items in this shard: test/lazy/test_generator.py::LazyGeneratorTest::test_generator, test/lazy/test_generator.py::LazyGeneratorTest::test_generator_causes_multiple_compiles 2025-12-04T17:09:49.0246602Z 2025-12-04T17:09:49.0247237Z Finished lazy/test_generator 1/1 ... [2025-12-04 17:09:49.023775][28617.406671142], took 0.09min 2025-12-04T17:09:49.0621487Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/lazy.test_generator/lazy.test_generator-42072a3593c4e25d.xml 2025-12-04T17:09:49.0923040Z Running torch_np/numpy_tests/linalg/test_linalg 1/1 ... [2025-12-04 17:09:49.091883][28617.474777746] 2025-12-04T17:09:49.0924300Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:09:49.0927148Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/linalg/test_linalg.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:09:49.092364] 2025-12-04T17:10:03.5289562Z 2025-12-04T17:10:03.5290992Z torch_np/numpy_tests/linalg/test_linalg 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.linalg.test_linalg_1.1_f8a6a4a0c07965ac_.log 2025-12-04T17:10:03.5405053Z Running 268 items in this shard: test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_0_size_k, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_0_size, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_empty_identity, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype1, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_basic_nonsvd, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_nan, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_singular, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_stacked_singular, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_empty_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_empty_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_nonsq_cases, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_zero, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_0_n_0_n_rhs_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_0_n_4_n_rhs_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_0_n_4_n_rhs_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_4_n_0_n_rhs_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_4_n_0_n_rhs_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_4_n_2_n_rhs_2, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_future_rcond, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_incompatible_dims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_UPLO, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_invalid, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_UPLO, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_invalid, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNorm_NonSystematic::test_intmin, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_axis, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_bad_args, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_keepdims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_2x2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_3x3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_vector, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_vector_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_axis, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_bad_args, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_keepdims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_2x2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_3x3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_vector, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_vector_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_axis, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_bad_args, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_keepdims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_2x2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_3x3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_vector, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_vector_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMatrixRank::test_matrix_rank, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMatrixRank::test_reduced_rank, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMatrixRank::test_symmetric_rank, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_mode_all_but_economic, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_mode_raw, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_qr_empty_m_0_n_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_qr_empty_m_0_n_3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_qr_empty_m_3_n_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt3, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt3, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt3, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype2, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_byteorder_check, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_generalized_raise_multiloop, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_sdot_bug_8577, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_xerbla_override, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_basic_function_with_dynamic_programming_optimization, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_basic_function_with_three_arguments, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_basic_function_with_two_arguments, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_dynamic_programming_logic, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_dynamic_programming_optimization_and_out, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_three_arguments_and_out, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_too_few_input_arrays, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_two_arguments_and_out, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_vector_as_first_and_last_argument, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_vector_as_first_argument, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_vector_as_last_argument, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_non_square_handling_arr0_ind_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_non_square_handling_arr1_ind_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_ind_limit_ind_-2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_ind_limit_ind_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_result, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_shape_shape0_ind_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_shape_shape1_ind_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_non_square_handling_a0_axes0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_non_square_handling_a1_axes1, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_tensorsolve_result_shape0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_tensorsolve_result_shape1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_tensorsolve_result_shape2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc2::test_blas64_dot, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc2::test_blas64_geqrf_lwork_smoketest, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc2::test_unsupported_commontype 2025-12-04T17:10:03.5518211Z 2025-12-04T17:10:03.5518712Z Finished torch_np/numpy_tests/linalg/test_linalg 1/1 ... [2025-12-04 17:10:03.529160][28631.912055296], took 0.24min 2025-12-04T17:10:03.5663007Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.linalg.test_linalg/torch_np.numpy_tests.linalg.test_linalg-2974f2048ff6a577.xml 2025-12-04T17:10:03.6565045Z Running torch_np/numpy_tests/core/test_dtype 1/1 ... [2025-12-04 17:10:03.656158][28632.039051388] 2025-12-04T17:10:03.6565667Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:10:03.6568616Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_dtype.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:10:03.656595] 2025-12-04T17:10:09.7800715Z 2025-12-04T17:10:09.7802007Z torch_np/numpy_tests/core/test_dtype 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_dtype_1.1_7868c6a3dd1e371a_.log 2025-12-04T17:10:09.7852731Z Running 102 items in this shard: test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_equivalent_dtype_hashing, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_invalid_types, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Bool, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Bytes0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Complex128, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Complex32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Complex64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Datetime64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float128, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float16, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Float64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int16, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int64, 
test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Int8, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Object0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Str0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Timedelta64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt16, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_UInt8, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Uint32, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Uint64, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_numeric_style_types_are_invalid_dtype_Void0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation1, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation2, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_comparison_operation3, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_richcompare_invalid_dtype_equality, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t0, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t1, test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t2, 
test/torch_np/numpy_tests/core/test_dtype.py::TestBuiltin::test_run_t3, test/torch_np/numpy_tests/core/test_dtype.py::TestDtypeAttributeDeletion::test_dtype_non_writable_attributes_deletion, test/torch_np/numpy_tests/core/test_dtype.py::TestDtypeAttributeDeletion::test_dtype_writable_attributes_deletion, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t0, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t1, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t2, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t3, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_builtin_t4, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_DType11, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_bool__10, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_complex128_4, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_complex64_3, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_float16_0, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_float32_1, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_float64_2, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int16_7, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int32_8, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int64_9, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_int8_6, test/torch_np/numpy_tests/core/test_dtype.py::TestPickling::test_pickle_types_uint8_5, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_complex64_complex64_None, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_float16_complex64_None, 
test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_float32_complex64_None, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_other_4294967295_expected1_expected_weak1, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_other_value_based_other_65535_expected0_expected_weak0, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other0_expected0, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other1_expected1, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other2_expected2, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other3_expected3, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other4_expected4, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other5_expected5, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_complex_scalar_value_based_other6_expected6, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes0_expected0, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes1_expected1, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes2_expected2, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes3_expected3, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes4_expected4, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes5_expected5, 
test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes6_expected6, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes7_expected7, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes8_expected8, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_permutations_do_not_influence_result_dtypes9_expected9, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_18446744073709551616, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_2, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_200, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_4294967296, test/torch_np/numpy_tests/core/test_dtype.py::TestPromotion::test_python_integer_promotion_val_9223372036854775808, test/torch_np/numpy_tests/core/test_dtype.py::TestMisc::test_dtypes_are_true, test/torch_np/numpy_tests/core/test_dtype.py::TestMisc::test_keyword_argument, test/torch_np/numpy_tests/core/test_dtype.py::TestFromDTypeAttribute::test_recursion, test/torch_np/numpy_tests/core/test_dtype.py::TestFromDTypeAttribute::test_simple, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_?, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_B, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_D, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_F, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_b, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_d, 
test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_e, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_f, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_h, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_i, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_dtype_subclass_code_l, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_scalar, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_0, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_1, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_2, test/torch_np/numpy_tests/core/test_dtype.py::TestClassGetItem::test_subscript_tuple_arg_len_3 2025-12-04T17:10:09.7902780Z 2025-12-04T17:10:09.7903206Z Finished torch_np/numpy_tests/core/test_dtype 1/1 ... [2025-12-04 17:10:09.780023][28638.16291905], took 0.10min 2025-12-04T17:10:09.8179648Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_dtype/torch_np.numpy_tests.core.test_dtype-4b8c4285965a7813.xml 2025-12-04T17:10:09.8941297Z Running lazy/test_debug_util 1/1 ... [2025-12-04 17:10:09.893777][28638.276671492] 2025-12-04T17:10:09.8941870Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:10:09.8945043Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_debug_util.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:10:09.894214] 2025-12-04T17:10:15.6171435Z 2025-12-04T17:10:15.6172412Z lazy/test_debug_util 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_debug_util_1.1_1fda475a5f9a06f5_.log 2025-12-04T17:10:15.6173598Z Running 1 items in this shard: test/lazy/test_debug_util.py::DebugUtilTest::test_get_python_frames 2025-12-04T17:10:15.6174112Z 2025-12-04T17:10:15.6174455Z Finished lazy/test_debug_util 1/1 ... [2025-12-04 17:10:15.616895][28643.999791016], took 0.10min 2025-12-04T17:10:15.6549329Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/lazy.test_debug_util/lazy.test_debug_util-7c02b1e3dfee61bd.xml 2025-12-04T17:10:15.6928455Z Running nn/test_load_state_dict 1/1 ... [2025-12-04 17:10:15.692523][28644.075416258] 2025-12-04T17:10:15.6929029Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:10:15.6932164Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_load_state_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:10:15.692955] 2025-12-04T17:10:21.9667334Z 2025-12-04T17:10:21.9668499Z nn/test_load_state_dict 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_load_state_dict_1.1_54a686ad2f48d7f9_.log 2025-12-04T17:10:21.9682925Z Running 29 items in this shard: test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_BC_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_BC_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_False_keep_vars_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_False_keep_vars_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_True_keep_vars_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_meta_swap_True_keep_vars_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_shape_stride_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_shape_stride_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_with_optimizer_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_assign_with_optimizer_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_child_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_child_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_custom_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_custom_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_invalid_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_invalid_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_ref_cycle_swap_False, 
test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_type_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_type_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_warn_assign_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_warn_assign_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_with_unexpected_key_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_load_state_dict_with_unexpected_key_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDict::test_scalar_param_1d_tensor_raises_swap_False, test/nn/test_load_state_dict.py::TestLoadStateDict::test_scalar_param_1d_tensor_raises_swap_True, test/nn/test_load_state_dict.py::TestLoadStateDictSwap::test_swap_subclass_swap_True_assign_False, test/nn/test_load_state_dict.py::TestLoadStateDictSwap::test_swap_subclass_swap_True_assign_True 2025-12-04T17:10:21.9696601Z 2025-12-04T17:10:21.9696956Z Finished nn/test_load_state_dict 1/1 ... [2025-12-04 17:10:21.966504][28650.349401217], took 0.10min 2025-12-04T17:10:22.0047346Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_load_state_dict/nn.test_load_state_dict-e81d6ed8d3f8789f.xml 2025-12-04T17:10:22.0757260Z Running test_shape_ops 1/1 ... [2025-12-04 17:10:22.075403][28650.458297383] 2025-12-04T17:10:22.0760045Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:10:22.0761285Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_shape_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:10:22.075840] 2025-12-04T17:10:30.7533474Z 2025-12-04T17:10:30.7534407Z test_shape_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_shape_ops_1.1_4cd0c635a81aa180_.log 2025-12-04T17:10:30.7568009Z Running 99 items in this shard: test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_propagates_nans_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_clamp_raises_arg_errors_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_complex_rot90_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_complex_rot90_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_diag_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_diag_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_diagonal_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_diagonal_multidim_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_bool, 
test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_errors_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_large_tensor_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_complex64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_numpy_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_unsupported_dtype_cuda_quint2x4, test/test_shape_ops.py::TestShapeOpsCUDA::test_flip_unsupported_dtype_cuda_quint4x2, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_cuda_float64, 
test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_invalid_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_invalid_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_fliplr_invalid_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_invalid_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_invalid_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_flipud_invalid_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_invalid_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_invalid_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_movedim_invalid_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_astuple_out_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_bfloat16, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_bool, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_discontiguous_cuda, 
test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_no_warning_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_nonzero_non_diff_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_rot90_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_sparse_dense_dim_cuda_complex128, test/test_shape_ops.py::TestShapeOpsCUDA::test_sparse_dense_dim_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_sparse_dense_dim_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_tolist_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_float16, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_float32, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_float64, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int16, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int32, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int64, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_int8, test/test_shape_ops.py::TestShapeOpsCUDA::test_trace_cuda_uint8, test/test_shape_ops.py::TestShapeOpsCUDA::test_unbind_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_all_devices_and_dtypes_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_backward_errors_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_errors_cuda, test/test_shape_ops.py::TestShapeOpsCUDA::test_unfold_scalars_cuda 2025-12-04T17:10:30.7601023Z 2025-12-04T17:10:30.7601318Z Finished test_shape_ops 1/1 ... [2025-12-04 17:10:30.753230][28659.136125766], took 0.14min 2025-12-04T17:10:30.7915005Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_shape_ops/test_shape_ops-a6160583c0856270.xml 2025-12-04T17:10:30.8781197Z Running nn/test_module_hooks 1/1 ... 
[2025-12-04 17:10:30.877796][28659.260687459] 2025-12-04T17:10:30.8781774Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:10:30.8784996Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_module_hooks.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:10:30.878248] 2025-12-04T17:10:36.7518298Z 2025-12-04T17:10:36.7519303Z nn/test_module_hooks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_module_hooks_1.1_b8e5016c3845034d_.log 2025-12-04T17:10:36.7542112Z Running 53 items in this shard: test/nn/test_module_hooks.py::TestModuleHooks::test_always_called_forward_hooks, test/nn/test_module_hooks.py::TestModuleHooks::test_bw_hook_warning_for_non_tensor_or_tuple, test/nn/test_module_hooks.py::TestModuleHooks::test_forward_hooks_named_tuple_False, test/nn/test_module_hooks.py::TestModuleHooks::test_forward_hooks_named_tuple_True, test/nn/test_module_hooks.py::TestModuleHooks::test_forward_pre_hooks_named_tuple_False, test/nn/test_module_hooks.py::TestModuleHooks::test_forward_pre_hooks_named_tuple_True, test/nn/test_module_hooks.py::TestModuleHooks::test_full_backward_hooks_named_tuple_False, test/nn/test_module_hooks.py::TestModuleHooks::test_full_backward_hooks_named_tuple_True, test/nn/test_module_hooks.py::TestModuleHooks::test_full_backward_pre_hooks_named_tuple_False, test/nn/test_module_hooks.py::TestModuleHooks::test_full_backward_pre_hooks_named_tuple_True, test/nn/test_module_hooks.py::TestModuleHooks::test_kwarg_hooks, test/nn/test_module_hooks.py::TestModuleHooks::test_mixed_hooks_named_tuple_False, test/nn/test_module_hooks.py::TestModuleHooks::test_mixed_hooks_named_tuple_True, test/nn/test_module_hooks.py::TestModuleHooks::test_remove_kwarg_hooks, test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_module_pre_hook_swap_False, 
test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_module_pre_hook_swap_True, test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_post_hook_backward_compatibility_swap_False, test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_post_hook_backward_compatibility_swap_True, test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_post_hook_swap_False, test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_post_hook_swap_True, test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_pre_hook_swap_False, test/nn/test_module_hooks.py::TestStateDictHooks::test_load_state_dict_pre_hook_swap_True, test/nn/test_module_hooks.py::TestStateDictHooks::test_no_extra_ref_to_module, test/nn/test_module_hooks.py::TestStateDictHooks::test_pickled_hook, test/nn/test_module_hooks.py::TestStateDictHooks::test_register_state_dict_post_hook_private_False, test/nn/test_module_hooks.py::TestStateDictHooks::test_register_state_dict_post_hook_private_True, test/nn/test_module_hooks.py::TestStateDictHooks::test_register_state_dict_pre_hook, test/nn/test_module_hooks.py::TestStateDictHooks::test_register_state_dict_pre_hook_backward_compat, test/nn/test_module_hooks.py::TestStateDictHooks::test_register_state_dict_pre_hook_lazy_module, test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_global_and_local_hooks_order, test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_module_backward_global_hook_writeable, test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_module_forward_forward_hook_removable, test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_module_forward_preforward_hook_removable, test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_module_global_forward_preforward_hook_writeable, test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_module_global_hook_invalid_outputs, test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_module_global_hooks, 
test/nn/test_module_hooks.py::TestModuleGlobalHooks::test_module_global_hooks_with_kwargs, test/nn/test_module_hooks.py::TestModuleHookNN::test_backward_hooks_interaction, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_backward_size, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_backward_writeable, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_buffer_registration, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_cpp, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_extra_input, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_forward_preforward_writable, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_inplace, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_invalid_outputs, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_last_arg_requires_grad, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_no_requires_grad, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_non_full_warning, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_parameter_registration, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_requires_grad, test/nn/test_module_hooks.py::TestModuleHookNN::test_hook_submodule_registration, test/nn/test_module_hooks.py::TestModuleHookNN::test_hooks 2025-12-04T17:10:36.7564247Z 2025-12-04T17:10:36.7564585Z Finished nn/test_module_hooks 1/1 ... [2025-12-04 17:10:36.751660][28665.134557521], took 0.10min 2025-12-04T17:10:36.7900424Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_module_hooks/nn.test_module_hooks-e13d4f4eb9af9666.xml 2025-12-04T17:10:36.8341161Z Running torch_np/numpy_tests/lib/test_twodim_base 1/1 ... 
[2025-12-04 17:10:36.833821][28665.216715404] 2025-12-04T17:10:36.8341842Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:10:36.8344963Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_twodim_base.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:10:36.834248] 2025-12-04T17:10:42.8078617Z 2025-12-04T17:10:42.8080079Z torch_np/numpy_tests/lib/test_twodim_base 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_twodim_base_1.1_facf24e95ed5355d_.log 2025-12-04T17:10:42.8093999Z Running 34 items in this shard: test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_2d, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_basic, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_bool, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_diag, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_diag2d, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_eye_bounds, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_order, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestDiag::test_diag_bounds, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestDiag::test_failure, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestDiag::test_fortran_order, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestDiag::test_matrix, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestDiag::test_vector, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestFliplr::test_basic, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestFlipud::test_basic, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_all_outliers, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_asym, 
test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_bad_length_x_len_10_y_len_11, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_bad_length_x_len_20_y_len_19, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_binparameter_combination, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_density, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_empty, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_simple, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_dtype, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_mask_indices, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_tril_indices, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_tril_triu_dtype, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_tril_triu_ndim2, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_tril_triu_ndim3, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_tril_triu_with_inf, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTriuIndices::test_triu_indices, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTrilIndicesFrom::test_exceptions, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTriuIndicesFrom::test_exceptions, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestVander::test_basic, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestVander::test_dtypes 2025-12-04T17:10:42.8107200Z 2025-12-04T17:10:42.8107622Z Finished torch_np/numpy_tests/lib/test_twodim_base 1/1 ... 
[2025-12-04 17:10:42.807696][28671.1905916], took 0.10min 2025-12-04T17:10:42.8462973Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_twodim_base/torch_np.numpy_tests.lib.test_twodim_base-2da66c446de8da89.xml 2025-12-04T17:10:42.9146738Z Running profiler/test_memory_profiler 1/1 ... [2025-12-04 17:10:42.914328][28671.297222428] 2025-12-04T17:10:42.9147396Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:10:42.9150338Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_memory_profiler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:10:42.914766] 2025-12-04T17:10:55.0471921Z 2025-12-04T17:10:55.0473152Z profiler/test_memory_profiler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_memory_profiler_1.1_70baf0213dbc5855_.log 2025-12-04T17:10:55.0489692Z Running 33 items in this shard: test/profiler/test_memory_profiler.py::TestMemoryProfiler::test_config_check, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_module, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_module_and_optimizer, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_optimizer, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_optimizer_set_to_none, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_low_level, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_complicated, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_non_op_allocations, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_simple, 
test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_simple_backward, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_simple_inplace, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_stacked, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_with_annotations, test/profiler/test_memory_profiler.py::TestDataFlow::test_match_schemas, test/profiler/test_memory_profiler.py::TestDataFlow::test_match_schemas_backward, test/profiler/test_memory_profiler.py::TestDataFlow::test_match_schemas_tensorlist, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_sequential_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_sequential_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_fwd_bwd_step, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_module_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_module_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_module_fwd_bwd_step, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_inputs_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_inputs_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_inputs_fwd_lazy, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_lazily_initialized, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_manual_optimizer_step, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_memory_timeline, 
test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_parameters_and_gradients, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_parameters_and_gradients_set_to_none, test/profiler/test_memory_profiler.py::TestMemoryProfilerTimelineCUDA::test_memory_timeline_no_id_cuda 2025-12-04T17:10:55.0505407Z 2025-12-04T17:10:55.0505789Z Finished profiler/test_memory_profiler 1/1 ... [2025-12-04 17:10:55.046984][28683.429881605], took 0.20min 2025-12-04T17:10:55.0857591Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_memory_profiler/profiler.test_memory_profiler-20f5e2eefecacaee.xml 2025-12-04T17:10:55.1542330Z Running test_jit_llga_fuser 1/1 ... [2025-12-04 17:10:55.153875][28683.536769] 2025-12-04T17:10:55.1542893Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:10:55.1545596Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_jit_llga_fuser.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:10:55.154301] 2025-12-04T17:11:23.5679669Z 2025-12-04T17:11:23.5680957Z test_jit_llga_fuser 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_jit_llga_fuser_1.1_a67e637a7f701026_.log 2025-12-04T17:11:23.5722654Z Running 107 items in this shard: test/test_jit_llga_fuser.py::TestEnableDisableLlgaFuser::test_context_manager, test/test_jit_llga_fuser.py::TestDynamoAOT::test_dynamo_aot_ts_onednn, test/test_jit_llga_fuser.py::TestModel::test_vision_alexnet_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_alexnet_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet121_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet121_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet161_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet161_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet169_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet169_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet201_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_densenet201_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b0_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b0_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b1_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b1_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b2_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b2_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b3_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b3_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b4_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b4_float32, 
test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b5_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b5_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b6_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b6_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b7_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_efficientnet_b7_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_googlenet_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_googlenet_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_mnasnet1_0_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_mnasnet1_0_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_mobilenet_v2_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_mobilenet_v2_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_mobilenet_v3_large_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_mobilenet_v3_large_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_regnet_y_400mf_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_regnet_y_400mf_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_resnet50_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_resnet50_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_resnext101_32x8d_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_resnext101_32x8d_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_resnext50_32x4d_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_resnext50_32x4d_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_shufflenet_v2_x1_0_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_shufflenet_v2_x1_0_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_squeezenet1_0_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_squeezenet1_0_float32, 
test/test_jit_llga_fuser.py::TestModel::test_vision_vgg16_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_vgg16_float32, test/test_jit_llga_fuser.py::TestModel::test_vision_wide_resnet50_2_bfloat16, test/test_jit_llga_fuser.py::TestModel::test_vision_wide_resnet50_2_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_bn2d_eltwise_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_bn2d_eltwise_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_bn_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_bn_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_bn_relu_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_bn_relu_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_clamp_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_clamp_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_eltwise_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_eltwise_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_silu_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_silu_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_sum_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_conv2d_sum_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_ensure_tensor_is_rewrapped_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_ensure_tensor_is_rewrapped_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_linear_eltwise_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_linear_eltwise_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_rewrap_tensor_input_to_pytorch_cuda_bfloat16, 
test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_rewrap_tensor_input_to_pytorch_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_wildcard_cuda_bfloat16, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_wildcard_cuda_float32, test/test_jit_llga_fuser.py::TestFusionPatternCUDA::test_wildcard_unsupported_dtype_cuda_int32, test/test_jit_llga_fuser.py::TestOpCUDA::test_add_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_add_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_add_scalar_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_add_scalar_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_addmm_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_addmm_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_avg_pool2d_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_avg_pool2d_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_bn2d_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_bn2d_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_cat_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_cat_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_conv2d_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_conv2d_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_eltwise_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_eltwise_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_identity_binary_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_identity_binary_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_layer_norm_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_layer_norm_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_linear_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_linear_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_max_pool2d_cuda_bfloat16, 
test/test_jit_llga_fuser.py::TestOpCUDA::test_max_pool2d_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_mul_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_mul_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_softmax_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_softmax_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_typecheck_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_typecheck_cuda_float32, test/test_jit_llga_fuser.py::TestOpCUDA::test_variable_kernel_avg_pool2d_cuda_bfloat16, test/test_jit_llga_fuser.py::TestOpCUDA::test_variable_kernel_avg_pool2d_cuda_float32 2025-12-04T17:11:23.5760229Z 2025-12-04T17:11:23.5760547Z Finished test_jit_llga_fuser 1/1 ... [2025-12-04 17:11:23.567937][28711.950832179], took 0.47min 2025-12-04T17:11:23.6073223Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_jit_llga_fuser/test_jit_llga_fuser-b203cab2c461ce78.xml 2025-12-04T17:11:23.6836773Z Running optim/test_optim 1/1 ... [2025-12-04 17:11:23.683313][28712.06620721] 2025-12-04T17:11:23.6837339Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:11:23.6840695Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'optim/test_optim.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:11:23.683763] 2025-12-04T17:11:28.5287776Z 2025-12-04T17:11:28.5288758Z optim/test_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/optim.test_optim_1.1_e409dee8e8c07436_.log 2025-12-04T17:11:28.5289500Z 2025-12-04T17:11:28.5289831Z Finished optim/test_optim 1/1 ... [2025-12-04 17:11:28.528541][28716.911438104], took 0.08min 2025-12-04T17:11:28.5681262Z Running torch_np/numpy_tests/core/test_getlimits 1/1 ... 
[2025-12-04 17:11:28.567812][28716.950706048] 2025-12-04T17:11:28.5682155Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:11:28.5685268Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_getlimits.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:11:28.568258] 2025-12-04T17:11:34.3913585Z 2025-12-04T17:11:34.3914769Z torch_np/numpy_tests/core/test_getlimits 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_getlimits_1.1_827a2f053af78584_.log 2025-12-04T17:11:34.3922498Z Running 17 items in this shard: test/torch_np/numpy_tests/core/test_getlimits.py::TestPythonFloat::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestHalf::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestSingle::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestDouble::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestFinfo::test_basic, test/torch_np/numpy_tests/core/test_getlimits.py::TestFinfo::test_basic_missing, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_basic, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T0, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T1, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T2, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T3, test/torch_np/numpy_tests/core/test_getlimits.py::TestRepr::test_finfo_repr, test/torch_np/numpy_tests/core/test_getlimits.py::TestRepr::test_iinfo_repr, test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_instances, test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_known_types, 
test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_plausible_finfo, test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_subnormal_warning 2025-12-04T17:11:34.3929350Z 2025-12-04T17:11:34.3929783Z Finished torch_np/numpy_tests/core/test_getlimits 1/1 ... [2025-12-04 17:11:34.391134][28722.774031195], took 0.10min 2025-12-04T17:11:34.4302204Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_getlimits/torch_np.numpy_tests.core.test_getlimits-5149534e2555ec6f.xml 2025-12-04T17:11:34.5026541Z Running torch_np/test_ndarray_methods 1/1 ... [2025-12-04 17:11:34.502319][28722.885212469] 2025-12-04T17:11:34.5027163Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:11:34.5029929Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_ndarray_methods.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:11:34.502758] 2025-12-04T17:11:44.3318799Z 2025-12-04T17:11:44.3319933Z torch_np/test_ndarray_methods 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_ndarray_methods_1.1_793e3aaaf30f7d3c_.log 2025-12-04T17:11:44.3480381Z Running 342 items in this shard: test/torch_np/test_ndarray_methods.py::TestIndexing::test_indexing_simple, test/torch_np/test_ndarray_methods.py::TestIndexing::test_setitem, test/torch_np/test_ndarray_methods.py::TestReshape::test_reshape_function, test/torch_np/test_ndarray_methods.py::TestReshape::test_reshape_method, test/torch_np/test_ndarray_methods.py::TestTranspose::test_transpose_function, test/torch_np/test_ndarray_methods.py::TestTranspose::test_transpose_method, test/torch_np/test_ndarray_methods.py::TestRavel::test_ravel_function, test/torch_np/test_ndarray_methods.py::TestRavel::test_ravel_method, test/torch_np/test_ndarray_methods.py::TestNonzero::test_array_method, test/torch_np/test_ndarray_methods.py::TestNonzero::test_nonzero_onedim, test/torch_np/test_ndarray_methods.py::TestNonzero::test_nonzero_trivial, test/torch_np/test_ndarray_methods.py::TestNonzero::test_nonzero_twodim, test/torch_np/test_ndarray_methods.py::TestNonzero::test_sparse, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_all_method_max, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_all_method_min, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size0_axis0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size0_axis0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size10_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size10_axis_-1_method1, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size11_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size11_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size12_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size12_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size13_axis13_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size13_axis13_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size14_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size14_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size15_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size15_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size16_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size16_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size17_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size17_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size18_axis18_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size18_axis18_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size19_axis_-3_method0, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size19_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size1_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size1_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size20_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size20_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size21_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size21_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size22_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size22_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size23_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size23_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size24_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size24_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size25_axis25_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size25_axis25_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size26_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size26_axis_-3_method1, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size27_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size27_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size28_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size28_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size29_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size29_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size2_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size2_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size30_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size30_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size31_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size31_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size32_axis32_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size32_axis32_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size33_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size33_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size34_axis_-3_method0, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size34_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size35_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size35_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size36_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size36_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size37_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size37_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size38_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size38_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size39_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size39_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size3_axis3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size3_axis3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size40_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size40_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size41_axis41_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size41_axis41_method1, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size42_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size42_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size43_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size43_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size44_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size44_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size45_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size45_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size46_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size46_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size47_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size47_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size48_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size48_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size49_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size49_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size4_axis_-2_method0, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size4_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size50_axis50_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size50_axis50_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size51_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size51_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size52_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size52_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size53_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size53_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size54_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size54_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size55_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size55_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size56_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size56_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size57_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size57_axis_2_method1, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size58_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size58_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size59_axis59_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size59_axis59_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size5_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size5_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size60_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size60_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size61_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size61_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size62_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size62_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size63_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size63_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size64_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size64_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size65_axis_1_method0, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size65_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size66_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size66_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size67_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size67_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size68_axis68_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size68_axis68_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size69_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size69_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size6_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size6_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size70_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size70_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size71_axis71_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size71_axis71_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size72_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size72_axis_-1_method1, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size73_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size73_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size74_axis74_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size74_axis74_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size75_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size75_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size76_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size76_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size77_axis77_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size77_axis77_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size7_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size7_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size8_axis8_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size8_axis8_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size9_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size9_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_arr_method_argmax_np_method0, 
test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_arr_method_argmin_np_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_positional_arr_method_argmax_np_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_positional_arr_method_argmin_np_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_output_shape_method_argmax, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_output_shape_method_argmin, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_0_method_argmax, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_0_method_argmin, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_1_method_argmax, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_1_method_argmin, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data0, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data1, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data10, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data11, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data12, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data13, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data14, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data15, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data16, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data17, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data18, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data19, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data2, 
test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data20, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data21, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data22, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data23, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data24, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data25, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data26, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data27, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data28, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data29, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data3, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data30, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data31, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data32, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data33, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data34, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data35, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data36, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data37, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data38, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data39, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data4, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data40, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data41, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data42, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data43, 
test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data44, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data45, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data46, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data47, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data48, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data49, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data5, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data50, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data51, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data52, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data53, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data54, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data55, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data56, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data57, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data58, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data59, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data6, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data60, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data61, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data62, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data63, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data64, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data65, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data66, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data67, 
test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data68, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data69, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data7, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data70, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data71, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data72, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data73, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data8, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data9, test/torch_np/test_ndarray_methods.py::TestArgmax::test_maximum_signed_integers, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data0, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data1, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data10, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data11, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data12, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data13, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data14, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data15, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data16, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data17, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data18, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data19, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data2, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data20, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data21, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data22, 
test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data23, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data24, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data25, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data26, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data27, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data28, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data29, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data3, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data30, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data31, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data32, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data33, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data34, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data35, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data36, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data37, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data38, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data39, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data4, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data40, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data41, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data42, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data43, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data44, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data45, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data46, 
test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data47, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data48, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data49, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data5, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data50, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data51, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data52, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data53, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data54, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data55, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data56, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data57, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data58, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data59, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data6, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data60, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data61, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data62, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data63, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data64, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data65, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data66, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data67, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data68, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data69, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data7, 
test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data70, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data71, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data72, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data73, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data8, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data9, test/torch_np/test_ndarray_methods.py::TestArgmin::test_minimum_signed_integers, test/torch_np/test_ndarray_methods.py::TestAmax::test_basic, test/torch_np/test_ndarray_methods.py::TestAmin::test_basic, test/torch_np/test_ndarray_methods.py::TestContains::test_contains, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_fn, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_ivar, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_method, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_name, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_plain, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_rvar, test/torch_np/test_ndarray_methods.py::TestIter::test_iter_1d, test/torch_np/test_ndarray_methods.py::TestIter::test_iter_2d
2025-12-04T17:11:44.3638413Z
2025-12-04T17:11:44.3638815Z Finished torch_np/test_ndarray_methods 1/1 ... [2025-12-04 17:11:44.332351][28732.715245344], took 0.16min
2025-12-04T17:11:44.3717905Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_ndarray_methods/torch_np.test_ndarray_methods-fe7c638b86097b2d.xml
2025-12-04T17:11:44.4740225Z Running test_view_ops 1/1 ... [2025-12-04 17:11:44.473689][28732.85658216]
2025-12-04T17:11:44.4740745Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:11:44.4743807Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_view_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:11:44.474145]
2025-12-04T17:12:07.4722256Z
2025-12-04T17:12:07.4723164Z test_view_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_view_ops_1.1_405f53c81662ed35_.log
2025-12-04T17:12:07.4821986Z Running 279 items in this shard: test/test_view_ops.py::TestViewOpsCUDA::test_T_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_advanced_indexing_assignment_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_advanced_indexing_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_as_strided_gradients_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_as_strided_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_as_strided_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_basic_indexing_ellipses_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_basic_indexing_newaxis_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_basic_indexing_slice_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_chunk_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_conj_imag_view_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_conj_imag_view_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int32, 
test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_conj_view_with_shared_memory_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_contiguous_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_contiguous_self_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_diagonal_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_expand_as_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_expand_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_flatten_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_flatten_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_movedim_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_narrow_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_permute_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_real_imag_view_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_real_imag_view_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_reshape_as_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_reshape_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_reshape_view_cuda, 
test/test_view_ops.py::TestViewOpsCUDA::test_select_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_bool, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_float16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_float32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_float64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int8, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_bool, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_float16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_float32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_float64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int8, 
test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_split_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_squeeze_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_squeeze_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_t_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_t_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_transpose_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_transpose_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unbind_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unbind_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unfold_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unsqueeze_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unsqueeze_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_complex_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_real_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_real_cuda_complex32, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_real_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_copy_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_copy_out_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_copy_output_contiguous_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int16, 
test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int32, 
test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int8, 
test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_view_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_T_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_as_strided_overflow_storage_offset_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_gradient_cuda, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_big_transpose_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_shapes_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_tensors_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_chunk_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_conj_neg_view_numpy_error_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_contiguous_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_crow_col_indices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_empty_reshape_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_expand_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_flatten_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_memory_format_resize__cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_memory_format_resize_as_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_narrow_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_narrow_tensor_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_python_types_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_ravel_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_cuda, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_bfloat16, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_all_dtypes_and_devices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_as_all_dtypes_and_devices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_as_preserves_strides_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_overflow_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_split_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_t_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_errors_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_float32, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_invalid_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_invalid_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_invalid_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_vs_numpy_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_vs_numpy_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_vs_numpy_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_bfloat16, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_bfloat16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_unsqueeze_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_view_all_dtypes_and_devices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_view_cuda, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_view_empty_cuda 2025-12-04T17:12:07.4918829Z 2025-12-04T17:12:07.4919117Z Finished test_view_ops 1/1 ... [2025-12-04 17:12:07.472419][28755.85531383], took 0.38min 2025-12-04T17:12:07.5112374Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_view_ops/test_view_ops-f5d6b3525797eb50.xml 2025-12-04T17:12:07.6225157Z Running test_type_info 1/1 ... [2025-12-04 17:12:07.622224][28756.005117232] 2025-12-04T17:12:07.6225676Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:12:07.6228549Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_type_info.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:12:07.622649] 2025-12-04T17:12:13.0951746Z 2025-12-04T17:12:13.0952654Z test_type_info 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_type_info_1.1_9ab09808df8277a9_.log 2025-12-04T17:12:13.0954924Z Running 5 items in this shard: test/test_type_info.py::TestDTypeInfo::test_finfo, test/test_type_info.py::TestDTypeInfo::test_iinfo, test/test_type_info.py::TestDTypeInfo::test_invalid_input, test/test_type_info.py::TestDTypeInfo::test_to_complex, test/test_type_info.py::TestDTypeInfo::test_to_real 2025-12-04T17:12:13.0956397Z 2025-12-04T17:12:13.0956689Z Finished test_type_info 1/1 ... [2025-12-04 17:12:13.094942][28761.477839247], took 0.09min 2025-12-04T17:12:13.1340749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_type_info/test_type_info-22600993e111f6f2.xml 2025-12-04T17:12:13.1678215Z Running functorch/test_aotdispatch 1/1 ... 
[2025-12-04 17:12:13.167527][28761.550421028] 2025-12-04T17:12:13.1678829Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:12:13.1682193Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_aotdispatch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:12:13.167957] 2025-12-04T17:14:34.5974427Z 2025-12-04T17:14:34.5976097Z functorch/test_aotdispatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_aotdispatch_1.1_9b74bc936a6dcdae_.log 2025-12-04T17:14:34.6518785Z Running 537 items in this shard: test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_False, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_autocast_disable_guard, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_data, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_forward_inputs, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_forward_inputs_create_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_on_grad_out, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_pass_autocast_custom, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_pass_autocast_off, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_pass_autocast_on, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_batch_norm_amp, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_batchnorm_inference, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_buffer_batch_norm, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_buffer_copied_in_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_buffer_copied_in_graph_with_different_shapes, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_compilation_context, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_complex_linear, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_composite_impl_compile, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_custom_autograd, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_custom_tensor_metadata, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_default_partitioner_saves_symints_not_tensors_for_bw, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dupe_arg, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dupe_arg_returned_as_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dupe_arg_torture, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_duplicated_arguments_on_tensor_overlap, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dynamic_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dynamic_shape_output_not_in_bw_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_embedding_bag_view_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_fw_bw_mutation_no_functionalization1, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_fw_bw_mutation_no_functionalization2, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_grad_context, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_inference_mode, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_inner_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_aliased_with_mutation_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_data_and_metadata_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_inplace_requires_grad_true, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_metadata_mutation_aliases, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_alias_everything, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_and_none_require_gradients, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_bases_out_of_order, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_other_input2, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_and_output_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_false_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_hidden_from_autograd_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_is_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_metadata2, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_modifies_autograd_meta_of_aliases, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_output_view_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_detach, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_no_grad_detach_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_no_grad_inference_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_return, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_set__input_mutation, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_set__nop, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_simple, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_simple_with_none_and_nontensor, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_before_set_, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_down, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_down_and_set_, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_up, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_aliase_custom_autograd_function, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_view_metadata_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_view_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_view_simple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_dupe, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_dupe_fake, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_dupe_left_bias, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_requires_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_requires_grad_fake, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_list_codegen, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_activations_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_activations_dynamic_with_nested, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_outputs_dynamic_use_autograd_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_outputs_dynamic_use_autograd_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mem_leak_from_save_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_module, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_multi_output, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_multi_output_list, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mutates_input_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_complicated_inps, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_complicated_inps_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_non_homogenous, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_non_nested_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_new_inp_requires_grad_now, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_no_grad_input_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_non_tensor_and_none_inputs, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nonidempotent_amp, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_input_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_input_multi_output_view_should_raise_autograd_error, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_and_returned, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_and_returned_different_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_and_returned_flipped, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_inplace_view_and_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_inplace_view_with_detach, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_multiple_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_mutation_linear, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_returned_multiple_times, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_single, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_multiple_inputs_get_correct_one, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_output_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_all_alias_types, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_dict, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_op_depending_on_symint, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_outputs_are_aliased, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_real_weights_in_symbolic_mode, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_real_weights_in_symbolic_mode_with_inplace_ops, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_saved_tensors_hooks_mutations_raise, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__and_data_mutation_bad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__and_data_mutation_good, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__not_allowed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__steals_view_chain, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_single_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_some_output_requires_grad_input_doesnt, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_some_outputs_dont_require_grad_non_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_some_outputs_dont_require_grad_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_squeeze_mutation, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclass_metadata_mutation_req_grad_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclass_metadata_mutation_req_grad_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclasses_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclasses_mixed_mode, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_synthetic_base_base_attribute_is_none, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_view_and_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_view_detach, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_ban_dropout_mut_pre_dispatch, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_forward_mutation_multiple_mut, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_forward_mutation_no_buffer_mut, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_functionalized_rng_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_dupes_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_mutation, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_mutation_on_input_requiring_grad_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_mutation_on_parameter_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_metadata_mutation_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_module_joint, 
test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_multiple_outputs_require_grad_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_buffer_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_composite_implicit_inplace, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_composite_implicit_linear, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_contiguous, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_conv_and_bn, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_func_composite_implicit, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_func_simple, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_func_view, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_map_1, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_map_2, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_outdtype, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_reshape, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_with_autograd_op, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_with_cond, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_with_cond_nested, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_simplified_basic, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_simplified_pytrees_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_synthetic_bases_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_unbacked_arg, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_with_torch_cond, 
test/functorch/test_aotdispatch.py::TestPartitioning::test_autocast, test/functorch/test_aotdispatch.py::TestPartitioning::test_contiguous, test/functorch/test_aotdispatch.py::TestPartitioning::test_custom_partitioner_fn, test/functorch/test_aotdispatch.py::TestPartitioning::test_default_partitioner_getitem, test/functorch/test_aotdispatch.py::TestPartitioning::test_default_partitioner_output_tensor_shape_tensor, test/functorch/test_aotdispatch.py::TestPartitioning::test_generate_gives_inference_graph, test/functorch/test_aotdispatch.py::TestPartitioning::test_meta_tensor_inplace_op, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner_output_tensor_shape_tensor, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner_raise_getitems, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner_save_shape, test/functorch/test_aotdispatch.py::TestPartitioning::test_preserve_random, test/functorch/test_aotdispatch.py::TestPartitioning::test_quantize_activation_duplicate_nodes, test/functorch/test_aotdispatch.py::TestPartitioning::test_recompute_partitioning, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_incorrect_backward, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_inference, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_mutation, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_mutation_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_output_alias, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_output_requires_grad_in_no_grad, 
test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_output_requires_grad_in_no_grad_views, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_simple, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_dynamic, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_fake_tensor_gm_raises, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_preserves_stack_trace, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_preserves_stack_trace_from_mutation, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_test_subclasses_with_tensor_factories, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_flex_attn_noncontiguous_tangents, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_dense, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_nested_subclass, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_nested_tensor_tangent, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_subclass, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_inductor_freezing_with_subclasses, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_inference_python_dispatcher, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_layer_norm, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_lift_fresh_copy_in_graph, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_False_device_cpu, 
test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_False_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_True_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_True_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_False_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_False_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_True_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_True_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rms_norm, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu_with_noise_mutation, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_base_saved_tensors_hooks_filtering_mode_all, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_base_saved_tensors_hooks_filtering_mode_donated, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_base_saved_tensors_hooks_filtering_mode_no_static, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_donated_buffers, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_params, 
test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_recompile, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_subclass_parameters, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_subclass_parameters_torture_case, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_tangent_type_coercion, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_wrong_guess_tangent_type, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_autocast_disable_guard, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_data, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_forward_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_forward_inputs_create_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_on_grad_out, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_pass_autocast_custom, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_pass_autocast_off, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_pass_autocast_on, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_batch_norm_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_batchnorm_inference, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_buffer_batch_norm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_buffer_copied_in_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_buffer_copied_in_graph_with_different_shapes, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_compilation_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_complex_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_composite_impl_compile, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_custom_autograd, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_custom_tensor_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_default_partitioner_saves_symints_not_tensors_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dupe_arg, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dupe_arg_returned_as_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dupe_arg_torture, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_duplicated_arguments_on_tensor_overlap, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dynamic_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dynamic_shape_output_not_in_bw_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_embedding_bag_view_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_fw_bw_mutation_no_functionalization1, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_fw_bw_mutation_no_functionalization2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_grad_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inference_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inner_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_aliased_with_mutation_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_data_and_metadata_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_inplace_requires_grad_true, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_metadata_mutation_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_alias_everything, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_and_none_require_gradients, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_and_output_alias, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_bases_out_of_order, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_other_input2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_and_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_false_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_hidden_from_autograd_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_is_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_metadata2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_modifies_autograd_meta_of_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_output_view_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_no_grad_detach_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_no_grad_inference_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_return, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_set__input_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_set__nop, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_simple_with_none_and_nontensor, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_before_set_, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_down, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_down_and_set_, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_up, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_aliase_custom_autograd_function, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_view_metadata_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_view_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_view_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inputs_overlapping_unsqueeze_with_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inputs_overlapping_with_mutation_guard_base, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_dupe, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_dupe_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_dupe_left_bias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_requires_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_requires_grad_fake, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_list_codegen, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_activations_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_activations_dynamic_with_nested, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_outputs_dynamic_use_autograd_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_outputs_dynamic_use_autograd_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mem_leak_from_save_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_module, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_multi_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_multi_output_list, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mutates_input_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mutation_of_input_in_fw_and_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mutations_in_bw_detached_from_tangent, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_complicated_inps, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_complicated_inps_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_non_homogenous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_non_nested_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_new_inp_requires_grad_now, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_no_grad_input_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_non_tensor_and_none_inputs, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nonidempotent_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_input_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_input_multi_output_view_should_raise_autograd_error, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_and_returned, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_and_returned_different_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_and_returned_flipped, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_inplace_view_and_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_inplace_view_with_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_multiple_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_mutation_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_returned_multiple_times, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_single, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_multiple_inputs_get_correct_one, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_output_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_all_alias_types, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_dict, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_op_depending_on_symint, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_outputs_are_aliased, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_real_weights_in_symbolic_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_real_weights_in_symbolic_mode_with_inplace_ops, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_saved_tensors_hooks_mutations_raise, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__and_data_mutation_bad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__and_data_mutation_good, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__not_allowed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__steals_view_chain, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_single_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_some_output_requires_grad_input_doesnt, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_some_outputs_dont_require_grad_non_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_some_outputs_dont_require_grad_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_squeeze_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclass_metadata_mutation_req_grad_False, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclass_metadata_mutation_req_grad_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclasses_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclasses_mixed_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_synthetic_base_base_attribute_is_none, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_view_and_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_view_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_True, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_autocast_disable_guard, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_data, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_forward_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_forward_inputs_create_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_on_grad_out, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_pass_autocast_custom, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_pass_autocast_off, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_pass_autocast_on, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_batch_norm_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_batchnorm_inference, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_buffer_batch_norm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_buffer_copied_in_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_buffer_copied_in_graph_with_different_shapes, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_compilation_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_complex_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_composite_impl_compile, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_custom_autograd, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_custom_tensor_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_default_partitioner_saves_symints_not_tensors_for_bw, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dupe_arg, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dupe_arg_returned_as_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dupe_arg_torture, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_duplicated_arguments_on_tensor_overlap, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dynamic_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dynamic_shape_output_not_in_bw_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_embedding_bag_view_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_fw_bw_mutation_no_functionalization1, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_fw_bw_mutation_no_functionalization2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_grad_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inference_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inner_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_aliased_with_mutation_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_data_and_metadata_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_inplace_requires_grad_true, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_metadata_mutation_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_alias_everything, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_and_none_require_gradients, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_and_output_alias, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_bases_out_of_order, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_other_input2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_and_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_false_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_hidden_from_autograd_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_is_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_metadata2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_modifies_autograd_meta_of_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_output_view_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_no_grad_detach_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_no_grad_inference_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_return, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_set__input_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_set__nop, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_simple_with_none_and_nontensor, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_before_set_, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_down, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_down_and_set_, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_up, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_aliase_custom_autograd_function, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_view_metadata_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_view_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_view_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inputs_overlapping_unsqueeze_with_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inputs_overlapping_with_mutation_guard_base, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_dupe, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_dupe_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_dupe_left_bias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_requires_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_requires_grad_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_list_codegen, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_activations_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_activations_dynamic_with_nested, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_outputs_dynamic_use_autograd_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_outputs_dynamic_use_autograd_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mem_leak_from_save_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_module, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_multi_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_multi_output_list, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mutates_input_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mutation_of_input_in_fw_and_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mutations_in_bw_detached_from_tangent, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_complicated_inps, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_complicated_inps_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_non_homogenous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_non_nested_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_new_inp_requires_grad_now, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_no_grad_input_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_non_tensor_and_none_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nonidempotent_amp, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_input_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_input_multi_output_view_should_raise_autograd_error, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_and_returned, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_and_returned_different_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_and_returned_flipped, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_inplace_view_and_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_inplace_view_with_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_multiple_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_mutation_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_returned_multiple_times, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_single, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_view_meta_replay, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_multiple_inputs_get_correct_one, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_output_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_all_alias_types, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_dict, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_op_depending_on_symint, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_outputs_are_aliased, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_real_weights_in_symbolic_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_real_weights_in_symbolic_mode_with_inplace_ops, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_saved_tensors_hooks_mutations_raise, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__and_data_mutation_bad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__and_data_mutation_good, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__not_allowed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__steals_view_chain, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_single_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_some_output_requires_grad_input_doesnt, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_some_outputs_dont_require_grad_non_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_some_outputs_dont_require_grad_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_squeeze_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclass_metadata_mutation_req_grad_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclass_metadata_mutation_req_grad_True, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclasses_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclasses_mixed_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_synthetic_base_base_attribute_is_none, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_view_and_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_view_detach 2025-12-04T17:14:34.6867134Z 2025-12-04T17:14:34.6867549Z Finished functorch/test_aotdispatch 1/1 ... [2025-12-04 17:14:34.598421][28902.981313059], took 2.36min 2025-12-04T17:14:34.6868910Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_aotdispatch/functorch.test_aotdispatch-efb7e0b79840fa38.xml 2025-12-04T17:14:34.7453241Z Running test_scatter_gather_ops 1/1 ... [2025-12-04 17:14:34.745038][28903.127932174] 2025-12-04T17:14:34.7453846Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:14:34.7457203Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_scatter_gather_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:14:34.745477] 2025-12-04T17:14:57.9962995Z 2025-12-04T17:14:57.9963978Z test_scatter_gather_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_scatter_gather_ops_1.1_f3de59c3735d2471_.log 2025-12-04T17:14:58.0000109Z Running 76 items in this shard: test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_bool_cuda_bool, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_expanded_index_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_expanded_index_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_expanded_index_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_large_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_large_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__reductions_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__reductions_cuda_float32, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__scalar_cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__scalar_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__scalar_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add__cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add__cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add__cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add_broadcasted_index_deterministic_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add_mult_index_base_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int8, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_float16, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_complex128, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_uint8 2025-12-04T17:14:58.0035030Z 2025-12-04T17:14:58.0035374Z Finished test_scatter_gather_ops 1/1 ... 
[2025-12-04 17:14:57.996119][28926.379014958], took 0.39min 2025-12-04T17:14:58.0368430Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_scatter_gather_ops/test_scatter_gather_ops-5274bd99c8a0619f.xml 2025-12-04T17:14:58.1148029Z Running test_cuda_multigpu 1/1 ... [2025-12-04 17:14:58.114496][28926.49739045] 2025-12-04T17:14:58.1148576Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:14:58.1151854Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_multigpu.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:14:58.114943] 2025-12-04T17:15:06.0409023Z 2025-12-04T17:15:06.0410137Z test_cuda_multigpu 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_multigpu_1.1_5809c25d23c9a947_.log 2025-12-04T17:15:06.0432121Z Running 61 items in this shard: test/test_cuda_multigpu.py::TestCudaMultiGPU::test_autogpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_caching_pinned_memory_multi_gpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cat_autogpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_copy_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_copy_streams, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_device_memory_allocated, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_init_race, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_memory_leak_detection, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_set_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_cuda_synchronize, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_current_stream, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_default_stream, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_events_multi_gpu_elapsed_time, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_events_multi_gpu_query, 
test/test_cuda_multigpu.py::TestCudaMultiGPU::test_events_wait, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_external_streams, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_external_streams_multi_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_get_set_rng_state_all, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_grad_scaling_device_as_key, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_grad_scaling_multigpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_grad_scaling_scale, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_load_nonexistent_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_mem_get_info, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_memory_stats, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_memory_stats_multigpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_multigpu_serialization_remap, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_multigpu_serialization_remap_dict, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_multigpu_storage_clone, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_new, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_rng_state_offset, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_stream_context, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_stream_event_device, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_stream_event_nogil, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streaming_backwards_device_transfer, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_multi_gpu, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_multi_gpu_eq, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_multi_gpu_query, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_streams_priority, test/test_cuda_multigpu.py::TestCudaMultiGPU::test_tensor_device, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_coalesced, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_coalesced_dense_only, 
test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_coalesced_empty_tensors, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_cpu, test/test_cuda_multigpu.py::TestCudaComm::test_broadcast_gpu, test/test_cuda_multigpu.py::TestCudaComm::test_gather, test/test_cuda_multigpu.py::TestCudaComm::test_gather_dim, test/test_cuda_multigpu.py::TestCudaComm::test_gather_namedtuple, test/test_cuda_multigpu.py::TestCudaComm::test_gather_neg_dim, test/test_cuda_multigpu.py::TestCudaComm::test_memory_format_scatter_gather, test/test_cuda_multigpu.py::TestCudaComm::test_reduce_add, test/test_cuda_multigpu.py::TestCudaComm::test_reduce_add_coalesced, test/test_cuda_multigpu.py::TestCudaComm::test_reduce_add_coalesced_dense_only, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu_dim, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu_neg_dim, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_cpu_sizes, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu_dim, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu_neg_dim, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_gpu_sizes, test/test_cuda_multigpu.py::TestCudaComm::test_scatter_namedtuple 2025-12-04T17:15:06.0452719Z 2025-12-04T17:15:06.0453046Z Finished test_cuda_multigpu 1/1 ... [2025-12-04 17:15:06.040758][28934.423651614], took 0.13min 2025-12-04T17:15:06.0808021Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_multigpu/test_cuda_multigpu-a9a26e79d8868522.xml 2025-12-04T17:15:06.1535451Z Running torch_np/numpy_tests/lib/test_index_tricks 1/1 ... 
[2025-12-04 17:15:06.153137][28934.536030664] 2025-12-04T17:15:06.1536153Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:15:06.1538189Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_index_tricks.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:15:06.153567] 2025-12-04T17:15:12.0768955Z 2025-12-04T17:15:12.0770124Z torch_np/numpy_tests/lib/test_index_tricks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_index_tricks_1.1_e2c692e766f99011_.log 2025-12-04T17:15:12.0792230Z Running 47 items in this shard: test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_0d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_big_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_clipmodes, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_dtypes, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_clip, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_raise, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_wrap, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_unravel, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_writeability, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_longdouble, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_npcomplexfloating, 
test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_npfloating, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_linspace_equivalence, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_mgrid_size_none_handling_start0_stop_10_step0_expected0, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_mgrid_size_none_handling_start_-10_stop_20_step1_expected1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_nd, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_sparse, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_0d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_1d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_2d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_complex_step, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_mixed_type, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_more_mixed_type, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestNdenumerate::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIndexExpression::test_regression_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIndexExpression::test_simple_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_1d_only, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_bool, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_regression_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_repeated_input, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_shape_and_dtype, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestC::test_c_, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_basic, 
test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_hetero_shape_handling, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_low_dim_handling, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_operate_4d_array, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_tall_matrix, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_tall_matrix_wrap, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_wide_matrix, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndices::test_diag_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_diag_indices_from, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_error_shape_mismatch, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_error_small_input, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestNdIndex::test_ndindex 2025-12-04T17:15:12.0812806Z 2025-12-04T17:15:12.0813248Z Finished torch_np/numpy_tests/lib/test_index_tricks 1/1 ... [2025-12-04 17:15:12.076737][28940.459632773], took 0.10min 2025-12-04T17:15:12.1170680Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-43f32c31fbfc43cd.xml 2025-12-04T17:15:12.1501175Z Running test_jit_autocast 1/1 ... [2025-12-04 17:15:12.149853][28940.532747438] 2025-12-04T17:15:12.1501698Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:15:12.1504812Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_jit_autocast.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:15:12.150262] 2025-12-04T17:15:31.6997373Z 2025-12-04T17:15:31.6998320Z test_jit_autocast 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_jit_autocast_1.1_9af7b4b8017e3406_.log 2025-12-04T17:15:31.7016759Z Running 54 items in this shard: test/test_jit_autocast.py::TestAutocast::test_autocast_api, test/test_jit_autocast.py::TestAutocast::test_autocast_api_not_supported, test/test_jit_autocast.py::TestAutocast::test_autocast_autodiff, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator_outside_jit, test/test_jit_autocast.py::TestAutocast::test_autocast_mixed_dtypes, test/test_jit_autocast.py::TestAutocast::test_callees, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_off, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_on, test/test_jit_autocast.py::TestAutocast::test_conditional_autocast, test/test_jit_autocast.py::TestAutocast::test_control_flow, test/test_jit_autocast.py::TestAutocast::test_divergent_autocast, test/test_jit_autocast.py::TestAutocast::test_divergent_types, test/test_jit_autocast.py::TestAutocast::test_duplicate_inputs, test/test_jit_autocast.py::TestAutocast::test_eager_and_script, test/test_jit_autocast.py::TestAutocast::test_explicit_casts, test/test_jit_autocast.py::TestAutocast::test_fp32_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_policy_with_fp64, test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_ignore_amp, test/test_jit_autocast.py::TestAutocast::test_implicitly_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_inplace, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_cpu, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_gpu, 
test/test_jit_autocast.py::TestAutocast::test_jit_call_method_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_executor_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_basic, test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_constants, test/test_jit_autocast.py::TestAutocast::test_jit_generic_autocast, test/test_jit_autocast.py::TestAutocast::test_linear_bf16, test/test_jit_autocast.py::TestAutocast::test_minimal, test/test_jit_autocast.py::TestAutocast::test_minimal_cpu, test/test_jit_autocast.py::TestAutocast::test_minimal_off, test/test_jit_autocast.py::TestAutocast::test_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_promote_policy, test/test_jit_autocast.py::TestAutocast::test_promote_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_reused_autocast, test/test_jit_autocast.py::TestAutocast::test_reused_autocast_expr, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state_expr, test/test_jit_autocast.py::TestAutocast::test_script_and_tracing, test/test_jit_autocast.py::TestAutocast::test_script_and_tracing_with_autocast, test/test_jit_autocast.py::TestAutocast::test_script_module, test/test_jit_autocast.py::TestAutocast::test_tracing_and_script, test/test_jit_autocast.py::TestAutocast::test_tracing_with_autocast_and_script, test/test_jit_autocast.py::TestJitTraceAutocast::test_cat_promote, test/test_jit_autocast.py::TestJitTraceAutocast::test_generate_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nchw_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nhwc_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cpu, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cuda, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_enable_and_check, 
test/test_jit_autocast.py::TestJitTraceAutocast::test_scripted_aliasing 2025-12-04T17:15:31.7034735Z 2025-12-04T17:15:31.7035046Z Finished test_jit_autocast 1/1 ... [2025-12-04 17:15:31.699542][28960.082437457], took 0.33min 2025-12-04T17:15:31.7401890Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-9b5e22ff1077135a.xml 2025-12-04T17:15:31.8234723Z Running nn/test_pooling 1/1 ... [2025-12-04 17:15:31.823144][28960.206037859] 2025-12-04T17:15:31.8235491Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:15:31.8238186Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_pooling.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:15:31.823580] 2025-12-04T17:15:47.0613491Z 2025-12-04T17:15:47.0614566Z nn/test_pooling 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_pooling_1.1_e8e935ea909a1883_.log 2025-12-04T17:15:47.0682844Z Running 147 items in this shard: test/nn/test_pooling.py::TestAvgPool::test_avg_pool1d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_avg_pool2d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_avg_pool3d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool2d, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool2d_with_divisor, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool3d, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool3d_with_divisor, test/nn/test_pooling.py::TestPoolingNN::test_MaxUnpool2d_output_size, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_avg_pooling_nhwc_overflow, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_avg_pooling_overflow, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc, 
test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_launch_config_backward, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_launch_config_forward, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_non_contiguous, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_lower_precision, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_size_none, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_size_overflow, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool2d_nhwc_cpu, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool3d_input_check, test/nn/test_pooling.py::TestPoolingNN::test_quantized_max_pool1d_empty_kernel, test/nn/test_pooling.py::TestPoolingNN::test_quantized_max_pool3d, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float32, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AvgPool2d_empty_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AvgPool3d_backward_after_cat_dim1_device_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_batch_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_out_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_samples_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_errors_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_batch_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_out_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_samples_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_LPPool1d_kernel_size_overflow_large_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_errors_cuda, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case10_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case4_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case5_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case6_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case7_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case8_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case9_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_invalid_output_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pool2d_output_size_one_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pool3d_output_size_one_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pooling_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_max_pooling_backward_fails_cuda, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pool_odd_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_max_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_max_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int8, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_uint8, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_zero_batch_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_zero_batch_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_reduced_floating_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_reduced_floating_cuda_float16, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool2d_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool2d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool3d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_corner_cases_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_corner_cases_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_corner_cases_cuda_int32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_corner_cases_cuda_int64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_indices_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_with_indices_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float64, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_bfloat16_half_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_bfloat16_half_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_unpool_invalid_indices_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool3d_non_square_backward_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool3d_large_size_int64_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool3d_size_one_feature_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float64, 
test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_bfloat16_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_large_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_max_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_max_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_zero_stride_cuda 2025-12-04T17:15:47.0750283Z 2025-12-04T17:15:47.0750661Z Finished nn/test_pooling 1/1 ... [2025-12-04 17:15:47.061345][28975.444240019], took 0.25min 2025-12-04T17:15:47.1015471Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_pooling/nn.test_pooling-2151df52b065bbdf.xml 2025-12-04T17:15:47.1855714Z Running nn/test_embedding 1/1 ... [2025-12-04 17:15:47.185235][28975.568129139] 2025-12-04T17:15:47.1856276Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:15:47.1859353Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_embedding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:15:47.185684] 2025-12-04T17:16:06.4275801Z 2025-12-04T17:16:06.4276905Z nn/test_embedding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_embedding_1.1_dc9119745a665b44_.log 2025-12-04T17:16:06.4368478Z Running 156 items in this shard: test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_from_pretrained, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_from_pretrained_padding_idx, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_functional, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_padding_idx_error, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_float32, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_float64, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int16, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int32, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int64, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int8, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_options, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_padding_idx, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_uint8, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_functional, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_max_norm, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_max_norm_unsorted_repeating_indices, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_sparse_basic, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_sparse_empty_tensor, test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_2d_include_last_offset, test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_from_pretrained, 
test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_from_pretrained_options, test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_include_last_offset, test/nn/test_embedding.py::TestEmbeddingNN::test_large_tensors, test/nn/test_embedding.py::TestEmbeddingNN::test_move_sparse_half_embedding, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int64_float64, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int32_float32, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int64_float32, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_backward_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_backward_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_backward_large_batch_overflow_cuda_bfloat16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_1D_padding_idx_cuda_bfloat16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_1D_padding_idx_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_2D_padding_idx_cuda_bfloat16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_2D_padding_idx_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int32_float32, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_dimension_errors_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int64_int32, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float32_int64, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float64_int64, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_False_per_sample_weights_use_grad_False_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_False_per_sample_weights_use_grad_True_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_True_per_sample_weights_use_grad_False_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_True_per_sample_weights_use_grad_True_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_dense_grad_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_backward_cuda_float16, 
test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_backward_cuda_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_backward_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_device_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_device_cuda_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_device_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_fwd_AD_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_fwd_AD_cuda_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_fwd_AD_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_padding_idx_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_padding_idx_cuda_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_padding_idx_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_scalar_weight_error_cuda 2025-12-04T17:16:06.4459712Z 2025-12-04T17:16:06.4460059Z Finished nn/test_embedding 1/1 ... [2025-12-04 17:16:06.427721][28994.810613375], took 0.32min 2025-12-04T17:16:06.4684201Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_embedding/nn.test_embedding-d055fd5d393643fe.xml 2025-12-04T17:16:06.5496435Z Running test_xnnpack_integration 1/1 ... 
[2025-12-04 17:16:06.549336][28994.932229904] 2025-12-04T17:16:06.5497024Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:16:06.5500360Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_xnnpack_integration.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:16:06.549802] 2025-12-04T17:16:19.7881778Z 2025-12-04T17:16:19.7883439Z test_xnnpack_integration 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_xnnpack_integration_1.1_5b815e7820ba690d_.log 2025-12-04T17:16:19.7891915Z Running 12 items in this shard: test/test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d, test/test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d_transpose, test/test_xnnpack_integration.py::TestXNNPACKOps::test_linear, test/test_xnnpack_integration.py::TestXNNPACKOps::test_linear_1d_input, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_combined_model, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d_transpose, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_linear, test/test_xnnpack_integration.py::TestXNNPACKRewritePass::test_decomposed_linear, test/test_xnnpack_integration.py::TestXNNPACKRewritePass::test_linear, test/test_xnnpack_integration.py::TestXNNPACKConv1dTransformPass::test_conv1d_basic, test/test_xnnpack_integration.py::TestXNNPACKConv1dTransformPass::test_conv1d_with_relu_fc 2025-12-04T17:16:19.7896712Z 2025-12-04T17:16:19.7897078Z Finished test_xnnpack_integration 1/1 ... [2025-12-04 17:16:19.787918][29008.170811951], took 0.22min 2025-12-04T17:16:19.8288549Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-ed8e38bda9a33f4f.xml 2025-12-04T17:16:19.9144138Z Running test_cuda_trace 1/1 ... 
[2025-12-04 17:16:19.914076][29008.296970052] 2025-12-04T17:16:19.9144687Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:16:19.9148442Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_trace.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:16:19.914570] 2025-12-04T17:17:54.4414203Z 2025-12-04T17:17:54.4417305Z test_cuda_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_trace_1.1_70d30feb0b9acc89_.log 2025-12-04T17:17:54.4422470Z Running 12 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_all_trace_callbacks_called, test/test_cuda_trace.py::TestCudaTrace::test_device_synchronization_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_creation_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_deletion_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_record_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_synchronization_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_wait_callback, test/test_cuda_trace.py::TestCudaTrace::test_memcpy_synchronization, test/test_cuda_trace.py::TestCudaTrace::test_memory_allocation_callback, test/test_cuda_trace.py::TestCudaTrace::test_memory_deallocation_callback, test/test_cuda_trace.py::TestCudaTrace::test_stream_creation_callback, test/test_cuda_trace.py::TestCudaTrace::test_stream_synchronization_callback 2025-12-04T17:17:54.4427206Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_all_trace_callbacks_called 2025-12-04T17:17:54.4428152Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_device_synchronization_callback 2025-12-04T17:17:54.4429063Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_creation_callback 2025-12-04T17:17:54.4429941Z Running 1 items in 
this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_deletion_callback 2025-12-04T17:17:54.4430806Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_record_callback 2025-12-04T17:17:54.4431710Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_synchronization_callback 2025-12-04T17:17:54.4432588Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_wait_callback 2025-12-04T17:17:54.4433450Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_memcpy_synchronization 2025-12-04T17:17:54.4434340Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_memory_allocation_callback 2025-12-04T17:17:54.4435333Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_memory_deallocation_callback 2025-12-04T17:17:54.4436235Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_stream_creation_callback 2025-12-04T17:17:54.4437165Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_stream_synchronization_callback 2025-12-04T17:17:54.4437706Z 2025-12-04T17:17:54.4438017Z Finished test_cuda_trace 1/1 ... 
[2025-12-04 17:17:54.441447][29102.824340688], took 1.58min 2025-12-04T17:17:54.4828433Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-5e38c3c197506de5.xml 2025-12-04T17:17:54.5762360Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-4795d7c5159b6e03.xml 2025-12-04T17:17:54.6204411Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-6c76df2a5666e90f.xml 2025-12-04T17:17:54.6512720Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c93df3ae687a8e58.xml 2025-12-04T17:17:54.6980158Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-524a9565fc6ac576.xml 2025-12-04T17:17:54.7320230Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-b18b3c9d4ddc6b34.xml 2025-12-04T17:17:54.7636999Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-319074c2014cbf3e.xml 2025-12-04T17:17:54.7940600Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-2483ba726355768c.xml 2025-12-04T17:17:54.8236701Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-f2eed9c29ea8eac7.xml 2025-12-04T17:17:54.8535114Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c778cc218c519690.xml 2025-12-04T17:17:54.8945457Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-7897a7f1d03cdaa3.xml 
2025-12-04T17:17:54.9231712Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-d4fb26045e698199.xml 2025-12-04T17:17:54.9627499Z Running torch_np/test_reductions 1/1 ... [2025-12-04 17:17:54.962504][29103.345397789] 2025-12-04T17:17:54.9628067Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:17:54.9631801Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_reductions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:17:54.962946] 2025-12-04T17:18:03.8908253Z 2025-12-04T17:18:03.8909293Z torch_np/test_reductions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_reductions_1.1_b720aba5a84f607c_.log 2025-12-04T17:18:03.9397499Z Running 966 items in this shard: test/torch_np/test_reductions.py::TestFlatnonzero::test_basic, test/torch_np/test_reductions.py::TestAny::test_basic, test/torch_np/test_reductions.py::TestAny::test_method_vs_function, test/torch_np/test_reductions.py::TestAny::test_nd, test/torch_np/test_reductions.py::TestAll::test_basic, test/torch_np/test_reductions.py::TestAll::test_method_vs_function, test/torch_np/test_reductions.py::TestAll::test_nd, test/torch_np/test_reductions.py::TestMean::test_mean, test/torch_np/test_reductions.py::TestMean::test_mean_float16, test/torch_np/test_reductions.py::TestMean::test_mean_values, test/torch_np/test_reductions.py::TestMean::test_mean_where, test/torch_np/test_reductions.py::TestSum::test_sum, test/torch_np/test_reductions.py::TestSum::test_sum_boolean, test/torch_np/test_reductions.py::TestSum::test_sum_complex_1_dt0, test/torch_np/test_reductions.py::TestSum::test_sum_complex_1_dt1, test/torch_np/test_reductions.py::TestSum::test_sum_complex_2_dt0, test/torch_np/test_reductions.py::TestSum::test_sum_complex_2_dt1, 
test/torch_np/test_reductions.py::TestSum::test_sum_dtypes_2, test/torch_np/test_reductions.py::TestSum::test_sum_dtypes_warnings, test/torch_np/test_reductions.py::TestSum::test_sum_initial, test/torch_np/test_reductions.py::TestSum::test_sum_stability, test/torch_np/test_reductions.py::TestSum::test_sum_where, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func4, 
test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func8, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis8, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis8, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func9, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_array_axis_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_array_axis_func1, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_bad_tuple_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_bad_tuple_func1, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_empty_generic_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_empty_generic_func1, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_bad_axis_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_bad_axis_func1
2025-12-04T17:18:03.9874376Z
2025-12-04T17:18:03.9874753Z Finished torch_np/test_reductions 1/1 ... [2025-12-04 17:18:03.892288][29112.275179736], took 0.15min
2025-12-04T17:18:03.9876041Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_reductions/torch_np.test_reductions-73a2026a6cdfd4dc.xml
2025-12-04T17:18:04.0300891Z Running torch_np/numpy_tests/core/test_scalar_ctors 1/1 ... [2025-12-04 17:18:04.029755][29112.41264687]
2025-12-04T17:18:04.0301841Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T17:18:04.0305433Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_scalar_ctors.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:18:04.030275]
2025-12-04T17:18:09.8034036Z
2025-12-04T17:18:09.8035288Z torch_np/numpy_tests/core/test_scalar_ctors 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_scalar_ctors_1.1_4168b5c3b3d7f9be_.log
2025-12-04T17:18:09.8067902Z Running 65 items in this shard: test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestFromString::test_bool, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestFromString::test_floating, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestFromString::test_floating_overflow, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestFromInt::test_intp, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestFromInt::test_uint64_from_negative, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t10_t20, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t10_t21, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t10_t22, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t11_t20, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t11_t21, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_complex_t11_t22, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_np_int_, 
test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_byte_t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_int__t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_intc_t26, 
test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_longlong_t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_np_longlong, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_np_short_t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_np_byte, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_np_int_, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_np_intc, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_np_longlong, 
test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_np_short, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_t25, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_integers_t15_t26, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t10_t20, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t10_t21, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t10_t22, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t10_t23, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t11_t20, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t11_t21, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t11_t22, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t11_t23, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t12_t20, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t12_t21, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t12_t22, test/torch_np/numpy_tests/core/test_scalar_ctors.py::TestArrayFromScalar::test_reals_t12_t23 2025-12-04T17:18:09.8100010Z 2025-12-04T17:18:09.8100529Z Finished torch_np/numpy_tests/core/test_scalar_ctors 1/1 ... [2025-12-04 17:18:09.803289][29118.186183807], took 0.10min 2025-12-04T17:18:09.8453747Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_scalar_ctors/torch_np.numpy_tests.core.test_scalar_ctors-e23576bcb06b5d61.xml 2025-12-04T17:18:09.8772530Z Running torch_np/numpy_tests/lib/test_arraypad 1/1 ... 
[2025-12-04 17:18:09.876979][29118.259874076] 2025-12-04T17:18:09.8773178Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:18:09.8776835Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_arraypad.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:18:09.877438] 2025-12-04T17:18:15.5505561Z 2025-12-04T17:18:15.5506760Z torch_np/numpy_tests/lib/test_arraypad 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_arraypad_1.1_867803734c4a045d_.log 2025-12-04T17:18:15.5511644Z Running 9 items in this shard: test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_float, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_float2, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_float3, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_odd_pad_amount, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_pad_2d, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_zeros, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_large_integers, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_pad_empty_dimension 2025-12-04T17:18:15.5515678Z 2025-12-04T17:18:15.5516099Z Finished torch_np/numpy_tests/lib/test_arraypad 1/1 ... [2025-12-04 17:18:15.550341][29123.933237058], took 0.09min 2025-12-04T17:18:15.5924931Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_arraypad/torch_np.numpy_tests.lib.test_arraypad-f4e46a1506be78e1.xml 2025-12-04T17:18:15.6260865Z Running test_prims 1/1 ... 
[2025-12-04 17:18:15.625817][29124.008710886] 2025-12-04T17:18:15.6261384Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:18:15.6264785Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_prims.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:18:15.626231] 2025-12-04T17:18:24.7540596Z 2025-12-04T17:18:24.7541533Z test_prims 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_prims_1.1_8a7702ff07b7da5d_.log 2025-12-04T17:18:24.7551837Z Running 26 items in this shard: test/test_prims.py::TestPrimsBasic::test_check_deprecation_warning, test/test_prims.py::TestPrimsBasic::test_clone_complex, test/test_prims.py::TestPrimsBasic::test_clone_meta_stride_preservation_dense, test/test_prims.py::TestPrimsBasic::test_clone_meta_stride_preservation_sparse, test/test_prims.py::TestPrimsBasic::test_mul_complex, test/test_prims.py::TestPrimsBasic::test_torch_ops, test/test_prims.py::TestPrimsCUDA::test_aten_overload_to_prims_cuda, test/test_prims.py::TestPrimsCUDA::test_broadcast_in_dim_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_broadcast_in_dim_sum_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_cbrt_prim_cuda_float64, test/test_prims.py::TestPrimsCUDA::test_cbrt_prim_cuda_int64, test/test_prims.py::TestPrimsCUDA::test_collapse_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_functional_rng_wrappers_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_memory_format_strides_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_philox_rand_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_reshape_view_method_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_var_correction_0_cuda_float32, test/test_prims.py::TestPrimsCUDA::test_var_correction_1_cuda_float32, test/test_prims.py::TestRefsCUDA::test_constant_pad_nd_memory_format_cuda_float32, 
test/test_prims.py::TestRefsCUDA::test_inferred_tags_cuda, test/test_prims.py::TestRefsCUDA::test_infinite_loop_from_py_dispatcher_cuda, test/test_prims.py::TestRefsCUDA::test_linspace_with_complex_input_cuda, test/test_prims.py::TestRefsCUDA::test_logspace_with_complex_input_cuda, test/test_prims.py::TestRefsCUDA::test_unbind_cuda, test/test_prims.py::TestDecompCUDA::test_decomposition_method_vararg_ones_cuda_float32, test/test_prims.py::TestDecompCUDA::test_decomposition_method_vararg_permute_cuda_float32 2025-12-04T17:18:24.7560877Z 2025-12-04T17:18:24.7561151Z Finished test_prims 1/1 ... [2025-12-04 17:18:24.753865][29133.136760746], took 0.15min 2025-12-04T17:18:24.7967842Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_prims/test_prims-38188698633a9bb5.xml 2025-12-04T17:18:24.8860550Z Running test_spectral_ops 1/1 ... [2025-12-04 17:18:24.885676][29133.268569242] 2025-12-04T17:18:24.8861139Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:18:24.8863835Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_spectral_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:18:24.886128] 2025-12-04T17:19:02.6575822Z 2025-12-04T17:19:02.6577337Z test_spectral_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_spectral_ops_1.1_434231ff814fe9e8_.log 2025-12-04T17:19:02.6717879Z Running 347 items in this shard: test/test_spectral_ops.py::TestFFTCUDA::test_batch_istft_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_complex_istft_real_equiv_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_complex_stft_definition_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_complex_stft_onesided_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_complex_stft_real_equiv_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_complex_stft_roundtrip_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_complex_stft_roundtrip_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_cufft_context_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_cufft_context_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_cufft_plan_cache_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fftn_cuda_complex64, 
test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_fftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_hfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifftn_cuda_complex32, 
test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ifftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_ihfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_irfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfft2_cuda_float32, 
test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft__refs_fft_rfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_fftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfftn_cuda_complex32, 
test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_hfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ifftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_ihfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft2_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft2_cuda_float32, 
test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_irfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfft2_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_empty_fft_fft_rfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_empty_ifft_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_fft2_fftn_equivalence_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fft2_fftn_equivalence_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fft2_invalid_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_fft2_numpy_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fft2_numpy_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_fft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_fft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_fftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_hfft2_cuda_bfloat16, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_hfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_hfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ifft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ifft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ifftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ihfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ihfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_ihfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_irfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_irfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_irfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_rfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_rfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors__refs_fft_rfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_fft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_fft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_fftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_hfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_hfft_cuda_bfloat16, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_hfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ifft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ifft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ifftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ihfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ihfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_ihfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_irfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_irfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_irfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_rfft2_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_rfft_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_bfloat16_errors_fft_rfftn_cuda_bfloat16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_fftn_cuda_float16, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_hfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ifftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ihfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ihfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_ihfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfft2_cuda_complex32, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_irfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_rfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_rfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft_rfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_fftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfft_cuda_complex32, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_hfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifftn_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ifftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ihfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ihfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_ihfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfft2_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfft_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfftn_cuda_complex32, 
test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_irfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_rfft2_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_rfft_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft_rfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_ifft_rfft_irfft_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_input_modification_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_fft_invalid_dtypes_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_fft_plan_repeatable_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_round_trip_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fft_type_promotion_cuda_int8, test/test_spectral_ops.py::TestFFTCUDA::test_fftfreq_numpy_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftfreq_numpy_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fftfreq_out_cuda_float32, 
test/test_spectral_ops.py::TestFFTCUDA::test_fftfreq_out_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_fftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_hfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_ifftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_ihfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_irfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid__refs_fft_rfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_fftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_hfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_ifftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_ihfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_irfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_invalid_fft_rfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_noop_transform_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_noop_transform_cuda_complex64, 
test/test_spectral_ops.py::TestFFTCUDA::test_fftn_noop_transform_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_noop_transform_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_noop_transform_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_complex32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftn_round_trip_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_frequencies_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_frequencies_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_numpy_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_numpy_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_numpy_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_fftshift_numpy_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_hfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_hfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_hfftn_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_ihfftn_cuda_float16, test/test_spectral_ops.py::TestFFTCUDA::test_ihfftn_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_ihfftn_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_against_librosa_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_linearity_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_of_sine_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_requires_window_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_istft_round_trip_simple_cases_cuda_float64, 
test/test_spectral_ops.py::TestFFTCUDA::test_istft_round_trip_various_params_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_round_trip_with_padding_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_istft_throws_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_fft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_fft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_hfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_hfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_ifft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_ifft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_ihfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_irfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_irfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d__refs_fft_rfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_fft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_fft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_hfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_hfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_ifft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_ifft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_ihfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_irfft_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_irfft_cuda_float32, test/test_spectral_ops.py::TestFFTCUDA::test_reference_1d_fft_rfft_cuda_float32, 
test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_fftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_hfftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_ifftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_irfftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd__refs_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_fftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_fftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_hfftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_hfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_ifftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_ifftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_irfftn_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_reference_nd_fft_irfftn_cuda_complex64, test/test_spectral_ops.py::TestFFTCUDA::test_stft_align_to_window_only_requires_non_center_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_stft_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_stft_requires_complex_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_stft_requires_window_cuda, test/test_spectral_ops.py::TestFFTCUDA::test_stft_roundtrip_complex_window_cuda_complex128, test/test_spectral_ops.py::TestFFTCUDA::test_stft_roundtrip_complex_window_cuda_float64, test/test_spectral_ops.py::TestFFTCUDA::test_stft_window_device_cuda 2025-12-04T17:19:02.6856900Z 
2025-12-04T17:19:02.6857221Z Finished test_spectral_ops 1/1 ... [2025-12-04 17:19:02.657942][29171.040835748], took 0.63min 2025-12-04T17:19:02.7011963Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_spectral_ops/test_spectral_ops-ae86fbbf23286ef9.xml 2025-12-04T17:19:02.7813066Z Running test_autoload_disable 1/1 ... [2025-12-04 17:19:02.780994][29171.163888872] 2025-12-04T17:19:03.1536324Z Processing /var/lib/jenkins/workspace/test/cpp_extensions 2025-12-04T17:19:08.1219827Z Preparing metadata (pyproject.toml) ... [?25l- done 2025-12-04T17:19:08.1242714Z [?25hBuilding wheels for collected packages: torch_test_cpp_extension 2025-12-04T17:20:47.6583179Z Building wheel for torch_test_cpp_extension (pyproject.toml) ... [?25l- \ | / - \ | / - \ | / - \ | / - \ | / - done 2025-12-04T17:20:47.6959136Z [?25h Created wheel for torch_test_cpp_extension: filename=torch_test_cpp_extension-0.0.0-cp310-cp310-linux_x86_64.whl size=13197897 sha256=cc46647de823d8374b96f8523d19d867e9d39561b9e921d3b31efe3d71f46120 2025-12-04T17:20:47.6961552Z Stored in directory: /tmp/pip-ephem-wheel-cache-dyxdiszk/wheels/2b/79/8d/635cf291e138cfea331292ca746c62b61fade208eb55a7e3a1 2025-12-04T17:20:47.6982453Z Successfully built torch_test_cpp_extension 2025-12-04T17:20:48.3159868Z Installing collected packages: torch_test_cpp_extension 2025-12-04T17:20:48.5636691Z Successfully installed torch_test_cpp_extension-0.0.0 2025-12-04T17:20:52.8264014Z 2025-12-04T17:20:52.8264441Z Running tests... 2025-12-04T17:20:52.8264859Z ---------------------------------------------------------------------- 2025-12-04T17:20:54.7280014Z . 2025-12-04T17:20:54.7280439Z ---------------------------------------------------------------------- 2025-12-04T17:20:54.7280916Z Ran 1 test in 1.902s 2025-12-04T17:20:54.7281113Z 2025-12-04T17:20:54.7281204Z OK 2025-12-04T17:20:54.7281341Z 2025-12-04T17:20:54.7281462Z Generating XML reports... 
2025-12-04T17:20:55.7018699Z Finished test_autoload_disable 1/1 ... [2025-12-04 17:20:55.701457][29284.084344964], took 1.88min 2025-12-04T17:20:55.7449158Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204172052.xml 2025-12-04T17:20:55.9315414Z Running test_cpp_extensions_aot_ninja 1/1 ... [2025-12-04 17:20:55.931186][29284.314079145] 2025-12-04T17:20:56.3558820Z Processing /var/lib/jenkins/workspace/test/cpp_extensions 2025-12-04T17:21:01.4060441Z Preparing metadata (pyproject.toml) ... [?25l- done 2025-12-04T17:21:01.4083652Z [?25hBuilding wheels for collected packages: torch_test_cpp_extension 2025-12-04T17:22:41.3790833Z Building wheel for torch_test_cpp_extension (pyproject.toml) ... [?25l- \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | done 2025-12-04T17:22:41.4243503Z [?25h Created wheel for torch_test_cpp_extension: filename=torch_test_cpp_extension-0.0.0-cp310-cp310-linux_x86_64.whl size=16079182 sha256=3f76972410efd549ce5ce0fc42c4d311b1eda2e2cc15bab4ad63434d9d6c315d 2025-12-04T17:22:41.4245117Z Stored in directory: /tmp/pip-ephem-wheel-cache-4s2_754p/wheels/2b/79/8d/635cf291e138cfea331292ca746c62b61fade208eb55a7e3a1 2025-12-04T17:22:41.4268024Z Successfully built torch_test_cpp_extension 2025-12-04T17:22:42.0449811Z Installing collected packages: torch_test_cpp_extension 2025-12-04T17:22:42.3468390Z Successfully installed torch_test_cpp_extension-0.0.0 2025-12-04T17:22:42.8069819Z Processing /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test 2025-12-04T17:22:46.1406001Z Preparing metadata (pyproject.toml) ... [?25l- done 2025-12-04T17:22:46.1429199Z [?25hBuilding wheels for collected packages: no_python_abi_suffix_test 2025-12-04T17:22:49.9000589Z Building wheel for no_python_abi_suffix_test (pyproject.toml) ... 
done 2025-12-04T17:22:49.9009007Z Created wheel for no_python_abi_suffix_test: filename=no_python_abi_suffix_test-0.0.0-cp310-cp310-linux_x86_64.whl size=2944 sha256=9e1be669c02aec48f2b8fbc04478a32fe38a4139d4ab0c15944290f8b27df031 2025-12-04T17:22:49.9010617Z Stored in directory: /tmp/pip-ephem-wheel-cache-6q62uft5/wheels/8c/c7/11/bcf2bfbdebb3cf78b8211ac54acc945a8fdf1732548d147a80 2025-12-04T17:22:49.9032746Z Successfully built no_python_abi_suffix_test 2025-12-04T17:22:50.5211536Z Installing collected packages: no_python_abi_suffix_test 2025-12-04T17:22:50.5311625Z Successfully installed no_python_abi_suffix_test-0.0.0 2025-12-04T17:22:50.6738679Z * Getting build dependencies for wheel... 2025-12-04T17:22:53.5095367Z running egg_info 2025-12-04T17:22:53.5183467Z creating python_agnostic.egg-info 2025-12-04T17:22:53.5184680Z writing python_agnostic.egg-info/PKG-INFO 2025-12-04T17:22:53.5189449Z writing dependency_links to python_agnostic.egg-info/dependency_links.txt 2025-12-04T17:22:53.5192481Z writing top-level names to python_agnostic.egg-info/top_level.txt 2025-12-04T17:22:53.5195009Z writing manifest file 'python_agnostic.egg-info/SOURCES.txt' 2025-12-04T17:22:53.5713466Z reading manifest file 'python_agnostic.egg-info/SOURCES.txt' 2025-12-04T17:22:53.5722907Z writing manifest file 'python_agnostic.egg-info/SOURCES.txt' 2025-12-04T17:22:54.0383236Z * Building wheel... 
2025-12-04T17:22:56.8940752Z running bdist_wheel 2025-12-04T17:22:56.9631689Z running build 2025-12-04T17:22:56.9632018Z running build_ext 2025-12-04T17:22:56.9670725Z building 'python_agnostic._C' extension 2025-12-04T17:22:56.9674995Z creating /var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/build/temp.linux-x86_64-cpython-310/var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/python_agnostic/csrc 2025-12-04T17:23:09.7489297Z [1/1] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/build/temp.linux-x86_64-cpython-310/var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/python_agnostic/csrc/ultra_norm.o.d -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/conda/envs/py_3.10/include/python3.10 -c -c /var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/python_agnostic/csrc/ultra_norm.cu -o /var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/build/temp.linux-x86_64-cpython-310/var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/python_agnostic/csrc/ultra_norm.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x030A0000 -DTORCH_EXTENSION_NAME=_C -gencode=arch=compute_75,code=sm_75 -std=c++17 2025-12-04T17:23:09.7552731Z creating build/lib.linux-x86_64-cpython-310/python_agnostic 2025-12-04T17:23:09.7559487Z g++ -pthread -B /opt/conda/envs/py_3.10/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/py_3.10/include -fPIC -O2 -isystem 
/opt/conda/envs/py_3.10/include -pthread -B /opt/conda/envs/py_3.10/compiler_compat -shared /var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/build/temp.linux-x86_64-cpython-310/var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/python_agnostic/csrc/ultra_norm.o -L/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/python_agnostic/_C.so 2025-12-04T17:23:10.3029453Z installing to build/bdist.linux-x86_64/wheel 2025-12-04T17:23:10.3029899Z running install 2025-12-04T17:23:10.3086233Z running install_lib 2025-12-04T17:23:10.3169041Z creating build/bdist.linux-x86_64/wheel 2025-12-04T17:23:10.3171622Z creating build/bdist.linux-x86_64/wheel/python_agnostic 2025-12-04T17:23:10.3173163Z copying build/lib.linux-x86_64-cpython-310/python_agnostic/_C.so -> build/bdist.linux-x86_64/wheel/./python_agnostic 2025-12-04T17:23:10.3179926Z running install_egg_info 2025-12-04T17:23:10.3263441Z running egg_info 2025-12-04T17:23:10.3340783Z writing python_agnostic.egg-info/PKG-INFO 2025-12-04T17:23:10.3345188Z writing dependency_links to python_agnostic.egg-info/dependency_links.txt 2025-12-04T17:23:10.3358915Z writing top-level names to python_agnostic.egg-info/top_level.txt 2025-12-04T17:23:10.3449672Z reading manifest file 'python_agnostic.egg-info/SOURCES.txt' 2025-12-04T17:23:10.3461333Z writing manifest file 'python_agnostic.egg-info/SOURCES.txt' 2025-12-04T17:23:10.3475224Z Copying python_agnostic.egg-info to build/bdist.linux-x86_64/wheel/./python_agnostic-0.0-py3.10.egg-info 2025-12-04T17:23:10.3481813Z running install_scripts 2025-12-04T17:23:10.3609366Z creating build/bdist.linux-x86_64/wheel/python_agnostic-0.0.dist-info/WHEEL 2025-12-04T17:23:10.3615380Z creating 
'/var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/dist/.tmp-2zipdfqj/python_agnostic-0.0-cp39-abi3-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it 2025-12-04T17:23:10.3821680Z adding 'python_agnostic/_C.so' 2025-12-04T17:23:10.3838895Z adding 'python_agnostic-0.0.dist-info/METADATA' 2025-12-04T17:23:10.3840609Z adding 'python_agnostic-0.0.dist-info/WHEEL' 2025-12-04T17:23:10.3841603Z adding 'python_agnostic-0.0.dist-info/top_level.txt' 2025-12-04T17:23:10.3843175Z adding 'python_agnostic-0.0.dist-info/RECORD' 2025-12-04T17:23:10.3843987Z removing build/bdist.linux-x86_64/wheel 2025-12-04T17:23:10.8300273Z Successfully built python_agnostic-0.0-cp39-abi3-linux_x86_64.whl 2025-12-04T17:23:11.2246337Z Processing /var/lib/jenkins/workspace/test/cpp_extensions/libtorch_agnostic_2_9_extension 2025-12-04T17:23:14.6566486Z Preparing metadata (pyproject.toml) ... [?25l- done 2025-12-04T17:23:14.6591375Z [?25hRequirement already satisfied: torch in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from libtorch_agnostic_2_9==0.0) (2.10.0a0+gitffd9b0f) 2025-12-04T17:23:14.6621608Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (3.18.0) 2025-12-04T17:23:14.6627228Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (4.12.2) 2025-12-04T17:23:14.6632692Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (1.13.3) 2025-12-04T17:23:14.6637963Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (2.8.8) 2025-12-04T17:23:14.6642139Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) 
(3.1.6) 2025-12-04T17:23:14.6647457Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (2025.10.0) 2025-12-04T17:23:14.7083864Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch->libtorch_agnostic_2_9==0.0) (1.3.0) 2025-12-04T17:23:14.7149222Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch->libtorch_agnostic_2_9==0.0) (3.0.3) 2025-12-04T17:23:14.7160552Z Building wheels for collected packages: libtorch_agnostic_2_9 2025-12-04T17:23:22.0021819Z Building wheel for libtorch_agnostic_2_9 (pyproject.toml) ... [?25l- \ | / done 2025-12-04T17:23:22.0031377Z [?25h Created wheel for libtorch_agnostic_2_9: filename=libtorch_agnostic_2_9-0.0-cp39-abi3-linux_x86_64.whl size=54876 sha256=63993cdfa4505b36917d28d00ca4398f906ad02fe7a4cab3e5d2f92a48153c55 2025-12-04T17:23:22.0033014Z Stored in directory: /tmp/pip-ephem-wheel-cache-ap_pvjar/wheels/e1/56/0d/91ac1e918c8015b48f6a77f66abeeb8427a8788f7d37715e0e 2025-12-04T17:23:22.0053091Z Successfully built libtorch_agnostic_2_9 2025-12-04T17:23:22.5774690Z Installing collected packages: libtorch_agnostic_2_9 2025-12-04T17:23:22.5913546Z Successfully installed libtorch_agnostic_2_9-0.0 2025-12-04T17:23:23.0578068Z Processing /var/lib/jenkins/workspace/test/cpp_extensions/libtorch_agnostic_2_10_extension 2025-12-04T17:23:26.3638137Z Preparing metadata (pyproject.toml) ... 
[?25l- done 2025-12-04T17:23:26.3661454Z [?25hRequirement already satisfied: torch in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from libtorch_agnostic_2_10==0.0) (2.10.0a0+gitffd9b0f) 2025-12-04T17:23:26.3691378Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (3.18.0) 2025-12-04T17:23:26.3696974Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (4.12.2) 2025-12-04T17:23:26.3702409Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (1.13.3) 2025-12-04T17:23:26.3707727Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (2.8.8) 2025-12-04T17:23:26.3711765Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (3.1.6) 2025-12-04T17:23:26.3717056Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (2025.10.0) 2025-12-04T17:23:26.4151055Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch->libtorch_agnostic_2_10==0.0) (1.3.0) 2025-12-04T17:23:26.4215588Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch->libtorch_agnostic_2_10==0.0) (3.0.3) 2025-12-04T17:23:26.4226702Z Building wheels for collected packages: libtorch_agnostic_2_10 2025-12-04T17:23:34.2679908Z Building wheel for libtorch_agnostic_2_10 (pyproject.toml) ... 
done 2025-12-04T17:23:34.2689618Z Created wheel for libtorch_agnostic_2_10: filename=libtorch_agnostic_2_10-0.0-cp39-abi3-linux_x86_64.whl size=81593 sha256=5f00068c251b607df4407edd8072b05735d778ebe5fb0a9e5a9f6d2828e50e3c 2025-12-04T17:23:34.2691135Z Stored in directory: /tmp/pip-ephem-wheel-cache-6rmob9iy/wheels/03/17/c4/d9b9dbd12b271a9a317a75e944d0966701385d67eac86f2c1a 2025-12-04T17:23:34.2712315Z Successfully built libtorch_agnostic_2_10 2025-12-04T17:23:34.8337332Z Installing collected packages: libtorch_agnostic_2_10 2025-12-04T17:23:34.8478112Z Successfully installed libtorch_agnostic_2_10-0.0 2025-12-04T17:23:34.9367114Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:23:34.9371457Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_extensions_aot_ninja.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 17:23:34.936857] 2025-12-04T17:23:43.5986616Z 2025-12-04T17:23:43.5987795Z test_cpp_extensions_aot_ninja 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_extensions_aot_ninja_1.1_f69ae0466baae8e0_.log 2025-12-04T17:23:43.5998650Z Running 21 items in this shard: test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_backward, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_cublas_extension, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_cuda_dlink_libs, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_cuda_extension, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_cusolver_extension, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_extension_function, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_extension_module, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_mps_extension, 
test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_no_python_abi_suffix_sets_the_correct_library_name, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_optional, test/test_cpp_extensions_aot_ninja.py::TestCppExtensionAOT::test_sycl_extension, test/test_cpp_extensions_aot_ninja.py::TestPybindTypeCasters::test_pybind_return_types, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_add, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_autocast_apis_for_maia_device, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_conv_backend_override, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_matmul_autocast_default_precision, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_matmul_autocast_float16_precision, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_unregistered, test/test_cpp_extensions_aot_ninja.py::TestMAIATensor::test_zeros, test/test_cpp_extensions_aot_ninja.py::TestRNGExtension::test_rng, test/test_cpp_extensions_aot_ninja.py::TestTorchLibrary::test_torch_library 2025-12-04T17:23:43.6007578Z 2025-12-04T17:23:43.6007953Z Finished test_cpp_extensions_aot_ninja 1/1 ... [2025-12-04 17:23:43.598541][29451.98143494], took 2.79min 2025-12-04T17:23:43.6425061Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cpp_extensions_aot_ninja/test_cpp_extensions_aot_ninja-5c9ab2f003415ced.xml 2025-12-04T17:23:43.7209053Z Running test_cpp_extensions_aot_no_ninja 1/1 ... [2025-12-04 17:23:43.720588][29452.10348363] 2025-12-04T17:23:44.1137318Z Processing /var/lib/jenkins/workspace/test/cpp_extensions 2025-12-04T17:23:49.0856887Z Preparing metadata (pyproject.toml) ... [?25l- done 2025-12-04T17:23:49.0880602Z [?25hBuilding wheels for collected packages: torch_test_cpp_extension 2025-12-04T17:25:27.3206304Z Building wheel for torch_test_cpp_extension (pyproject.toml) ... 
done 2025-12-04T17:25:27.3584232Z Created wheel for torch_test_cpp_extension: filename=torch_test_cpp_extension-0.0.0-cp310-cp310-linux_x86_64.whl size=13197897 sha256=d3bb43a8de5ff621a65010cc20d8c1ac3de2b334ada123116dd85af039ed0652 2025-12-04T17:25:27.3585862Z Stored in directory: /tmp/pip-ephem-wheel-cache-5rvnqec8/wheels/2b/79/8d/635cf291e138cfea331292ca746c62b61fade208eb55a7e3a1 2025-12-04T17:25:27.3606700Z Successfully built torch_test_cpp_extension 2025-12-04T17:25:27.9727292Z Installing collected packages: torch_test_cpp_extension 2025-12-04T17:25:28.2260458Z Successfully installed torch_test_cpp_extension-0.0.0 2025-12-04T17:25:28.6883320Z Processing /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test 2025-12-04T17:25:32.0294575Z Preparing metadata (pyproject.toml) ... done 2025-12-04T17:25:32.0317997Z Building wheels for collected packages: no_python_abi_suffix_test 2025-12-04T17:25:35.8430880Z Building wheel for no_python_abi_suffix_test (pyproject.toml) ... done 2025-12-04T17:25:35.8439493Z Created wheel for no_python_abi_suffix_test: filename=no_python_abi_suffix_test-0.0.0-cp310-cp310-linux_x86_64.whl size=2944 sha256=da515172478958bd0a69f7a1366f3fe56fc2eef5b1de4d4bc2e7d1388f072f02 2025-12-04T17:25:35.8441079Z Stored in directory: /tmp/pip-ephem-wheel-cache-r4jbm4xg/wheels/8c/c7/11/bcf2bfbdebb3cf78b8211ac54acc945a8fdf1732548d147a80 2025-12-04T17:25:35.8464029Z Successfully built no_python_abi_suffix_test 2025-12-04T17:25:36.4691998Z Installing collected packages: no_python_abi_suffix_test 2025-12-04T17:25:36.4811270Z Successfully installed no_python_abi_suffix_test-0.0.0 2025-12-04T17:25:36.6296596Z * Getting build dependencies for wheel... 
2025-12-04T17:25:39.4404257Z running egg_info 2025-12-04T17:25:39.4494373Z writing python_agnostic.egg-info/PKG-INFO 2025-12-04T17:25:39.4499279Z writing dependency_links to python_agnostic.egg-info/dependency_links.txt 2025-12-04T17:25:39.4502594Z writing top-level names to python_agnostic.egg-info/top_level.txt 2025-12-04T17:25:39.5024833Z reading manifest file 'python_agnostic.egg-info/SOURCES.txt' 2025-12-04T17:25:39.5035508Z writing manifest file 'python_agnostic.egg-info/SOURCES.txt' 2025-12-04T17:25:39.9703280Z * Building wheel... 2025-12-04T17:25:43.0101796Z running bdist_wheel 2025-12-04T17:25:43.0805250Z running build 2025-12-04T17:25:43.0805546Z running build_ext 2025-12-04T17:25:43.0844648Z building 'python_agnostic._C' extension 2025-12-04T17:25:43.1611382Z ninja: no work to do. 2025-12-04T17:25:43.1663775Z g++ -pthread -B /opt/conda/envs/py_3.10/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/py_3.10/include -fPIC -O2 -isystem /opt/conda/envs/py_3.10/include -pthread -B /opt/conda/envs/py_3.10/compiler_compat -shared /var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/build/temp.linux-x86_64-cpython-310/var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/python_agnostic/csrc/ultra_norm.o -L/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/python_agnostic/_C.so 2025-12-04T17:25:43.7452553Z installing to build/bdist.linux-x86_64/wheel 2025-12-04T17:25:43.7452987Z running install 2025-12-04T17:25:43.7508359Z running install_lib 2025-12-04T17:25:43.7593703Z creating build/bdist.linux-x86_64/wheel 2025-12-04T17:25:43.7595402Z creating build/bdist.linux-x86_64/wheel/python_agnostic 2025-12-04T17:25:43.7596826Z copying build/lib.linux-x86_64-cpython-310/python_agnostic/_C.so -> 
build/bdist.linux-x86_64/wheel/./python_agnostic 2025-12-04T17:25:43.7603533Z running install_egg_info 2025-12-04T17:25:43.7687771Z running egg_info 2025-12-04T17:25:43.7765088Z writing python_agnostic.egg-info/PKG-INFO 2025-12-04T17:25:43.7769995Z writing dependency_links to python_agnostic.egg-info/dependency_links.txt 2025-12-04T17:25:43.7773669Z writing top-level names to python_agnostic.egg-info/top_level.txt 2025-12-04T17:25:43.7855178Z reading manifest file 'python_agnostic.egg-info/SOURCES.txt' 2025-12-04T17:25:43.7866452Z writing manifest file 'python_agnostic.egg-info/SOURCES.txt' 2025-12-04T17:25:43.7868418Z Copying python_agnostic.egg-info to build/bdist.linux-x86_64/wheel/./python_agnostic-0.0-py3.10.egg-info 2025-12-04T17:25:43.7876513Z running install_scripts 2025-12-04T17:25:43.8004515Z creating build/bdist.linux-x86_64/wheel/python_agnostic-0.0.dist-info/WHEEL 2025-12-04T17:25:43.8010283Z creating '/var/lib/jenkins/workspace/test/cpp_extensions/python_agnostic_extension/dist/.tmp-p4c6246r/python_agnostic-0.0-cp39-abi3-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it 2025-12-04T17:25:43.8214874Z adding 'python_agnostic/_C.so' 2025-12-04T17:25:43.8232202Z adding 'python_agnostic-0.0.dist-info/METADATA' 2025-12-04T17:25:43.8233490Z adding 'python_agnostic-0.0.dist-info/WHEEL' 2025-12-04T17:25:43.8234742Z adding 'python_agnostic-0.0.dist-info/top_level.txt' 2025-12-04T17:25:43.8236256Z adding 'python_agnostic-0.0.dist-info/RECORD' 2025-12-04T17:25:43.8237071Z removing build/bdist.linux-x86_64/wheel 2025-12-04T17:25:44.2516429Z Successfully built python_agnostic-0.0-cp39-abi3-linux_x86_64.whl 2025-12-04T17:25:44.6409568Z Processing /var/lib/jenkins/workspace/test/cpp_extensions/libtorch_agnostic_2_9_extension 2025-12-04T17:25:47.9354864Z Preparing metadata (pyproject.toml) ... 
[?25l- done 2025-12-04T17:25:47.9379863Z [?25hRequirement already satisfied: torch in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from libtorch_agnostic_2_9==0.0) (2.10.0a0+gitffd9b0f) 2025-12-04T17:25:47.9409782Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (3.18.0) 2025-12-04T17:25:47.9415715Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (4.12.2) 2025-12-04T17:25:47.9421427Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (1.13.3) 2025-12-04T17:25:47.9427269Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (2.8.8) 2025-12-04T17:25:47.9431438Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (3.1.6) 2025-12-04T17:25:47.9437162Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_9==0.0) (2025.10.0) 2025-12-04T17:25:47.9871239Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch->libtorch_agnostic_2_9==0.0) (1.3.0) 2025-12-04T17:25:47.9935899Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch->libtorch_agnostic_2_9==0.0) (3.0.3) 2025-12-04T17:25:47.9947517Z Building wheels for collected packages: libtorch_agnostic_2_9 2025-12-04T17:25:51.8812873Z Building wheel for libtorch_agnostic_2_9 (pyproject.toml) ... 
done 2025-12-04T17:25:51.8822117Z Created wheel for libtorch_agnostic_2_9: filename=libtorch_agnostic_2_9-0.0-cp39-abi3-linux_x86_64.whl size=54876 sha256=82babd3bcf2045972a618ae7912fb61a0a5e5f4e6e94fc7cdfdab7eeb43e57c2 2025-12-04T17:25:51.8823699Z Stored in directory: /tmp/pip-ephem-wheel-cache-yxx7mpj8/wheels/e1/56/0d/91ac1e918c8015b48f6a77f66abeeb8427a8788f7d37715e0e 2025-12-04T17:25:51.8844069Z Successfully built libtorch_agnostic_2_9 2025-12-04T17:25:52.4531318Z Installing collected packages: libtorch_agnostic_2_9 2025-12-04T17:25:52.4666938Z Successfully installed libtorch_agnostic_2_9-0.0 2025-12-04T17:25:52.9282891Z Processing /var/lib/jenkins/workspace/test/cpp_extensions/libtorch_agnostic_2_10_extension 2025-12-04T17:25:56.3839340Z Preparing metadata (pyproject.toml) ... done 2025-12-04T17:25:56.3863694Z Requirement already satisfied: torch in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from libtorch_agnostic_2_10==0.0) (2.10.0a0+gitffd9b0f) 2025-12-04T17:25:56.3894348Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (3.18.0) 2025-12-04T17:25:56.3900553Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (4.12.2) 2025-12-04T17:25:56.3906036Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (1.13.3) 2025-12-04T17:25:56.3911534Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (2.8.8) 2025-12-04T17:25:56.3915532Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (3.1.6) 2025-12-04T17:25:56.3921070Z Requirement already satisfied: fsspec>=0.8.5 in 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch->libtorch_agnostic_2_10==0.0) (2025.10.0) 2025-12-04T17:25:56.4362006Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch->libtorch_agnostic_2_10==0.0) (1.3.0) 2025-12-04T17:25:56.4428038Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch->libtorch_agnostic_2_10==0.0) (3.0.3) 2025-12-04T17:25:56.4439584Z Building wheels for collected packages: libtorch_agnostic_2_10 2025-12-04T17:26:00.6296282Z Building wheel for libtorch_agnostic_2_10 (pyproject.toml) ... done 2025-12-04T17:26:00.6306491Z Created wheel for libtorch_agnostic_2_10: filename=libtorch_agnostic_2_10-0.0-cp39-abi3-linux_x86_64.whl size=81593 sha256=a10451cbfb056d0b0c8c2a62b0a026ccfaf49e43556f6e1867a4e59bfd96dff1 2025-12-04T17:26:00.6308195Z Stored in directory: /tmp/pip-ephem-wheel-cache-5l5_jzuz/wheels/03/17/c4/d9b9dbd12b271a9a317a75e944d0966701385d67eac86f2c1a 2025-12-04T17:26:00.6328218Z Successfully built libtorch_agnostic_2_10 2025-12-04T17:26:01.1988983Z Installing collected packages: libtorch_agnostic_2_10 2025-12-04T17:26:01.2133811Z Successfully installed libtorch_agnostic_2_10-0.0 2025-12-04T17:26:01.3007918Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T17:26:01.3012241Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_extensions_aot_no_ninja.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 17:26:01.300941] 2025-12-04T17:26:09.8033158Z 2025-12-04T17:26:09.8034386Z test_cpp_extensions_aot_no_ninja 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_extensions_aot_no_ninja_1.1_8356099a97b89d55_.log 2025-12-04T17:26:09.8044028Z Running 21 items in this shard: test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_backward, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_cublas_extension, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_cuda_dlink_libs, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_cuda_extension, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_cusolver_extension, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_extension_function, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_extension_module, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_mps_extension, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_no_python_abi_suffix_sets_the_correct_library_name, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_optional, test/test_cpp_extensions_aot_no_ninja.py::TestCppExtensionAOT::test_sycl_extension, test/test_cpp_extensions_aot_no_ninja.py::TestPybindTypeCasters::test_pybind_return_types, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_add, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_autocast_apis_for_maia_device, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_conv_backend_override, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_matmul_autocast_default_precision, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_matmul_autocast_float16_precision, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_unregistered, test/test_cpp_extensions_aot_no_ninja.py::TestMAIATensor::test_zeros, 
test/test_cpp_extensions_aot_no_ninja.py::TestRNGExtension::test_rng, test/test_cpp_extensions_aot_no_ninja.py::TestTorchLibrary::test_torch_library 2025-12-04T17:26:09.8053017Z 2025-12-04T17:26:09.8053387Z Finished test_cpp_extensions_aot_no_ninja 1/1 ... [2025-12-04 17:26:09.803163][29598.18605664], took 2.43min 2025-12-04T17:26:09.8471572Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cpp_extensions_aot_no_ninja/test_cpp_extensions_aot_no_ninja-dc0b3ab1cc30279c.xml 2025-12-04T17:26:11.5773483Z Uploading artifacts took 1.65 seconds 2025-12-04T17:26:18.8605662Z Running test batch 'tests to run' cost 27651.08 seconds 2025-12-04T17:26:18.8620448Z Emitting td_test_failure_stats_v2 2025-12-04T17:26:18.8624184Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869178_5850e89cd13611f09b240242ac110002 2025-12-04T17:26:18.9615302Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869178_5850e89cd13611f09b240242ac110002 2025-12-04T17:26:18.9630024Z Emitting td_test_failure_stats_v2 2025-12-04T17:26:18.9632718Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869178_58604cb0d13611f09b240242ac110002 2025-12-04T17:26:18.9967555Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869178_58604cb0d13611f09b240242ac110002 2025-12-04T17:26:18.9983682Z Emitting td_test_failure_stats_v2 2025-12-04T17:26:18.9986442Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869178_5865b312d13611f09b240242ac110002 2025-12-04T17:26:19.0324791Z Done! 
Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869178_5865b312d13611f09b240242ac110002 2025-12-04T17:26:19.0341150Z Emitting td_test_failure_stats_v2 2025-12-04T17:26:19.0344078Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_586b2798d13611f09b240242ac110002 2025-12-04T17:26:19.0714313Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_586b2798d13611f09b240242ac110002 2025-12-04T17:26:19.0730442Z Emitting td_test_failure_stats_v2 2025-12-04T17:26:19.0733117Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_587117acd13611f09b240242ac110002 2025-12-04T17:26:19.1094534Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_587117acd13611f09b240242ac110002 2025-12-04T17:26:19.1111649Z Emitting td_test_failure_stats_v2 2025-12-04T17:26:19.1114509Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_5876e736d13611f09b240242ac110002 2025-12-04T17:26:19.1468105Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_5876e736d13611f09b240242ac110002 2025-12-04T17:26:19.1484907Z Emitting td_test_failure_stats_v2 2025-12-04T17:26:19.1487526Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_587c987ad13611f09b240242ac110002 2025-12-04T17:26:19.1863554Z Done! 
Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_587c987ad13611f09b240242ac110002 2025-12-04T17:26:19.1880012Z Emitting td_test_failure_stats_v2 2025-12-04T17:26:19.1883223Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_5882a238d13611f09b240242ac110002 2025-12-04T17:26:19.2238927Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_5882a238d13611f09b240242ac110002 2025-12-04T17:26:19.2255286Z Emitting td_test_failure_stats_v2 2025-12-04T17:26:19.2258702Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_58885c82d13611f09b240242ac110002 2025-12-04T17:26:19.2614033Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_58885c82d13611f09b240242ac110002 2025-12-04T17:26:19.2630330Z Emitting td_test_failure_stats_v2 2025-12-04T17:26:19.2633590Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_588e156ed13611f09b240242ac110002 2025-12-04T17:26:19.3176402Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_588e156ed13611f09b240242ac110002 2025-12-04T17:26:19.3192773Z Emitting td_test_failure_stats_v2 2025-12-04T17:26:19.3196125Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_5896aad0d13611f09b240242ac110002 2025-12-04T17:26:19.3551613Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764869179_5896aad0d13611f09b240242ac110002 2025-12-04T17:26:19.3552809Z inductor/test_aot_inductor 1/6 failed! 2025-12-04T17:26:19.3553531Z inductor/test_aot_inductor 6/6 failed! 
2025-12-04T17:26:19.3554216Z inductor/test_torchinductor_codegen_dynamic_shapes 2/4 failed! 2025-12-04T17:26:19.3554742Z inductor/test_cuda_select_algorithm 3/5 failed! 2025-12-04T17:26:19.3555186Z inductor/test_compile_subprocess 3/3 failed! 2025-12-04T17:26:19.3555599Z inductor/test_deterministic 5/8 failed! 2025-12-04T17:26:19.3555961Z inductor/test_fp8 1/1 failed! 2025-12-04T17:26:19.3556299Z dynamo/test_model_output 1/1 failed! 2025-12-04T17:26:19.3556795Z inductor/test_loop_ordering 1/1 failed! 2025-12-04T17:26:19.3557160Z dynamo/test_backends 1/1 failed! 2025-12-04T17:26:19.3557541Z inductor/test_aot_inductor_package 1/1 failed! 2025-12-04T17:26:20.2353282Z 2025-12-04T17:26:20.2353766Z real 460m59.836s 2025-12-04T17:26:20.2354144Z user 454m32.649s 2025-12-04T17:26:20.2354397Z sys 59m41.849s 2025-12-04T17:26:20.2354659Z + sccache_epilogue 2025-12-04T17:26:20.2354999Z + echo '::group::Sccache Compilation Log' 2025-12-04T17:26:20.2355833Z ##[group]Sccache Compilation Log 2025-12-04T17:26:20.2356250Z + echo '=================== sccache compilation log ===================' 2025-12-04T17:26:20.2356720Z =================== sccache compilation log =================== 2025-12-04T17:26:20.2357460Z + python /var/lib/jenkins/workspace/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log 2025-12-04T17:26:20.2507813Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ===========' 2025-12-04T17:26:20.2508640Z =========== If your build fails, please take a look at the log above for possible reasons =========== 2025-12-04T17:26:20.2509205Z + sccache --show-stats 2025-12-04T17:26:20.2543267Z Compile requests 6749 2025-12-04T17:26:20.2543680Z Compile requests executed 604 2025-12-04T17:26:20.2544039Z Cache hits 314 2025-12-04T17:26:20.2545634Z Cache hits (C/C++) 314 2025-12-04T17:26:20.2546036Z Cache misses 290 2025-12-04T17:26:20.2546495Z Cache misses (C/C++) 290 2025-12-04T17:26:20.2546851Z Cache hits rate 51.99 % 
2025-12-04T17:26:20.2547231Z Cache hits rate (C/C++) 51.99 % 2025-12-04T17:26:20.2547603Z Cache timeouts 0 2025-12-04T17:26:20.2547954Z Cache read errors 0 2025-12-04T17:26:20.2548316Z Forced recaches 0 2025-12-04T17:26:20.2548674Z Cache write errors 0 2025-12-04T17:26:20.2549020Z Cache errors 0 2025-12-04T17:26:20.2549373Z Compilations 290 2025-12-04T17:26:20.2549737Z Compilation failures 0 2025-12-04T17:26:20.2550101Z Non-cacheable compilations 0 2025-12-04T17:26:20.2550522Z Non-cacheable calls 339 2025-12-04T17:26:20.2550899Z Non-compilation calls 5806 2025-12-04T17:26:20.2551285Z Unsupported compiler calls 0 2025-12-04T17:26:20.2551658Z Average cache write 0.044 s 2025-12-04T17:26:20.2552044Z Average compiler 6.073 s 2025-12-04T17:26:20.2552427Z Average cache read hit 0.036 s 2025-12-04T17:26:20.2552821Z Failed distributed compilations 0 2025-12-04T17:26:20.2553076Z 2025-12-04T17:26:20.2553191Z Non-cacheable reasons: 2025-12-04T17:26:20.2553511Z unknown source language 258 2025-12-04T17:26:20.2553873Z -E 81 2025-12-04T17:26:20.2554112Z 2025-12-04T17:26:20.2554383Z Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: / 2025-12-04T17:26:20.2554924Z Version (client) 0.10.0 2025-12-04T17:26:20.2555287Z + sccache --stop-server 2025-12-04T17:26:20.2569994Z Stopping sccache server... 
2025-12-04T17:26:20.2573875Z Compile requests 6749 2025-12-04T17:26:20.2574554Z Compile requests executed 604 2025-12-04T17:26:20.2574967Z Cache hits 314 2025-12-04T17:26:20.2575317Z Cache hits (C/C++) 314 2025-12-04T17:26:20.2575680Z Cache misses 290 2025-12-04T17:26:20.2576048Z Cache misses (C/C++) 290 2025-12-04T17:26:20.2576599Z Cache hits rate 51.99 % 2025-12-04T17:26:20.2577089Z Cache hits rate (C/C++) 51.99 % 2025-12-04T17:26:20.2577567Z Cache timeouts 0 2025-12-04T17:26:20.2577926Z Cache read errors 0 2025-12-04T17:26:20.2578269Z Forced recaches 0 2025-12-04T17:26:20.2578739Z Cache write errors 0 2025-12-04T17:26:20.2579094Z Cache errors 0 2025-12-04T17:26:20.2579437Z Compilations 290 2025-12-04T17:26:20.2579805Z Compilation failures 0 2025-12-04T17:26:20.2580181Z Non-cacheable compilations 0 2025-12-04T17:26:20.2580545Z Non-cacheable calls 339 2025-12-04T17:26:20.2580916Z Non-compilation calls 5806 2025-12-04T17:26:20.2581292Z Unsupported compiler calls 0 2025-12-04T17:26:20.2581670Z Average cache write 0.044 s 2025-12-04T17:26:20.2582036Z Average compiler 6.073 s 2025-12-04T17:26:20.2582417Z Average cache read hit 0.036 s 2025-12-04T17:26:20.2582808Z Failed distributed compilations 0 2025-12-04T17:26:20.2583060Z 2025-12-04T17:26:20.2583171Z Non-cacheable reasons: 2025-12-04T17:26:20.2583488Z unknown source language 258 2025-12-04T17:26:20.2583849Z -E 81 2025-12-04T17:26:20.2584086Z 2025-12-04T17:26:20.2584360Z Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: / 2025-12-04T17:26:20.2584902Z Version (client) 0.10.0 2025-12-04T17:26:20.2585293Z + echo ::endgroup:: 2025-12-04T17:26:20.2585860Z ##[endgroup] 2025-12-04T17:26:20.2586131Z + cleanup_workspace 2025-12-04T17:26:20.2586796Z + echo 'sudo may print the following warning message that can be ignored. The chown command will still run.' 2025-12-04T17:26:20.2587981Z sudo may print the following warning message that can be ignored. The chown command will still run. 
2025-12-04T17:26:20.2588736Z + echo ' sudo: setrlimit(RLIMIT_STACK): Operation not permitted' 2025-12-04T17:26:20.2589400Z sudo: setrlimit(RLIMIT_STACK): Operation not permitted 2025-12-04T17:26:20.2590054Z + echo 'For more details refer to https://github.com/sudo-project/sudo/issues/42' 2025-12-04T17:26:20.2590760Z For more details refer to https://github.com/sudo-project/sudo/issues/42 2025-12-04T17:26:20.2591457Z + sudo chown -R 1000 /var/lib/jenkins/workspace 2025-12-04T17:26:21.0454603Z ##[error]Process completed with exit code 1. 2025-12-04T17:26:21.0546909Z Prepare all required actions 2025-12-04T17:26:21.0547390Z Getting action download info 2025-12-04T17:26:21.2514905Z ##[group]Run ./.github/actions/pytest-cache-upload 2025-12-04T17:26:21.2515312Z with: 2025-12-04T17:26:21.2515569Z cache_dir: .pytest_cache 2025-12-04T17:26:21.2515885Z shard: 1 2025-12-04T17:26:21.2516162Z sha: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T17:26:21.2516564Z test_config: legacy_nvidia_driver 2025-12-04T17:26:21.2517008Z job_identifier: periodic_linux-jammy-cuda12.4-py3.10-gcc11 2025-12-04T17:26:21.2517449Z env: 2025-12-04T17:26:21.2517679Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:21.2517987Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:21.2518352Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:21.2518982Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:21.2519570Z ##[endgroup] 2025-12-04T17:26:21.2557490Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T17:26:21.2557888Z with: 2025-12-04T17:26:21.2558113Z shell: bash 2025-12-04T17:26:21.2558370Z timeout_minutes: 5 2025-12-04T17:26:21.2558651Z max_attempts: 5 2025-12-04T17:26:21.2558911Z retry_wait_seconds: 30 2025-12-04T17:26:21.2559296Z command: set -eu python3 -m pip install boto3==1.35.42 2025-12-04T17:26:21.2559747Z polling_interval_seconds: 1 2025-12-04T17:26:21.2560075Z warning_on_retry: true 2025-12-04T17:26:21.2560368Z continue_on_error: 
false 2025-12-04T17:26:21.2560659Z env: 2025-12-04T17:26:21.2560898Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:21.2561188Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:21.2561550Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:21.2562197Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:21.2562764Z ##[endgroup] 2025-12-04T17:26:21.6664692Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T17:26:22.9494346Z Collecting boto3==1.35.42 2025-12-04T17:26:22.9701631Z Downloading boto3-1.35.42-py3-none-any.whl (139 kB) 2025-12-04T17:26:22.9875622Z Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /usr/lib/python3.9/site-packages (from boto3==1.35.42) (0.10.0) 2025-12-04T17:26:24.4419975Z Collecting botocore<1.36.0,>=1.35.42 2025-12-04T17:26:24.4469693Z Downloading botocore-1.35.99-py3-none-any.whl (13.3 MB) 2025-12-04T17:26:24.6967727Z Collecting s3transfer<0.11.0,>=0.10.0 2025-12-04T17:26:24.7036897Z Downloading s3transfer-0.10.4-py3-none-any.whl (83 kB) 2025-12-04T17:26:24.7145041Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (2.8.1) 2025-12-04T17:26:24.7157907Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.25.10) 2025-12-04T17:26:24.9206718Z Requirement already satisfied: six>=1.5 in /usr/lib/python3.9/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.15.0) 2025-12-04T17:26:25.0248522Z Installing collected packages: botocore, s3transfer, boto3 2025-12-04T17:26:25.6766098Z Successfully installed boto3-1.35.42 botocore-1.35.99 s3transfer-0.10.4 2025-12-04T17:26:26.3496261Z Command completed after 1 attempt(s). 
2025-12-04T17:26:26.3554203Z ##[group]Run python3 .github/scripts/pytest_cache.py \ 2025-12-04T17:26:26.3554762Z python3 .github/scripts/pytest_cache.py \ 2025-12-04T17:26:26.3555189Z  --upload \ 2025-12-04T17:26:26.3555559Z  --cache_dir "$GITHUB_WORKSPACE/$CACHE_DIR" \ 2025-12-04T17:26:26.3556009Z  --pr_identifier "$GITHUB_REF" \ 2025-12-04T17:26:26.3556439Z  --job_identifier "$JOB_IDENTIFIER" \ 2025-12-04T17:26:26.3556837Z  --sha "$SHA" \ 2025-12-04T17:26:26.3557189Z  --test_config "$TEST_CONFIG" \ 2025-12-04T17:26:26.3557553Z  --shard "$SHARD" \ 2025-12-04T17:26:26.3558079Z  --repo "$REPO" \ 2025-12-04T17:26:26.3558427Z  --temp_dir "$RUNNER_TEMP" \ 2025-12-04T17:26:26.3569833Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T17:26:26.3570287Z env: 2025-12-04T17:26:26.3570549Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:26.3570851Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:26.3571530Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:26.3572190Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:26.3572790Z CACHE_DIR: .pytest_cache 2025-12-04T17:26:26.3573196Z JOB_IDENTIFIER: periodic_linux-jammy-cuda12.4-py3.10-gcc11 2025-12-04T17:26:26.3573697Z SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T17:26:26.3574106Z TEST_CONFIG: legacy_nvidia_driver 2025-12-04T17:26:26.3574433Z SHARD: 1 2025-12-04T17:26:26.3574697Z REPO: pytorch/pytorch 2025-12-04T17:26:26.3574992Z ##[endgroup] 2025-12-04T17:26:26.8688011Z PR identifier for `refs/heads/main` is `96e092540d6b3c4076e3d2bc6f1f9013` 2025-12-04T17:26:26.8690408Z Uploading cache with args Namespace(upload=True, download=False, cache_dir='/home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache', pr_identifier='refs/heads/main', job_identifier='periodic_linux-jammy-cuda12.4-py3.10-gcc11', sha='ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32', test_config='legacy_nvidia_driver', shard='1', repo='pytorch/pytorch', 
temp_dir='/home/ec2-user/actions-runner/_work/_temp', bucket=None) 2025-12-04T17:26:26.8692779Z Zipping /home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache 2025-12-04T17:26:26.8694331Z to /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_4-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/legacy_nvidia_driver/1 2025-12-04T17:26:26.8697052Z Uploading /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_4-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/legacy_nvidia_driver/1.zip 2025-12-04T17:26:26.8699363Z to s3://gha-artifacts/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_4-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/legacy_nvidia_driver/1.zip 2025-12-04T17:26:26.9389190Z ##[group]Run cat test/**/*_toprint.log || true 2025-12-04T17:26:26.9389667Z cat test/**/*_toprint.log || true 2025-12-04T17:26:26.9397596Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T17:26:26.9398038Z env: 2025-12-04T17:26:26.9398300Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:26.9398613Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:26.9398968Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:26.9399617Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:26.9400213Z ##[endgroup] 2025-12-04T17:26:26.9504427Z cat: 'test/**/*_toprint.log': No such file or directory 2025-12-04T17:26:26.9536253Z ##[group]Run kill "$MONITOR_SCRIPT_PID" 2025-12-04T17:26:26.9536823Z kill "$MONITOR_SCRIPT_PID" 2025-12-04T17:26:26.9543560Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T17:26:26.9544004Z env: 2025-12-04T17:26:26.9544261Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:26.9544671Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:26.9545045Z GPU_FLAG: 
--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:26.9545703Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:26.9546299Z MONITOR_SCRIPT_PID: 68781 2025-12-04T17:26:26.9546601Z ##[endgroup] 2025-12-04T17:26:26.9574416Z /home/ec2-user/actions-runner/_work/_temp/3915b074-498c-4516-9932-2a86b1af7de7.sh: line 1: kill: (68781) - No such process 2025-12-04T17:26:26.9576878Z ##[error]Process completed with exit code 1. 2025-12-04T17:26:26.9722444Z Prepare all required actions 2025-12-04T17:26:26.9723023Z Getting action download info 2025-12-04T17:26:27.1423223Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-12-04T17:26:27.3785519Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-12-04T17:26:27.7942311Z ##[group]Run ./.github/actions/upload-test-artifacts 2025-12-04T17:26:27.7942750Z with: 2025-12-04T17:26:27.7943212Z file-suffix: test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248 2025-12-04T17:26:27.7943810Z s3-bucket: gha-artifacts 2025-12-04T17:26:27.7944109Z env: 2025-12-04T17:26:27.7944340Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:27.7944649Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:27.7945019Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:27.7945653Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:27.7946301Z ##[endgroup] 2025-12-04T17:26:27.7974069Z ##[group]Run # Remove any previous test jsons if they exist 2025-12-04T17:26:27.7974626Z # Remove any previous test jsons if they exist 2025-12-04T17:26:27.7975057Z rm -f test-jsons-*.zip 2025-12-04T17:26:27.7975567Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test/test-reports -i '*.json' 2025-12-04T17:26:27.7982650Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T17:26:27.7983109Z env: 2025-12-04T17:26:27.7983353Z 
GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:27.7983667Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:27.7984033Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:27.7984672Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:27.7985479Z FILE_SUFFIX: test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248 2025-12-04T17:26:27.7986049Z ##[endgroup] 2025-12-04T17:26:27.8198526Z adding: test/test-reports/td_exclusions-308e0cc0998f3631517c.json (deflated 16%) 2025-12-04T17:26:27.8199546Z adding: test/test-reports/python-pytest/lazy.test_ts_opinfo/lazy.test_ts_opinfo-8eadd60536af3632.json (deflated 76%) 2025-12-04T17:26:27.8210013Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-f2c58a9dfc31919e.json (deflated 93%) 2025-12-04T17:26:27.8213808Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-a793ea186f6e0edb.json (deflated 91%) 2025-12-04T17:26:27.8216126Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-fcd1db8f24799401.json (deflated 91%) 2025-12-04T17:26:27.8219972Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bb08f25297cc596b.json (deflated 94%) 2025-12-04T17:26:27.8225299Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bf15e775351f3d84.json (deflated 92%) 2025-12-04T17:26:27.8234486Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-cd1c50b62bb47a1b.json (deflated 95%) 2025-12-04T17:26:27.8243422Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3e5313e420476f15.json (deflated 95%) 2025-12-04T17:26:27.8246305Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b23b654b51890d24.json (deflated 90%) 2025-12-04T17:26:27.8248416Z 
adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-2e7c8f13f7be0603.json (deflated 91%) 2025-12-04T17:26:27.8250600Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-8d6cdce6581fa448.json (deflated 91%) 2025-12-04T17:26:27.8254806Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-9dce38c1d023996d.json (deflated 92%) 2025-12-04T17:26:27.8257279Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b570798f966501a4.json (deflated 91%) 2025-12-04T17:26:27.8259148Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-be9e2a318f1480ff.json (deflated 91%) 2025-12-04T17:26:27.8262502Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-e62e290dfdad5699.json (deflated 94%) 2025-12-04T17:26:27.8290980Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-0c75da116b2f10f8.json (deflated 94%) 2025-12-04T17:26:27.8292781Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-fd0863b8a222871a.json (deflated 88%) 2025-12-04T17:26:27.8294618Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-6fcb35b3fc35a71c.json (deflated 88%) 2025-12-04T17:26:27.8302107Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-f8b2416e9d43ac69.json (deflated 94%) 2025-12-04T17:26:27.8307002Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-8ad43f769763d7e0.json (deflated 95%) 2025-12-04T17:26:27.8312756Z adding: 
test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-6495f5d67df68869.json (deflated 95%) 2025-12-04T17:26:27.8317986Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-f9f6352517dfd8be.json (deflated 96%) 2025-12-04T17:26:27.8323558Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-25ab0fa1230b07b5.json (deflated 95%) 2025-12-04T17:26:27.8325062Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-74cab4bdcde89184.json (deflated 86%) 2025-12-04T17:26:27.8326716Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-77e37a2f8b75b3d9.json (deflated 85%) 2025-12-04T17:26:27.8328198Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3ba19b390afd5854.json (deflated 85%) 2025-12-04T17:26:27.8329677Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4ad317a243ecdd30.json (deflated 85%) 2025-12-04T17:26:27.8331133Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f482798b2b39d897.json (deflated 85%) 2025-12-04T17:26:27.8332599Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cbe2514f89eef609.json (deflated 85%) 2025-12-04T17:26:27.8334077Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3707d31910126ebf.json (deflated 85%) 2025-12-04T17:26:27.8335562Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dedaec5daecec784.json (deflated 85%) 2025-12-04T17:26:27.8337115Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2f4f0e9c4ac682e4.json (deflated 85%) 2025-12-04T17:26:27.8338666Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-580d25229e34cb07.json (deflated 86%) 2025-12-04T17:26:27.8340136Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9d15e1ab064c4537.json (deflated 85%) 2025-12-04T17:26:27.8341771Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6d909bcc6975bf8.json (deflated 85%) 2025-12-04T17:26:27.8343247Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0a612698d44183a1.json (deflated 85%) 2025-12-04T17:26:27.8344712Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-90f2ceb88314c75a.json (deflated 85%) 2025-12-04T17:26:27.8346188Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9644b19a5203c0ee.json (deflated 85%) 2025-12-04T17:26:27.8347669Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a6f999e52eb1904.json (deflated 86%) 2025-12-04T17:26:27.8349159Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-547414903ca204e9.json (deflated 85%) 2025-12-04T17:26:27.8350630Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-41f0f199b083e6d2.json (deflated 85%) 2025-12-04T17:26:27.8352083Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-438f9d52209526cc.json (deflated 85%) 2025-12-04T17:26:27.8353555Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98df9406c6e0faf3.json (deflated 85%) 2025-12-04T17:26:27.8355024Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f706cf73cc88a5b8.json (deflated 85%) 2025-12-04T17:26:27.8356499Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c67f05de6c39b0d8.json (deflated 85%) 2025-12-04T17:26:27.8357969Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0e30a339afee7d22.json (deflated 85%) 2025-12-04T17:26:27.8359475Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4a14a2e6be65f97f.json (deflated 85%) 2025-12-04T17:26:27.8360946Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a3db4b14f41cd2.json (deflated 85%) 2025-12-04T17:26:27.8362420Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a02c7191ab69f431.json (deflated 85%) 2025-12-04T17:26:27.8363895Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e37b8ebc7938792f.json (deflated 85%) 2025-12-04T17:26:27.8365360Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee37665d187f9309.json (deflated 86%) 2025-12-04T17:26:27.8366840Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-511047743df1b08e.json (deflated 85%) 2025-12-04T17:26:27.8368315Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4d9221d5ac70ff44.json (deflated 85%) 2025-12-04T17:26:27.8369797Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af9a500a606c950b.json (deflated 85%) 2025-12-04T17:26:27.8371615Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e3ba96547605fc4e.json (deflated 85%) 2025-12-04T17:26:27.8373190Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ce470e45644e1cc6.json (deflated 85%) 2025-12-04T17:26:27.8374665Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cece0bb00c5477e6.json (deflated 85%) 2025-12-04T17:26:27.8376253Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4e672e5e3ae6046c.json (deflated 85%) 2025-12-04T17:26:27.8377824Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-65775801d71c7290.json (deflated 85%) 2025-12-04T17:26:27.8379288Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed754aaaf490f98.json (deflated 86%) 2025-12-04T17:26:27.8380774Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-36a3a8a6a9d0a436.json (deflated 85%) 2025-12-04T17:26:27.8382256Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f55b1076fbed9be9.json (deflated 85%) 2025-12-04T17:26:27.8383740Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6062b5e411b734f8.json (deflated 85%) 2025-12-04T17:26:27.8385221Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21fa07c752a411ad.json (deflated 85%) 2025-12-04T17:26:27.8386686Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c09ee7a97c51183.json (deflated 85%) 2025-12-04T17:26:27.8388152Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c713b1dc3f3923ec.json (stored 0%) 2025-12-04T17:26:27.8392685Z adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-84a2c5e5cdda7bdd.json (deflated 96%) 2025-12-04T17:26:27.8395072Z adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-97e49e1b6070e822.json (deflated 92%) 2025-12-04T17:26:27.8397412Z adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-aaac502093c587a7.json (deflated 92%) 2025-12-04T17:26:27.8407222Z adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-decce829c4432557.json (deflated 95%) 2025-12-04T17:26:27.8408681Z adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-491de48d6c983340.json (deflated 74%) 2025-12-04T17:26:27.8418577Z adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-35b1cdd46f4129e6.json (deflated 96%) 2025-12-04T17:26:27.8419948Z adding: test/test-reports/python-pytest/inductor.test_flex_decoding/inductor.test_flex_decoding-4523fe803428b665.json (stored 0%) 2025-12-04T17:26:27.8421276Z adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccc55353a2e77d8f.json (deflated 91%) 2025-12-04T17:26:27.8425479Z adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-cbc1aeff512c7b0d.json (deflated 94%) 2025-12-04T17:26:27.8429969Z adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-b35d65d1a2e42e4e.json (deflated 94%) 
2025-12-04T17:26:27.8431318Z adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-feba5ff46dbc30dd.json (deflated 74%) 2025-12-04T17:26:27.8432541Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dff864e79f1bf91b.json (deflated 88%) 2025-12-04T17:26:27.8434371Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-053a0e10a178eff6.json (deflated 88%) 2025-12-04T17:26:27.8436476Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-966288eeb3fe785e.json (deflated 88%) 2025-12-04T17:26:27.8438439Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47dd8058babbbd0d.json (deflated 88%) 2025-12-04T17:26:27.8440524Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e92e228ccdafe934.json (deflated 88%) 2025-12-04T17:26:27.8442249Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0328fb4bc2fb022d.json (deflated 88%) 2025-12-04T17:26:27.8444281Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4ecceae3d20d3515.json (deflated 88%) 2025-12-04T17:26:27.8446207Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-af3f0411f43ffff1.json (deflated 88%) 2025-12-04T17:26:27.8448146Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5f646abfecfc34db.json (deflated 88%) 2025-12-04T17:26:27.8450094Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f71fa45f6063b14.json (deflated 88%) 2025-12-04T17:26:27.8452074Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3d881319a967678f.json (deflated 88%) 2025-12-04T17:26:27.8454040Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7b45d70025cf6016.json (deflated 88%) 2025-12-04T17:26:27.8456035Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b4a285d41fdad5fc.json 
(deflated 88%) 2025-12-04T17:26:27.8458046Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9b24822b6f23300e.json (deflated 88%) 2025-12-04T17:26:27.8459966Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-642548938a706c13.json (deflated 88%) 2025-12-04T17:26:27.8462252Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3087aa3d89d0a96b.json (deflated 89%) 2025-12-04T17:26:27.8464241Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5fb4c628c04a1cdc.json (deflated 88%) 2025-12-04T17:26:27.8466197Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0d752e0bfa5071ea.json (deflated 88%) 2025-12-04T17:26:27.8468136Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fedbc7df4b1c2869.json (deflated 88%) 2025-12-04T17:26:27.8470103Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3cf6d62b643bfad7.json (deflated 88%) 2025-12-04T17:26:27.8472325Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e2e744a24cd2751e.json (deflated 88%) 2025-12-04T17:26:27.8474174Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6de5a411a3f65f82.json (deflated 88%) 2025-12-04T17:26:27.8476161Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f3621583fff098.json (deflated 88%) 2025-12-04T17:26:27.8478092Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-319cee3df6121e1a.json (deflated 88%) 2025-12-04T17:26:27.8479993Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-452be63c68b4eb35.json (deflated 88%) 2025-12-04T17:26:27.8481928Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5a49841d6a2b730b.json (deflated 88%) 2025-12-04T17:26:27.8483851Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f1313a025d30dc09.json (deflated 
88%) 2025-12-04T17:26:27.8485805Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-03aedafc0832726c.json (deflated 88%) 2025-12-04T17:26:27.8487694Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-89171bcc48f05a69.json (deflated 88%) 2025-12-04T17:26:27.8489645Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6450e334481f0131.json (deflated 88%) 2025-12-04T17:26:27.8490956Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f7999da795e3cf34.json (deflated 87%) 2025-12-04T17:26:27.8498427Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ad7a38726bbc8b50.json (deflated 96%) 2025-12-04T17:26:27.8505588Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b434424093647de3.json (deflated 96%) 2025-12-04T17:26:27.8507762Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ccd966f4e119e833.json (deflated 88%) 2025-12-04T17:26:27.8510354Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d16f18ba4de45d90.json (deflated 90%) 2025-12-04T17:26:27.8513029Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4078dca354f1c797.json (deflated 90%) 2025-12-04T17:26:27.8515293Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7591ded94ad5fda9.json (deflated 89%) 2025-12-04T17:26:27.8517317Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4955a88ef6b89264.json (deflated 88%) 2025-12-04T17:26:27.8519367Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2fae1650dec37ec0.json (deflated 88%) 2025-12-04T17:26:27.8521376Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0893388d06071d35.json (deflated 88%) 2025-12-04T17:26:27.8523425Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b62e3abe6013e6ef.json (deflated 88%) 
2025-12-04T17:26:27.8525437Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0fe50dbde6f69754.json (deflated 88%) 2025-12-04T17:26:27.8527476Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1381d94cd6abec18.json (deflated 88%) 2025-12-04T17:26:27.8529493Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2b9aebe063e8f7ef.json (deflated 88%) 2025-12-04T17:26:27.8531520Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-649ba93d0ac5919c.json (deflated 88%) 2025-12-04T17:26:27.8533577Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df60bd1ca7e6baab.json (deflated 88%) 2025-12-04T17:26:27.8535639Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-927fdf8f8ff6280c.json (deflated 88%) 2025-12-04T17:26:27.8537797Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f380c761dc75570.json (deflated 88%) 2025-12-04T17:26:27.8539843Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db3aa4c2f1c0f2c1.json (deflated 88%) 2025-12-04T17:26:27.8541839Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7c20e7902388541e.json (deflated 88%) 2025-12-04T17:26:27.8543871Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43cf13c151388d8e.json (deflated 88%) 2025-12-04T17:26:27.8545883Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-27661fe34019a4f8.json (deflated 88%) 2025-12-04T17:26:27.8547911Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-63ef36c446edecf7.json (deflated 88%) 2025-12-04T17:26:27.8549891Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-818cc5e6f257d295.json (deflated 88%) 2025-12-04T17:26:27.8551970Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b552d5ebf2a766dc.json (deflated 88%) 
2025-12-04T17:26:27.8553966Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08c28ac73e77007a.json (deflated 88%) 2025-12-04T17:26:27.8555995Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df1b42bf8f6cd06e.json (deflated 88%) 2025-12-04T17:26:27.8558019Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-97d6c66aee44b097.json (deflated 88%) 2025-12-04T17:26:27.8560003Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-232f2d4b09cdec77.json (deflated 88%) 2025-12-04T17:26:27.8562066Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6add3d31a0a55a66.json (deflated 88%) 2025-12-04T17:26:27.8564132Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fa52f41f0c0be4e5.json (deflated 88%) 2025-12-04T17:26:27.8566325Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-38b24c1b21208356.json (deflated 88%) 2025-12-04T17:26:27.8568258Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1ae24833396f782.json (deflated 88%) 2025-12-04T17:26:27.8570391Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-80996ba6b8c32f81.json (deflated 88%) 2025-12-04T17:26:27.8572555Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8b26ba548538abde.json (deflated 88%) 2025-12-04T17:26:27.8574552Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d73817a3e5f02a06.json (deflated 88%) 2025-12-04T17:26:27.8578145Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-90e37d7f0968dad1.json (deflated 92%) 2025-12-04T17:26:27.8580116Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6ba281452d587f38.json (deflated 88%) 2025-12-04T17:26:27.8582068Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85d1d6e9267cc116.json (deflated 88%) 
2025-12-04T17:26:27.8584037Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a610c26dd7fa0e9.json (deflated 88%) 2025-12-04T17:26:27.8585963Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-269f6089cafc9f3b.json (deflated 88%) 2025-12-04T17:26:27.8587877Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f11fe18ee197cc1f.json (deflated 88%) 2025-12-04T17:26:27.8590321Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0b8acd36d7258295.json (deflated 89%) 2025-12-04T17:26:27.8592369Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-babe12520ea62fea.json (deflated 88%) 2025-12-04T17:26:27.8594258Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08a6bb29b776e6ca.json (deflated 88%) 2025-12-04T17:26:27.8596160Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ef8db3fa00c6c1d7.json (deflated 88%) 2025-12-04T17:26:27.8598002Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a3ac84fc91fa02b.json (deflated 88%) 2025-12-04T17:26:27.8599976Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e162f70cb76e49ff.json (deflated 88%) 2025-12-04T17:26:27.8602416Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e70a5c274fb86b8e.json (deflated 89%) 2025-12-04T17:26:27.8604520Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0c17434f07767682.json (deflated 89%) 2025-12-04T17:26:27.8606573Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3815c1aa47a06d85.json (deflated 89%) 2025-12-04T17:26:27.8608657Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-69850f25ab7699fd.json (deflated 89%) 2025-12-04T17:26:27.8610725Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-da23a1d59c747be6.json (deflated 89%) 
2025-12-04T17:26:27.8612898Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-36993cd4956a89fe.json (deflated 89%) 2025-12-04T17:26:27.8614965Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-78153e5fcd212bc6.json (deflated 89%) 2025-12-04T17:26:27.8617155Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-04b538cf09549803.json (deflated 89%) 2025-12-04T17:26:27.8619198Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d91f9f6b0d5ec125.json (deflated 89%) 2025-12-04T17:26:27.8621280Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ebbb316cfb6210df.json (deflated 89%) 2025-12-04T17:26:27.8623330Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6f1b13e751374b5d.json (deflated 89%) 2025-12-04T17:26:27.8625395Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f2c09e3279cd971a.json (deflated 89%) 2025-12-04T17:26:27.8627387Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c0e3bac2edd6805.json (deflated 88%) 2025-12-04T17:26:27.8629030Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c004f486086cbb5.json (deflated 88%) 2025-12-04T17:26:27.8631022Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5e05e7b060f911b0.json (deflated 88%) 2025-12-04T17:26:27.8635548Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4d9849432c7f5caf.json (deflated 98%) 2025-12-04T17:26:27.8636717Z adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-d1e6e78eb0372411.json (deflated 83%) 2025-12-04T17:26:27.8637964Z adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-0e0432c8246f889e.json (deflated 83%) 2025-12-04T17:26:27.8639197Z adding: 
test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-1d824658578ee605.json (deflated 83%) 2025-12-04T17:26:27.8640436Z adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-a0141b45c0b55065.json (deflated 85%) 2025-12-04T17:26:27.8663834Z adding: test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-498ce8e3e7c25595.json (deflated 95%) 2025-12-04T17:26:27.8667385Z adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-264346bf50f4314b.json (deflated 87%) 2025-12-04T17:26:27.8668775Z adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-f4f4ac9590e83730.json (deflated 87%) 2025-12-04T17:26:27.8670204Z adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-193da166d9e268ac.json (deflated 87%) 2025-12-04T17:26:27.8672202Z adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-d4eac70f931f6c8b.json (deflated 94%) 2025-12-04T17:26:27.8775181Z adding: test/test-reports/python-pytest/export.test_serdes/export.test_serdes-191fd84c43c29743.json (deflated 95%) 2025-12-04T17:26:27.8776989Z adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7c3220f8bc842d2f.json (deflated 84%) 2025-12-04T17:26:27.8778170Z adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7281b232f81c7a26.json (deflated 87%) 2025-12-04T17:26:27.8779343Z adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-0c21d337a20b3a01.json (deflated 87%) 2025-12-04T17:26:27.8780518Z adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-62a9a8755ec319d6.json (deflated 84%) 2025-12-04T17:26:27.8786043Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-b1ca468dab29d0d8.json (deflated 95%) 
2025-12-04T17:26:27.8788321Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-69f64b5320fd797d.json (deflated 91%) 2025-12-04T17:26:27.8790618Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-e41e403fca9b1188.json (deflated 91%) 2025-12-04T17:26:27.8792078Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-07f86488d4cce1d3.json (deflated 90%) 2025-12-04T17:26:27.8796140Z adding: test/test-reports/python-pytest/inductor.test_padding/inductor.test_padding-be250a10b53bb058.json (deflated 91%) 2025-12-04T17:26:27.8797696Z adding: test/test-reports/python-pytest/dynamo.test_aot_compile/dynamo.test_aot_compile-10a88b68c9603fe3.json (deflated 88%) 2025-12-04T17:26:27.8800490Z adding: test/test-reports/python-pytest/dynamo.test_sets/dynamo.test_sets-f0cb58e83c4ea8ef.json (deflated 94%) 2025-12-04T17:26:27.8802802Z adding: test/test-reports/python-pytest/dynamo.test_wrap_inductor_compiled_regions/dynamo.test_wrap_inductor_compiled_regions-2f1d9c362e038030.json (deflated 87%) 2025-12-04T17:26:27.8834101Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-598e6683c5cfc22a.json (deflated 97%) 2025-12-04T17:26:27.8846642Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-5879e0e26736617e.json (deflated 95%) 2025-12-04T17:26:27.8859043Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-c4519c63d1395608.json (deflated 95%) 2025-12-04T17:26:27.8871678Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-fd1a91e45a41098b.json (deflated 95%) 2025-12-04T17:26:27.8916014Z adding: test/test-reports/python-pytest/test_ops_fwd_gradients/test_ops_fwd_gradients-dac273fbaf67ad10.json (deflated 97%) 2025-12-04T17:26:27.9089659Z adding: test/test-reports/python-pytest/test_meta/test_meta-cbc50d7c3e0b1b6a.json (deflated 97%) 2025-12-04T17:26:27.9106082Z adding: 
test/test-reports/python-pytest/test_ops_jit/test_ops_jit-2f4faab6a29e642c.json (deflated 95%) 2025-12-04T17:26:27.9138325Z adding: test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-8372b6917771ca4c.json (deflated 98%) 2025-12-04T17:26:27.9218030Z adding: test/test-reports/python-pytest/test_ops/test_ops-d95bfbe57b5d2d89.json (deflated 96%) 2025-12-04T17:26:27.9309669Z adding: test/test-reports/python-pytest/test_ops/test_ops-75f8d45594e24741.json (deflated 97%) 2025-12-04T17:26:27.9311168Z adding: test/test-reports/python-pytest/functorch.test_dims/functorch.test_dims-e2a9e671430fd99e.json (deflated 93%) 2025-12-04T17:26:27.9354780Z adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-caabf5583dae6043.json (deflated 95%) 2025-12-04T17:26:27.9398771Z adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-b6190fae5240f1fb.json (deflated 95%) 2025-12-04T17:26:27.9414526Z adding: test/test-reports/python-pytest/inductor.test_cpu_repro/inductor.test_cpu_repro-e45fcbaf6c1a2b2c.json (deflated 97%) 2025-12-04T17:26:27.9415859Z adding: test/test-reports/python-pytest/inductor.test_custom_lowering/inductor.test_custom_lowering-f90a8c2a1b7dd9b0.json (deflated 83%) 2025-12-04T17:26:27.9422010Z adding: test/test-reports/python-pytest/inductor.test_perf/inductor.test_perf-34de9a09a2935f8d.json (deflated 93%) 2025-12-04T17:26:27.9423246Z adding: test/test-reports/python-pytest/inductor.test_binary_folding/inductor.test_binary_folding-0c797ad2be676af7.json (deflated 83%) 2025-12-04T17:26:27.9430600Z adding: test/test-reports/python-pytest/inductor.test_mkldnn_pattern_matcher/inductor.test_mkldnn_pattern_matcher-c93031a5b8f8293d.json (deflated 95%) 2025-12-04T17:26:27.9443677Z adding: test/test-reports/python-pytest/inductor.test_gpu_cpp_wrapper/inductor.test_gpu_cpp_wrapper-c206afd337165094.json (deflated 95%) 2025-12-04T17:26:27.9445055Z adding: 
test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-1780c0291e7a0397.json (deflated 92%) 2025-12-04T17:26:27.9446447Z adding: test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-33e3c50f2f02127c.json (deflated 84%) 2025-12-04T17:26:27.9453616Z adding: test/test-reports/python-pytest/dynamo.test_modules/dynamo.test_modules-f3674dc870090d50.json (deflated 91%) 2025-12-04T17:26:27.9454935Z adding: test/test-reports/python-pytest/dynamo.test_recompiles/dynamo.test_recompiles-755ec9793479e2dd.json (deflated 82%) 2025-12-04T17:26:27.9456158Z adding: test/test-reports/python-pytest/export.test_tree_utils/export.test_tree_utils-4b33de82582b2e92.json (deflated 61%) 2025-12-04T17:26:27.9458125Z adding: test/test-reports/python-pytest/inductor.test_triton_wrapper/inductor.test_triton_wrapper-7697274370716365.json (deflated 51%) 2025-12-04T17:26:27.9459523Z adding: test/test-reports/python-pytest/inductor.test_static_cuda_launcher/inductor.test_static_cuda_launcher-96effba66b878950.json (deflated 90%) 2025-12-04T17:26:27.9460878Z adding: test/test-reports/python-pytest/export.test_dynamic_shapes/export.test_dynamic_shapes-6f817f896f94c83c.json (deflated 63%) 2025-12-04T17:26:27.9462223Z adding: test/test-reports/python-pytest/dynamo.test_sdpa/dynamo.test_sdpa-3e0149796a415876.json (deflated 79%) 2025-12-04T17:26:27.9463340Z adding: test/test-reports/python-pytest/dynamo.test_utils/dynamo.test_utils-e6d94f5c34c685f8.json (deflated 84%) 2025-12-04T17:26:27.9464563Z adding: test/test-reports/python-pytest/inductor.test_codegen_triton/inductor.test_codegen_triton-f741c3b21cf28e3b.json (deflated 36%) 2025-12-04T17:26:27.9465823Z adding: test/test-reports/python-pytest/dynamo.test_frame_init/dynamo.test_frame_init-c2e1024fb8a07387.json (deflated 37%) 2025-12-04T17:26:27.9467084Z adding: test/test-reports/python-pytest/inductor.test_device_assert/inductor.test_device_assert-451c7142fcd9d62b.json 
(deflated 84%) 2025-12-04T17:26:27.9468387Z adding: test/test-reports/python-pytest/dynamo.test_skip_non_tensor/dynamo.test_skip_non_tensor-f190ace25428cb94.json (deflated 80%) 2025-12-04T17:26:27.9469757Z adding: test/test-reports/python-pytest/dynamo.test_skip_guard_eval_unsafe/dynamo.test_skip_guard_eval_unsafe-aa1ded9d0a4e400e.json (deflated 80%) 2025-12-04T17:26:27.9471389Z adding: test/test-reports/python-pytest/inductor.test_control_deps/inductor.test_control_deps-23047419ffe03376.json (deflated 50%) 2025-12-04T17:26:27.9472702Z adding: test/test-reports/python-pytest/inductor.test_benchmarking/inductor.test_benchmarking-53f04a03954d2058.json (deflated 91%) 2025-12-04T17:26:27.9474039Z adding: test/test-reports/python-pytest/inductor.test_helion_kernels/inductor.test_helion_kernels-0df86f8cd24ea26a.json (deflated 69%) 2025-12-04T17:26:27.9475373Z adding: test/test-reports/python-pytest/inductor.test_quantization/inductor.test_quantization-951156711359c867.json (deflated 66%) 2025-12-04T17:26:27.9476582Z adding: test/test-reports/python-pytest/export.test_tools/export.test_tools-c033e9415dabe65c.json (deflated 56%) 2025-12-04T17:26:27.9484004Z adding: test/test-reports/python-pytest/inductor.test_compiled_optimizers/inductor.test_compiled_optimizers-1745c9b9e5fc7ed3.json (deflated 96%) 2025-12-04T17:26:27.9485532Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor_utils/inductor.test_aot_inductor_utils-e7355f16ccb52d23.json (stored 0%) 2025-12-04T17:26:27.9504754Z adding: test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-0b2081966a192cef.json (deflated 97%) 2025-12-04T17:26:27.9509446Z adding: test/test-reports/python-pytest/inductor.test_minifier_isolate/inductor.test_minifier_isolate-f50615d1a1981661.json (deflated 93%) 2025-12-04T17:26:27.9636017Z adding: test/test-reports/python-pytest/dynamo.test_error_messages/dynamo.test_error_messages-36d8e363c2770c16.json (deflated 95%) 2025-12-04T17:26:27.9637334Z 
adding: test/test-reports/python-pytest/dynamo.test_fake_distributed/dynamo.test_fake_distributed-b0f5d6fe6c345e8f.json (deflated 73%) 2025-12-04T17:26:27.9638594Z adding: test/test-reports/python-pytest/dynamo.test_tree_map/dynamo.test_tree_map-39d9c68e899fe910.json (deflated 93%) 2025-12-04T17:26:27.9647405Z adding: test/test-reports/python-pytest/dynamo.test_minifier/dynamo.test_minifier-9124cc51e1c5e7b6.json (deflated 94%) 2025-12-04T17:26:27.9649006Z adding: test/test-reports/python-pytest/dynamo.test_guard_manager/dynamo.test_guard_manager-f0dd8a549f18516b.json (deflated 93%) 2025-12-04T17:26:27.9650251Z adding: test/test-reports/python-pytest/export.test_schema/export.test_schema-98e7fce7714746ab.json (deflated 82%) 2025-12-04T17:26:27.9651630Z adding: test/test-reports/python-pytest/export.test_pass_infra/export.test_pass_infra-0489f34d1d482c78.json (deflated 81%) 2025-12-04T17:26:27.9652885Z adding: test/test-reports/python-pytest/dynamo.test_recompile_ux/dynamo.test_recompile_ux-5436245cbc75fddd.json (deflated 82%) 2025-12-04T17:26:27.9654151Z adding: test/test-reports/python-pytest/export.test_experimental/export.test_experimental-4743e9a7200af635.json (deflated 90%) 2025-12-04T17:26:27.9657043Z adding: test/test-reports/python-pytest/export.test_converter/export.test_converter-a6e4e9ebcfaea6df.json (deflated 92%) 2025-12-04T17:26:27.9658482Z adding: test/test-reports/python-pytest/dynamo.test_reorder_logs/dynamo.test_reorder_logs-d530254831fe0a21.json (deflated 87%) 2025-12-04T17:26:27.9664185Z adding: test/test-reports/python-pytest/dynamo.test_subclasses/dynamo.test_subclasses-90ae20717b7fd572.json (deflated 93%) 2025-12-04T17:26:27.9665480Z adding: test/test-reports/python-pytest/dynamo.test_python_autograd/dynamo.test_python_autograd-b76a60537c2ba691.json (deflated 83%) 2025-12-04T17:26:27.9666780Z adding: test/test-reports/python-pytest/export.test_draft_export/export.test_draft_export-0c8a812115433a7d.json (deflated 89%) 
2025-12-04T17:26:27.9669365Z adding: test/test-reports/python-pytest/test_package/test_package-523d81f0792170f1.json (deflated 93%) 2025-12-04T17:26:27.9670407Z adding: test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-874cbf06946f8b3e.json (deflated 64%) 2025-12-04T17:26:27.9671686Z adding: test/test-reports/python-pytest/test_comparison_utils/test_comparison_utils-ce770324779d51b3.json (deflated 87%) 2025-12-04T17:26:27.9672939Z adding: test/test-reports/python-pytest/functorch.test_ac_logging/functorch.test_ac_logging-f1c79a1c8c74be66.json (deflated 77%) 2025-12-04T17:26:27.9674144Z adding: test/test-reports/python-pytest/test_mkldnn_verbose/test_mkldnn_verbose-e983273d29ed8e1e.json (deflated 64%) 2025-12-04T17:26:27.9681193Z adding: test/test-reports/python-pytest/test_cpp_api_parity/test_cpp_api_parity-c6b7300fef8db168.json (deflated 97%) 2025-12-04T17:26:27.9682304Z adding: test/test-reports/python-pytest/test_autoload/test_autoload-21f1eacf8f4a4d28.json (deflated 36%) 2025-12-04T17:26:27.9683519Z adding: test/test-reports/python-pytest/nn.attention.test_open_registry/nn.attention.test_open_registry-bacfee0084c93992.json (deflated 63%) 2025-12-04T17:26:27.9684744Z adding: test/test-reports/python-pytest/test_as_strided/test_as_strided-4555079064233d7d.json (deflated 61%) 2025-12-04T17:26:27.9744528Z adding: test/test-reports/python-pytest/test_foreach/test_foreach-aa4419a4e7b6d381.json (deflated 98%) 2025-12-04T17:26:27.9745687Z adding: test/test-reports/python-pytest/xpu.test_gemm/xpu.test_gemm-2cb9cf39de6aa2cf.json (stored 0%) 2025-12-04T17:26:27.9746737Z adding: test/test-reports/python-pytest/test_numpy_interop/test_numpy_interop-660870d95235d56d.json (deflated 93%) 2025-12-04T17:26:27.9747923Z adding: test/test-reports/python-pytest/profiler.test_cpp_thread/profiler.test_cpp_thread-31559e2ba96f64a3.json (deflated 85%) 2025-12-04T17:26:27.9748992Z adding: test/test-reports/python-pytest/test_hub/test_hub-33a47573ff45c77e.json (deflated 
86%) 2025-12-04T17:26:27.9751229Z adding: test/test-reports/python-pytest/test_segment_reductions/test_segment_reductions-ad616dd6940e0de0.json (deflated 97%) 2025-12-04T17:26:27.9752465Z adding: test/test-reports/python-pytest/test_autograd_fallback/test_autograd_fallback-e1a7bbd98afc63dc.json (deflated 94%) 2025-12-04T17:26:27.9753602Z adding: test/test-reports/python-pytest/test_type_hints/test_type_hints-d14fd0906e097d86.json (deflated 58%) 2025-12-04T17:26:27.9754937Z adding: test/test-reports/python-pytest/functorch.test_aot_joint_with_descriptors/functorch.test_aot_joint_with_descriptors-79fd9b229bc0c00b.json (deflated 91%) 2025-12-04T17:26:27.9756315Z adding: test/test-reports/python-pytest/test_fx_reinplace_pass/test_fx_reinplace_pass-047146b9ff22e4f6.json (deflated 87%) 2025-12-04T17:26:27.9781852Z adding: test/test-reports/python-pytest/functorch.test_control_flow/functorch.test_control_flow-922a9914156e0312.json (deflated 97%) 2025-12-04T17:26:27.9783586Z adding: test/test-reports/python-pytest/test_subclass/test_subclass-68565895e4fc66ea.json (deflated 96%) 2025-12-04T17:26:27.9827249Z adding: test/test-reports/python-pytest/functorch.test_vmap_registrations/functorch.test_vmap_registrations-40d5b566ee6986dc.json (deflated 98%) 2025-12-04T17:26:27.9828783Z adding: test/test-reports/python-pytest/nn.test_parametrization/nn.test_parametrization-ed4e97080833ff92.json (deflated 94%) 2025-12-04T17:26:27.9846807Z adding: test/test-reports/python-pytest/test_dynamic_shapes/test_dynamic_shapes-07075f000d166d21.json (deflated 95%) 2025-12-04T17:26:27.9847901Z adding: test/test-reports/python-pytest/test_dispatch/test_dispatch-bf1fd68f7abb7228.json (deflated 93%) 2025-12-04T17:26:27.9849031Z adding: test/test-reports/python-pytest/test_numba_integration/test_numba_integration-edcc49db775b9990.json (deflated 87%) 2025-12-04T17:26:27.9850237Z adding: test/test-reports/python-pytest/test_functional_optim/test_functional_optim-389cbc1bb3d61470.json (deflated 79%) 
2025-12-04T17:26:27.9866776Z adding: test/test-reports/python-pytest/test_maskedtensor/test_maskedtensor-1089c4e953521eec.json (deflated 97%) 2025-12-04T17:26:27.9868092Z adding: test/test-reports/python-pytest/benchmark_utils.test_benchmark_utils/benchmark_utils.test_benchmark_utils-76c10c33afe299c4.json (deflated 87%) 2025-12-04T17:26:27.9898683Z adding: test/test-reports/python-pytest/test_scaled_matmul_cuda/test_scaled_matmul_cuda-d1f8763e6c1869e6.json (deflated 99%) 2025-12-04T17:26:27.9902024Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_shape_base/torch_np.numpy_tests.core.test_shape_base-b9eed7c143bc9bc3.json (deflated 97%) 2025-12-04T17:26:27.9903301Z adding: test/test-reports/python-pytest/test_vulkan/test_vulkan-b25d187bf3baa78a.json (deflated 44%) 2025-12-04T17:26:27.9904355Z adding: test/test-reports/python-pytest/lazy.test_generator/lazy.test_generator-42072a3593c4e25d.json (deflated 63%) 2025-12-04T17:26:27.9909537Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.linalg.test_linalg/torch_np.numpy_tests.linalg.test_linalg-2974f2048ff6a577.json (deflated 97%) 2025-12-04T17:26:27.9912999Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_dtype/torch_np.numpy_tests.core.test_dtype-4b8c4285965a7813.json (deflated 97%) 2025-12-04T17:26:27.9914324Z adding: test/test-reports/python-pytest/lazy.test_debug_util/lazy.test_debug_util-7c02b1e3dfee61bd.json (deflated 33%) 2025-12-04T17:26:27.9915598Z adding: test/test-reports/python-pytest/nn.test_load_state_dict/nn.test_load_state_dict-e81d6ed8d3f8789f.json (deflated 94%) 2025-12-04T17:26:27.9916707Z adding: test/test-reports/python-pytest/test_shape_ops/test_shape_ops-a6160583c0856270.json (deflated 96%) 2025-12-04T17:26:27.9918328Z adding: test/test-reports/python-pytest/nn.test_module_hooks/nn.test_module_hooks-e13d4f4eb9af9666.json (deflated 92%) 2025-12-04T17:26:27.9919765Z adding: 
test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_twodim_base/torch_np.numpy_tests.lib.test_twodim_base-2da66c446de8da89.json (deflated 93%) 2025-12-04T17:26:27.9921255Z adding: test/test-reports/python-pytest/profiler.test_memory_profiler/profiler.test_memory_profiler-20f5e2eefecacaee.json (deflated 88%) 2025-12-04T17:26:27.9923836Z adding: test/test-reports/python-pytest/test_jit_llga_fuser/test_jit_llga_fuser-b203cab2c461ce78.json (deflated 96%) 2025-12-04T17:26:27.9925181Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_getlimits/torch_np.numpy_tests.core.test_getlimits-5149534e2555ec6f.json (deflated 91%) 2025-12-04T17:26:27.9932214Z adding: test/test-reports/python-pytest/torch_np.test_ndarray_methods/torch_np.test_ndarray_methods-fe7c638b86097b2d.json (deflated 98%) 2025-12-04T17:26:27.9937924Z adding: test/test-reports/python-pytest/test_view_ops/test_view_ops-f5d6b3525797eb50.json (deflated 95%) 2025-12-04T17:26:27.9939031Z adding: test/test-reports/python-pytest/test_type_info/test_type_info-22600993e111f6f2.json (deflated 83%) 2025-12-04T17:26:27.9960126Z adding: test/test-reports/python-pytest/functorch.test_aotdispatch/functorch.test_aotdispatch-efb7e0b79840fa38.json (deflated 95%) 2025-12-04T17:26:27.9961838Z adding: test/test-reports/python-pytest/test_scatter_gather_ops/test_scatter_gather_ops-5274bd99c8a0619f.json (deflated 95%) 2025-12-04T17:26:27.9964412Z adding: test/test-reports/python-pytest/test_cuda_multigpu/test_cuda_multigpu-a9a26e79d8868522.json (deflated 94%) 2025-12-04T17:26:27.9966016Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-43f32c31fbfc43cd.json (deflated 94%) 2025-12-04T17:26:27.9967673Z adding: test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-9b5e22ff1077135a.json (deflated 91%) 2025-12-04T17:26:27.9971594Z adding: 
test/test-reports/python-pytest/nn.test_pooling/nn.test_pooling-2151df52b065bbdf.json (deflated 96%) 2025-12-04T17:26:27.9975364Z adding: test/test-reports/python-pytest/nn.test_embedding/nn.test_embedding-d055fd5d393643fe.json (deflated 97%) 2025-12-04T17:26:27.9976618Z adding: test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-ed8e38bda9a33f4f.json (deflated 88%) 2025-12-04T17:26:27.9977789Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-5e38c3c197506de5.json (deflated 36%) 2025-12-04T17:26:27.9978840Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-4795d7c5159b6e03.json (deflated 33%) 2025-12-04T17:26:27.9979878Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-6c76df2a5666e90f.json (deflated 34%) 2025-12-04T17:26:27.9980929Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c93df3ae687a8e58.json (deflated 34%) 2025-12-04T17:26:27.9981984Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-524a9565fc6ac576.json (deflated 34%) 2025-12-04T17:26:27.9983026Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-b18b3c9d4ddc6b34.json (deflated 34%) 2025-12-04T17:26:27.9984052Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-319074c2014cbf3e.json (deflated 34%) 2025-12-04T17:26:27.9985082Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-2483ba726355768c.json (deflated 33%) 2025-12-04T17:26:27.9986131Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-f2eed9c29ea8eac7.json (deflated 34%) 2025-12-04T17:26:27.9987289Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c778cc218c519690.json (deflated 33%) 2025-12-04T17:26:27.9988315Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-7897a7f1d03cdaa3.json (deflated 34%) 2025-12-04T17:26:27.9989356Z adding: 
test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-d4fb26045e698199.json (deflated 33%) 2025-12-04T17:26:28.0006113Z adding: test/test-reports/python-pytest/torch_np.test_reductions/torch_np.test_reductions-73a2026a6cdfd4dc.json (deflated 98%) 2025-12-04T17:26:28.0007799Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_scalar_ctors/torch_np.numpy_tests.core.test_scalar_ctors-e23576bcb06b5d61.json (deflated 97%) 2025-12-04T17:26:28.0009339Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_arraypad/torch_np.numpy_tests.lib.test_arraypad-f4e46a1506be78e1.json (deflated 87%) 2025-12-04T17:26:28.0010589Z adding: test/test-reports/python-pytest/test_prims/test_prims-38188698633a9bb5.json (deflated 89%) 2025-12-04T17:26:28.0016556Z adding: test/test-reports/python-pytest/test_spectral_ops/test_spectral_ops-ae86fbbf23286ef9.json (deflated 96%) 2025-12-04T17:26:28.0017789Z adding: test/test-reports/python-pytest/test_cpp_extensions_aot_ninja/test_cpp_extensions_aot_ninja-5c9ab2f003415ced.json (deflated 90%) 2025-12-04T17:26:28.0019147Z adding: test/test-reports/python-pytest/test_cpp_extensions_aot_no_ninja/test_cpp_extensions_aot_no_ninja-dc0b3ab1cc30279c.json (deflated 90%) 2025-12-04T17:26:28.0023284Z adding: test/test-reports/td_exclusions-c770a75ce015dc406ab1.json (deflated 82%) 2025-12-04T17:26:28.0024314Z adding: test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204172052.json (deflated 37%) 2025-12-04T17:26:28.0056530Z ##[group]Run # Remove any previous test reports if they exist 2025-12-04T17:26:28.0057102Z # Remove any previous test reports if they exist 2025-12-04T17:26:28.0057746Z rm -f test-reports-*.zip 2025-12-04T17:26:28.0058311Z zip -r "test-reports-${FILE_SUFFIX}.zip" test/test-reports -i '*.xml' -i '*.csv' 2025-12-04T17:26:28.0065447Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T17:26:28.0065888Z env: 2025-12-04T17:26:28.0066144Z 
GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:28.0066463Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:28.0066823Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:28.0067485Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:28.0068297Z FILE_SUFFIX: test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248 2025-12-04T17:26:28.0068883Z ##[endgroup] 2025-12-04T17:26:28.0208188Z adding: test/test-reports/python-pytest/lazy.test_ts_opinfo/lazy.test_ts_opinfo-8eadd60536af3632.xml (deflated 62%) 2025-12-04T17:26:28.0215016Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-f2c58a9dfc31919e.xml (deflated 92%) 2025-12-04T17:26:28.0217311Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-a793ea186f6e0edb.xml (deflated 90%) 2025-12-04T17:26:28.0219367Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-fcd1db8f24799401.xml (deflated 90%) 2025-12-04T17:26:28.0222741Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bb08f25297cc596b.xml (deflated 92%) 2025-12-04T17:26:28.0227955Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-bf15e775351f3d84.xml (deflated 90%) 2025-12-04T17:26:28.0239045Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-cd1c50b62bb47a1b.xml (deflated 93%) 2025-12-04T17:26:28.0275051Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3e5313e420476f15.xml (deflated 93%) 2025-12-04T17:26:28.0277296Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b23b654b51890d24.xml (deflated 89%) 2025-12-04T17:26:28.0278589Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-2e7c8f13f7be0603.xml 
(deflated 91%) 2025-12-04T17:26:28.0279866Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-8d6cdce6581fa448.xml (deflated 91%) 2025-12-04T17:26:28.0281156Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-9dce38c1d023996d.xml (deflated 91%) 2025-12-04T17:26:28.0282430Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-b570798f966501a4.xml (deflated 90%) 2025-12-04T17:26:28.0283705Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-be9e2a318f1480ff.xml (deflated 90%) 2025-12-04T17:26:28.0284972Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-e62e290dfdad5699.xml (deflated 91%) 2025-12-04T17:26:28.0293738Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-0c75da116b2f10f8.xml (deflated 93%) 2025-12-04T17:26:28.0296125Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-fd0863b8a222871a.xml (deflated 87%) 2025-12-04T17:26:28.0298545Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-6fcb35b3fc35a71c.xml (deflated 87%) 2025-12-04T17:26:28.0304087Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_codegen_dynamic_shapes/inductor.test_torchinductor_codegen_dynamic_shapes-f8b2416e9d43ac69.xml (deflated 93%) 2025-12-04T17:26:28.0308016Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-8ad43f769763d7e0.xml (deflated 92%) 2025-12-04T17:26:28.0312322Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-6495f5d67df68869.xml (deflated 92%) 
2025-12-04T17:26:28.0316391Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-f9f6352517dfd8be.xml (deflated 92%) 2025-12-04T17:26:28.0320765Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-25ab0fa1230b07b5.xml (deflated 92%) 2025-12-04T17:26:28.0323246Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-74cab4bdcde89184.xml (deflated 85%) 2025-12-04T17:26:28.0325706Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-77e37a2f8b75b3d9.xml (deflated 84%) 2025-12-04T17:26:28.0328269Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3ba19b390afd5854.xml (deflated 84%) 2025-12-04T17:26:28.0330723Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4ad317a243ecdd30.xml (deflated 85%) 2025-12-04T17:26:28.0333251Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f482798b2b39d897.xml (deflated 84%) 2025-12-04T17:26:28.0335796Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cbe2514f89eef609.xml (deflated 84%) 2025-12-04T17:26:28.0338374Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3707d31910126ebf.xml (deflated 85%) 2025-12-04T17:26:28.0341029Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dedaec5daecec784.xml (deflated 84%) 2025-12-04T17:26:28.0343618Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2f4f0e9c4ac682e4.xml (deflated 84%) 2025-12-04T17:26:28.0346092Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-580d25229e34cb07.xml (deflated 85%) 2025-12-04T17:26:28.0348547Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9d15e1ab064c4537.xml (deflated 84%) 2025-12-04T17:26:28.0351109Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6d909bcc6975bf8.xml (deflated 84%) 2025-12-04T17:26:28.0353731Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0a612698d44183a1.xml (deflated 85%) 2025-12-04T17:26:28.0356307Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-90f2ceb88314c75a.xml (deflated 84%) 2025-12-04T17:26:28.0358878Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9644b19a5203c0ee.xml (deflated 84%) 2025-12-04T17:26:28.0361366Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a6f999e52eb1904.xml (deflated 85%) 2025-12-04T17:26:28.0363837Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-547414903ca204e9.xml (deflated 84%) 2025-12-04T17:26:28.0366492Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-41f0f199b083e6d2.xml (deflated 84%) 2025-12-04T17:26:28.0369033Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-438f9d52209526cc.xml (deflated 85%) 2025-12-04T17:26:28.0371711Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98df9406c6e0faf3.xml (deflated 84%) 2025-12-04T17:26:28.0373183Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f706cf73cc88a5b8.xml (deflated 84%) 2025-12-04T17:26:28.0374650Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c67f05de6c39b0d8.xml (deflated 85%) 2025-12-04T17:26:28.0376117Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0e30a339afee7d22.xml (deflated 84%) 2025-12-04T17:26:28.0377661Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4a14a2e6be65f97f.xml (deflated 84%) 2025-12-04T17:26:28.0379115Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a3db4b14f41cd2.xml (deflated 85%) 2025-12-04T17:26:28.0380586Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a02c7191ab69f431.xml (deflated 84%) 2025-12-04T17:26:28.0382056Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e37b8ebc7938792f.xml (deflated 84%) 2025-12-04T17:26:28.0383525Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee37665d187f9309.xml (deflated 85%) 2025-12-04T17:26:28.0384974Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-511047743df1b08e.xml (deflated 84%) 2025-12-04T17:26:28.0386427Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4d9221d5ac70ff44.xml (deflated 84%) 2025-12-04T17:26:28.0387927Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af9a500a606c950b.xml (deflated 85%) 2025-12-04T17:26:28.0389967Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e3ba96547605fc4e.xml (deflated 84%) 2025-12-04T17:26:28.0391942Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ce470e45644e1cc6.xml (deflated 84%) 2025-12-04T17:26:28.0393730Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cece0bb00c5477e6.xml (deflated 85%) 2025-12-04T17:26:28.0395370Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4e672e5e3ae6046c.xml (deflated 84%) 2025-12-04T17:26:28.0396831Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-65775801d71c7290.xml (deflated 84%) 2025-12-04T17:26:28.0398292Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed754aaaf490f98.xml (deflated 85%) 2025-12-04T17:26:28.0399752Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-36a3a8a6a9d0a436.xml (deflated 84%) 2025-12-04T17:26:28.0401220Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f55b1076fbed9be9.xml (deflated 84%) 2025-12-04T17:26:28.0403739Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6062b5e411b734f8.xml (deflated 85%) 2025-12-04T17:26:28.0405197Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21fa07c752a411ad.xml (deflated 84%) 2025-12-04T17:26:28.0406657Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c09ee7a97c51183.xml (deflated 84%) 2025-12-04T17:26:28.0408263Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c713b1dc3f3923ec.xml (deflated 28%) 2025-12-04T17:26:28.0409721Z adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-84a2c5e5cdda7bdd.xml (deflated 95%) 2025-12-04T17:26:28.0411174Z adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-97e49e1b6070e822.xml (deflated 92%) 2025-12-04T17:26:28.0412606Z adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-aaac502093c587a7.xml (deflated 92%) 2025-12-04T17:26:28.0414596Z adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-decce829c4432557.xml (deflated 94%) 2025-12-04T17:26:28.0416432Z adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-491de48d6c983340.xml (deflated 73%) 2025-12-04T17:26:28.0418073Z adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-35b1cdd46f4129e6.xml (deflated 96%) 2025-12-04T17:26:28.0419446Z adding: test/test-reports/python-pytest/inductor.test_flex_decoding/inductor.test_flex_decoding-4523fe803428b665.xml (deflated 28%) 2025-12-04T17:26:28.0420774Z adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccc55353a2e77d8f.xml (deflated 90%) 2025-12-04T17:26:28.0422656Z adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-cbc1aeff512c7b0d.xml (deflated 93%) 2025-12-04T17:26:28.0427233Z adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-b35d65d1a2e42e4e.xml (deflated 93%) 2025-12-04T17:26:28.0428804Z adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-feba5ff46dbc30dd.xml (deflated 72%) 2025-12-04T17:26:28.0430409Z 
adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dff864e79f1bf91b.xml (deflated 88%) 2025-12-04T17:26:28.0432135Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-053a0e10a178eff6.xml (deflated 88%) 2025-12-04T17:26:28.0433733Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-966288eeb3fe785e.xml (deflated 88%) 2025-12-04T17:26:28.0435467Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-47dd8058babbbd0d.xml (deflated 88%) 2025-12-04T17:26:28.0437486Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e92e228ccdafe934.xml (deflated 88%) 2025-12-04T17:26:28.0439472Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0328fb4bc2fb022d.xml (deflated 88%) 2025-12-04T17:26:28.0441469Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4ecceae3d20d3515.xml (deflated 88%) 2025-12-04T17:26:28.0443420Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-af3f0411f43ffff1.xml (deflated 88%) 2025-12-04T17:26:28.0445417Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5f646abfecfc34db.xml (deflated 88%) 2025-12-04T17:26:28.0447380Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f71fa45f6063b14.xml (deflated 88%) 2025-12-04T17:26:28.0449393Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3d881319a967678f.xml (deflated 88%) 2025-12-04T17:26:28.0451382Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7b45d70025cf6016.xml (deflated 88%) 2025-12-04T17:26:28.0453354Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b4a285d41fdad5fc.xml (deflated 88%) 2025-12-04T17:26:28.0455280Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9b24822b6f23300e.xml (deflated 88%) 2025-12-04T17:26:28.0457429Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-642548938a706c13.xml (deflated 88%) 2025-12-04T17:26:28.0459608Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3087aa3d89d0a96b.xml (deflated 89%) 2025-12-04T17:26:28.0461584Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5fb4c628c04a1cdc.xml (deflated 88%) 2025-12-04T17:26:28.0463488Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0d752e0bfa5071ea.xml (deflated 88%) 2025-12-04T17:26:28.0465442Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fedbc7df4b1c2869.xml (deflated 88%) 2025-12-04T17:26:28.0467390Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3cf6d62b643bfad7.xml (deflated 88%) 2025-12-04T17:26:28.0469405Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e2e744a24cd2751e.xml (deflated 88%) 2025-12-04T17:26:28.0471522Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6de5a411a3f65f82.xml (deflated 88%) 2025-12-04T17:26:28.0473551Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d2f3621583fff098.xml (deflated 88%) 2025-12-04T17:26:28.0475484Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-319cee3df6121e1a.xml (deflated 88%) 2025-12-04T17:26:28.0477476Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-452be63c68b4eb35.xml (deflated 88%) 2025-12-04T17:26:28.0479440Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5a49841d6a2b730b.xml (deflated 88%) 2025-12-04T17:26:28.0481418Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f1313a025d30dc09.xml (deflated 88%) 2025-12-04T17:26:28.0483412Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-03aedafc0832726c.xml (deflated 88%) 2025-12-04T17:26:28.0485399Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-89171bcc48f05a69.xml (deflated 88%) 2025-12-04T17:26:28.0487617Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6450e334481f0131.xml (deflated 88%) 2025-12-04T17:26:28.0489297Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f7999da795e3cf34.xml (deflated 86%) 2025-12-04T17:26:28.0496870Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ad7a38726bbc8b50.xml (deflated 96%) 2025-12-04T17:26:28.0504976Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b434424093647de3.xml (deflated 96%) 2025-12-04T17:26:28.0506896Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ccd966f4e119e833.xml (deflated 88%) 2025-12-04T17:26:28.0509860Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d16f18ba4de45d90.xml (deflated 90%) 2025-12-04T17:26:28.0512727Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4078dca354f1c797.xml (deflated 90%) 2025-12-04T17:26:28.0514881Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7591ded94ad5fda9.xml (deflated 88%) 2025-12-04T17:26:28.0516871Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4955a88ef6b89264.xml (deflated 88%) 2025-12-04T17:26:28.0518974Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2fae1650dec37ec0.xml (deflated 88%) 2025-12-04T17:26:28.0521002Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0893388d06071d35.xml (deflated 88%) 2025-12-04T17:26:28.0522991Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b62e3abe6013e6ef.xml (deflated 88%) 2025-12-04T17:26:28.0525070Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0fe50dbde6f69754.xml (deflated 88%) 2025-12-04T17:26:28.0527227Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1381d94cd6abec18.xml (deflated 88%) 2025-12-04T17:26:28.0529327Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2b9aebe063e8f7ef.xml (deflated 88%) 2025-12-04T17:26:28.0531132Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-649ba93d0ac5919c.xml (deflated 88%) 2025-12-04T17:26:28.0533179Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df60bd1ca7e6baab.xml (deflated 88%) 2025-12-04T17:26:28.0535316Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-927fdf8f8ff6280c.xml (deflated 88%) 2025-12-04T17:26:28.0537383Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f380c761dc75570.xml (deflated 88%) 2025-12-04T17:26:28.0539450Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db3aa4c2f1c0f2c1.xml (deflated 88%) 2025-12-04T17:26:28.0541474Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7c20e7902388541e.xml (deflated 88%) 2025-12-04T17:26:28.0543441Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-43cf13c151388d8e.xml (deflated 88%) 2025-12-04T17:26:28.0545437Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-27661fe34019a4f8.xml (deflated 88%) 2025-12-04T17:26:28.0547553Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-63ef36c446edecf7.xml (deflated 88%) 2025-12-04T17:26:28.0549493Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-818cc5e6f257d295.xml (deflated 88%) 2025-12-04T17:26:28.0551479Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b552d5ebf2a766dc.xml (deflated 88%) 2025-12-04T17:26:28.0553566Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08c28ac73e77007a.xml (deflated 88%) 2025-12-04T17:26:28.0555550Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df1b42bf8f6cd06e.xml (deflated 88%) 2025-12-04T17:26:28.0557605Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-97d6c66aee44b097.xml (deflated 88%) 2025-12-04T17:26:28.0559696Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-232f2d4b09cdec77.xml (deflated 88%) 2025-12-04T17:26:28.0561582Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6add3d31a0a55a66.xml (deflated 88%) 2025-12-04T17:26:28.0563630Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fa52f41f0c0be4e5.xml (deflated 88%) 2025-12-04T17:26:28.0565711Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-38b24c1b21208356.xml (deflated 88%) 2025-12-04T17:26:28.0567824Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b1ae24833396f782.xml (deflated 88%) 2025-12-04T17:26:28.0569864Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-80996ba6b8c32f81.xml (deflated 88%) 2025-12-04T17:26:28.0572012Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8b26ba548538abde.xml (deflated 88%) 2025-12-04T17:26:28.0574082Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d73817a3e5f02a06.xml (deflated 88%) 2025-12-04T17:26:28.0577918Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-90e37d7f0968dad1.xml (deflated 92%) 2025-12-04T17:26:28.0579842Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6ba281452d587f38.xml (deflated 88%) 2025-12-04T17:26:28.0581703Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-85d1d6e9267cc116.xml (deflated 88%) 2025-12-04T17:26:28.0583700Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a610c26dd7fa0e9.xml (deflated 88%) 2025-12-04T17:26:28.0585606Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-269f6089cafc9f3b.xml (deflated 88%) 2025-12-04T17:26:28.0587536Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f11fe18ee197cc1f.xml (deflated 88%) 2025-12-04T17:26:28.0590424Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0b8acd36d7258295.xml (deflated 87%) 2025-12-04T17:26:28.0592110Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-babe12520ea62fea.xml (deflated 88%) 2025-12-04T17:26:28.0593996Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-08a6bb29b776e6ca.xml (deflated 88%) 2025-12-04T17:26:28.0595966Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ef8db3fa00c6c1d7.xml (deflated 88%) 2025-12-04T17:26:28.0597896Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7a3ac84fc91fa02b.xml (deflated 88%) 2025-12-04T17:26:28.0599901Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e162f70cb76e49ff.xml (deflated 88%) 2025-12-04T17:26:28.0602406Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e70a5c274fb86b8e.xml (deflated 88%) 2025-12-04T17:26:28.0604368Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0c17434f07767682.xml (deflated 88%) 2025-12-04T17:26:28.0606626Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3815c1aa47a06d85.xml (deflated 88%) 2025-12-04T17:26:28.0608637Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-69850f25ab7699fd.xml (deflated 88%) 2025-12-04T17:26:28.0610773Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-da23a1d59c747be6.xml (deflated 88%) 2025-12-04T17:26:28.0612924Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-36993cd4956a89fe.xml (deflated 88%) 2025-12-04T17:26:28.0614946Z adding: 
test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-78153e5fcd212bc6.xml (deflated 88%) 2025-12-04T17:26:28.0617182Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-04b538cf09549803.xml (deflated 88%) 2025-12-04T17:26:28.0619441Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d91f9f6b0d5ec125.xml (deflated 88%) 2025-12-04T17:26:28.0621479Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ebbb316cfb6210df.xml (deflated 88%) 2025-12-04T17:26:28.0623514Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6f1b13e751374b5d.xml (deflated 88%) 2025-12-04T17:26:28.0625633Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f2c09e3279cd971a.xml (deflated 88%) 2025-12-04T17:26:28.0627519Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c0e3bac2edd6805.xml (deflated 88%) 2025-12-04T17:26:28.0629500Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1c004f486086cbb5.xml (deflated 88%) 2025-12-04T17:26:28.0631485Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5e05e7b060f911b0.xml (deflated 88%) 2025-12-04T17:26:28.0634972Z adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4d9849432c7f5caf.xml (deflated 97%) 2025-12-04T17:26:28.0637413Z adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-d1e6e78eb0372411.xml (deflated 82%) 2025-12-04T17:26:28.0639272Z adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-0e0432c8246f889e.xml (deflated 82%) 2025-12-04T17:26:28.0640634Z adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-1d824658578ee605.xml (deflated 82%) 2025-12-04T17:26:28.0642099Z adding: test/test-reports/python-pytest/dynamo.test_model_output/dynamo.test_model_output-a0141b45c0b55065.xml (deflated 78%) 
2025-12-04T17:26:28.0660456Z adding: test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-498ce8e3e7c25595.xml (deflated 94%) 2025-12-04T17:26:28.0663839Z adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-264346bf50f4314b.xml (deflated 85%) 2025-12-04T17:26:28.0666423Z adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-f4f4ac9590e83730.xml (deflated 87%) 2025-12-04T17:26:28.0668397Z adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-193da166d9e268ac.xml (deflated 87%) 2025-12-04T17:26:28.0669859Z adding: test/test-reports/python-pytest/inductor.test_loop_ordering/inductor.test_loop_ordering-d4eac70f931f6c8b.xml (deflated 92%) 2025-12-04T17:26:28.0758138Z adding: test/test-reports/python-pytest/export.test_serdes/export.test_serdes-191fd84c43c29743.xml (deflated 95%) 2025-12-04T17:26:28.0760083Z adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7c3220f8bc842d2f.xml (deflated 82%) 2025-12-04T17:26:28.0762422Z adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-7281b232f81c7a26.xml (deflated 86%) 2025-12-04T17:26:28.0763821Z adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-0c21d337a20b3a01.xml (deflated 86%) 2025-12-04T17:26:28.0765118Z adding: test/test-reports/python-pytest/dynamo.test_backends/dynamo.test_backends-62a9a8755ec319d6.xml (deflated 77%) 2025-12-04T17:26:28.0768164Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-b1ca468dab29d0d8.xml (deflated 94%) 2025-12-04T17:26:28.0770393Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-69f64b5320fd797d.xml (deflated 90%) 2025-12-04T17:26:28.0772914Z adding: 
test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-e41e403fca9b1188.xml (deflated 90%) 2025-12-04T17:26:28.0774863Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor_package/inductor.test_aot_inductor_package-07f86488d4cce1d3.xml (deflated 88%) 2025-12-04T17:26:28.0778217Z adding: test/test-reports/python-pytest/inductor.test_padding/inductor.test_padding-be250a10b53bb058.xml (deflated 89%) 2025-12-04T17:26:28.0780048Z adding: test/test-reports/python-pytest/dynamo.test_aot_compile/dynamo.test_aot_compile-10a88b68c9603fe3.xml (deflated 83%) 2025-12-04T17:26:28.0781803Z adding: test/test-reports/python-pytest/dynamo.test_sets/dynamo.test_sets-f0cb58e83c4ea8ef.xml (deflated 88%) 2025-12-04T17:26:28.0783845Z adding: test/test-reports/python-pytest/dynamo.test_wrap_inductor_compiled_regions/dynamo.test_wrap_inductor_compiled_regions-2f1d9c362e038030.xml (deflated 84%) 2025-12-04T17:26:28.0808854Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-598e6683c5cfc22a.xml (deflated 95%) 2025-12-04T17:26:28.0818882Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-5879e0e26736617e.xml (deflated 91%) 2025-12-04T17:26:28.0828673Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-c4519c63d1395608.xml (deflated 91%) 2025-12-04T17:26:28.0838858Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-fd1a91e45a41098b.xml (deflated 91%) 2025-12-04T17:26:28.0876525Z adding: test/test-reports/python-pytest/test_ops_fwd_gradients/test_ops_fwd_gradients-dac273fbaf67ad10.xml (deflated 96%) 2025-12-04T17:26:28.1027082Z adding: test/test-reports/python-pytest/test_meta/test_meta-cbc50d7c3e0b1b6a.xml (deflated 96%) 2025-12-04T17:26:28.1039912Z adding: test/test-reports/python-pytest/test_ops_jit/test_ops_jit-2f4faab6a29e642c.xml (deflated 93%) 2025-12-04T17:26:28.1069195Z adding: test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-8372b6917771ca4c.xml 
(deflated 98%) 2025-12-04T17:26:28.1134019Z adding: test/test-reports/python-pytest/test_ops/test_ops-d95bfbe57b5d2d89.xml (deflated 94%) 2025-12-04T17:26:28.1211884Z adding: test/test-reports/python-pytest/test_ops/test_ops-75f8d45594e24741.xml (deflated 95%) 2025-12-04T17:26:28.1213538Z adding: test/test-reports/python-pytest/functorch.test_dims/functorch.test_dims-e2a9e671430fd99e.xml (deflated 86%) 2025-12-04T17:26:28.1248695Z adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-caabf5583dae6043.xml (deflated 93%) 2025-12-04T17:26:28.1284410Z adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-b6190fae5240f1fb.xml (deflated 93%) 2025-12-04T17:26:28.1299029Z adding: test/test-reports/python-pytest/inductor.test_cpu_repro/inductor.test_cpu_repro-e45fcbaf6c1a2b2c.xml (deflated 96%) 2025-12-04T17:26:28.1300465Z adding: test/test-reports/python-pytest/inductor.test_custom_lowering/inductor.test_custom_lowering-f90a8c2a1b7dd9b0.xml (deflated 69%) 2025-12-04T17:26:28.1305819Z adding: test/test-reports/python-pytest/inductor.test_perf/inductor.test_perf-34de9a09a2935f8d.xml (deflated 92%) 2025-12-04T17:26:28.1307453Z adding: test/test-reports/python-pytest/inductor.test_binary_folding/inductor.test_binary_folding-0c797ad2be676af7.xml (deflated 79%) 2025-12-04T17:26:28.1313789Z adding: test/test-reports/python-pytest/inductor.test_mkldnn_pattern_matcher/inductor.test_mkldnn_pattern_matcher-c93031a5b8f8293d.xml (deflated 94%) 2025-12-04T17:26:28.1325590Z adding: test/test-reports/python-pytest/inductor.test_gpu_cpp_wrapper/inductor.test_gpu_cpp_wrapper-c206afd337165094.xml (deflated 94%) 2025-12-04T17:26:28.1327612Z adding: test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-1780c0291e7a0397.xml (deflated 88%) 2025-12-04T17:26:28.1328969Z adding: test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-33e3c50f2f02127c.xml (deflated 80%) 
2025-12-04T17:26:28.1334107Z adding: test/test-reports/python-pytest/dynamo.test_modules/dynamo.test_modules-f3674dc870090d50.xml (deflated 88%) 2025-12-04T17:26:28.1336134Z adding: test/test-reports/python-pytest/dynamo.test_recompiles/dynamo.test_recompiles-755ec9793479e2dd.xml (deflated 76%) 2025-12-04T17:26:28.1338522Z adding: test/test-reports/python-pytest/export.test_tree_utils/export.test_tree_utils-4b33de82582b2e92.xml (deflated 48%) 2025-12-04T17:26:28.1340573Z adding: test/test-reports/python-pytest/inductor.test_triton_wrapper/inductor.test_triton_wrapper-7697274370716365.xml (deflated 50%) 2025-12-04T17:26:28.1343265Z adding: test/test-reports/python-pytest/inductor.test_static_cuda_launcher/inductor.test_static_cuda_launcher-96effba66b878950.xml (deflated 85%) 2025-12-04T17:26:28.1345939Z adding: test/test-reports/python-pytest/export.test_dynamic_shapes/export.test_dynamic_shapes-6f817f896f94c83c.xml (deflated 50%) 2025-12-04T17:26:28.1347156Z adding: test/test-reports/python-pytest/dynamo.test_sdpa/dynamo.test_sdpa-3e0149796a415876.xml (deflated 73%) 2025-12-04T17:26:28.1348245Z adding: test/test-reports/python-pytest/dynamo.test_utils/dynamo.test_utils-e6d94f5c34c685f8.xml (deflated 80%) 2025-12-04T17:26:28.1349463Z adding: test/test-reports/python-pytest/inductor.test_codegen_triton/inductor.test_codegen_triton-f741c3b21cf28e3b.xml (deflated 35%) 2025-12-04T17:26:28.1350727Z adding: test/test-reports/python-pytest/dynamo.test_frame_init/dynamo.test_frame_init-c2e1024fb8a07387.xml (deflated 38%) 2025-12-04T17:26:28.1351973Z adding: test/test-reports/python-pytest/inductor.test_device_assert/inductor.test_device_assert-451c7142fcd9d62b.xml (deflated 80%) 2025-12-04T17:26:28.1353258Z adding: test/test-reports/python-pytest/dynamo.test_skip_non_tensor/dynamo.test_skip_non_tensor-f190ace25428cb94.xml (deflated 71%) 2025-12-04T17:26:28.1354591Z adding: 
test/test-reports/python-pytest/dynamo.test_skip_guard_eval_unsafe/dynamo.test_skip_guard_eval_unsafe-aa1ded9d0a4e400e.xml (deflated 73%) 2025-12-04T17:26:28.1356059Z adding: test/test-reports/python-pytest/inductor.test_control_deps/inductor.test_control_deps-23047419ffe03376.xml (deflated 48%) 2025-12-04T17:26:28.1357357Z adding: test/test-reports/python-pytest/inductor.test_benchmarking/inductor.test_benchmarking-53f04a03954d2058.xml (deflated 87%) 2025-12-04T17:26:28.1358796Z adding: test/test-reports/python-pytest/inductor.test_helion_kernels/inductor.test_helion_kernels-0df86f8cd24ea26a.xml (deflated 62%) 2025-12-04T17:26:28.1360120Z adding: test/test-reports/python-pytest/inductor.test_quantization/inductor.test_quantization-951156711359c867.xml (deflated 62%) 2025-12-04T17:26:28.1361665Z adding: test/test-reports/python-pytest/export.test_tools/export.test_tools-c033e9415dabe65c.xml (deflated 47%) 2025-12-04T17:26:28.1363191Z adding: test/test-reports/python-pytest/inductor.test_compiled_optimizers/inductor.test_compiled_optimizers-1745c9b9e5fc7ed3.xml (deflated 96%) 2025-12-04T17:26:28.1364618Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor_utils/inductor.test_aot_inductor_utils-e7355f16ccb52d23.xml (deflated 28%) 2025-12-04T17:26:28.1382075Z adding: test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-0b2081966a192cef.xml (deflated 97%) 2025-12-04T17:26:28.1386332Z adding: test/test-reports/python-pytest/inductor.test_minifier_isolate/inductor.test_minifier_isolate-f50615d1a1981661.xml (deflated 93%) 2025-12-04T17:26:28.1509754Z adding: test/test-reports/python-pytest/dynamo.test_error_messages/dynamo.test_error_messages-36d8e363c2770c16.xml (deflated 95%) 2025-12-04T17:26:28.1511639Z adding: test/test-reports/python-pytest/dynamo.test_fake_distributed/dynamo.test_fake_distributed-b0f5d6fe6c345e8f.xml (deflated 58%) 2025-12-04T17:26:28.1512879Z adding: 
test/test-reports/python-pytest/dynamo.test_tree_map/dynamo.test_tree_map-39d9c68e899fe910.xml (deflated 90%) 2025-12-04T17:26:28.1520593Z adding: test/test-reports/python-pytest/dynamo.test_minifier/dynamo.test_minifier-9124cc51e1c5e7b6.xml (deflated 94%) 2025-12-04T17:26:28.1522729Z adding: test/test-reports/python-pytest/dynamo.test_guard_manager/dynamo.test_guard_manager-f0dd8a549f18516b.xml (deflated 86%) 2025-12-04T17:26:28.1524509Z adding: test/test-reports/python-pytest/export.test_schema/export.test_schema-98e7fce7714746ab.xml (deflated 67%) 2025-12-04T17:26:28.1526038Z adding: test/test-reports/python-pytest/export.test_pass_infra/export.test_pass_infra-0489f34d1d482c78.xml (deflated 75%) 2025-12-04T17:26:28.1527411Z adding: test/test-reports/python-pytest/dynamo.test_recompile_ux/dynamo.test_recompile_ux-5436245cbc75fddd.xml (deflated 79%) 2025-12-04T17:26:28.1529136Z adding: test/test-reports/python-pytest/export.test_experimental/export.test_experimental-4743e9a7200af635.xml (deflated 86%) 2025-12-04T17:26:28.1530858Z adding: test/test-reports/python-pytest/export.test_converter/export.test_converter-a6e4e9ebcfaea6df.xml (deflated 90%) 2025-12-04T17:26:28.1532076Z adding: test/test-reports/python-pytest/dynamo.test_reorder_logs/dynamo.test_reorder_logs-d530254831fe0a21.xml (deflated 85%) 2025-12-04T17:26:28.1535373Z adding: test/test-reports/python-pytest/dynamo.test_subclasses/dynamo.test_subclasses-90ae20717b7fd572.xml (deflated 91%) 2025-12-04T17:26:28.1537340Z adding: test/test-reports/python-pytest/dynamo.test_python_autograd/dynamo.test_python_autograd-b76a60537c2ba691.xml (deflated 75%) 2025-12-04T17:26:28.1538605Z adding: test/test-reports/python-pytest/export.test_draft_export/export.test_draft_export-0c8a812115433a7d.xml (deflated 83%) 2025-12-04T17:26:28.1540703Z adding: test/test-reports/python-pytest/test_package/test_package-523d81f0792170f1.xml (deflated 87%) 2025-12-04T17:26:28.1542090Z adding: 
test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-874cbf06946f8b3e.xml (deflated 50%) 2025-12-04T17:26:28.1543207Z adding: test/test-reports/python-pytest/test_comparison_utils/test_comparison_utils-ce770324779d51b3.xml (deflated 76%) 2025-12-04T17:26:28.1544531Z adding: test/test-reports/python-pytest/functorch.test_ac_logging/functorch.test_ac_logging-f1c79a1c8c74be66.xml (deflated 63%) 2025-12-04T17:26:28.1545722Z adding: test/test-reports/python-pytest/test_mkldnn_verbose/test_mkldnn_verbose-e983273d29ed8e1e.xml (deflated 50%) 2025-12-04T17:26:28.1548731Z adding: test/test-reports/python-pytest/test_cpp_api_parity/test_cpp_api_parity-c6b7300fef8db168.xml (deflated 94%) 2025-12-04T17:26:28.1550515Z adding: test/test-reports/python-pytest/test_autoload/test_autoload-21f1eacf8f4a4d28.xml (deflated 38%) 2025-12-04T17:26:28.1551701Z adding: test/test-reports/python-pytest/nn.attention.test_open_registry/nn.attention.test_open_registry-bacfee0084c93992.xml (deflated 51%) 2025-12-04T17:26:28.1552903Z adding: test/test-reports/python-pytest/test_as_strided/test_as_strided-4555079064233d7d.xml (deflated 49%) 2025-12-04T17:26:28.1600519Z adding: test/test-reports/python-pytest/test_foreach/test_foreach-aa4419a4e7b6d381.xml (deflated 96%) 2025-12-04T17:26:28.1601958Z adding: test/test-reports/python-pytest/xpu.test_gemm/xpu.test_gemm-2cb9cf39de6aa2cf.xml (deflated 28%) 2025-12-04T17:26:28.1603601Z adding: test/test-reports/python-pytest/test_numpy_interop/test_numpy_interop-660870d95235d56d.xml (deflated 88%) 2025-12-04T17:26:28.1605245Z adding: test/test-reports/python-pytest/profiler.test_cpp_thread/profiler.test_cpp_thread-31559e2ba96f64a3.xml (deflated 82%) 2025-12-04T17:26:28.1607075Z adding: test/test-reports/python-pytest/test_hub/test_hub-33a47573ff45c77e.xml (deflated 83%) 2025-12-04T17:26:28.1609166Z adding: test/test-reports/python-pytest/test_segment_reductions/test_segment_reductions-ad616dd6940e0de0.xml (deflated 95%) 
2025-12-04T17:26:28.1610381Z adding: test/test-reports/python-pytest/test_autograd_fallback/test_autograd_fallback-e1a7bbd98afc63dc.xml (deflated 89%) 2025-12-04T17:26:28.1611492Z adding: test/test-reports/python-pytest/test_type_hints/test_type_hints-d14fd0906e097d86.xml (deflated 58%) 2025-12-04T17:26:28.1612809Z adding: test/test-reports/python-pytest/functorch.test_aot_joint_with_descriptors/functorch.test_aot_joint_with_descriptors-79fd9b229bc0c00b.xml (deflated 83%) 2025-12-04T17:26:28.1614189Z adding: test/test-reports/python-pytest/test_fx_reinplace_pass/test_fx_reinplace_pass-047146b9ff22e4f6.xml (deflated 76%) 2025-12-04T17:26:28.1632137Z adding: test/test-reports/python-pytest/functorch.test_control_flow/functorch.test_control_flow-922a9914156e0312.xml (deflated 95%) 2025-12-04T17:26:28.1633921Z adding: test/test-reports/python-pytest/test_subclass/test_subclass-68565895e4fc66ea.xml (deflated 93%) 2025-12-04T17:26:28.1658320Z adding: test/test-reports/python-pytest/functorch.test_vmap_registrations/functorch.test_vmap_registrations-40d5b566ee6986dc.xml (deflated 97%) 2025-12-04T17:26:28.1660162Z adding: test/test-reports/python-pytest/nn.test_parametrization/nn.test_parametrization-ed4e97080833ff92.xml (deflated 90%) 2025-12-04T17:26:28.1675759Z adding: test/test-reports/python-pytest/test_dynamic_shapes/test_dynamic_shapes-07075f000d166d21.xml (deflated 94%) 2025-12-04T17:26:28.1677812Z adding: test/test-reports/python-pytest/test_dispatch/test_dispatch-bf1fd68f7abb7228.xml (deflated 85%) 2025-12-04T17:26:28.1678911Z adding: test/test-reports/python-pytest/test_numba_integration/test_numba_integration-edcc49db775b9990.xml (deflated 80%) 2025-12-04T17:26:28.1680188Z adding: test/test-reports/python-pytest/test_functional_optim/test_functional_optim-389cbc1bb3d61470.xml (deflated 74%) 2025-12-04T17:26:28.1692042Z adding: test/test-reports/python-pytest/test_maskedtensor/test_maskedtensor-1089c4e953521eec.xml (deflated 95%) 2025-12-04T17:26:28.1693593Z 
adding: test/test-reports/python-pytest/benchmark_utils.test_benchmark_utils/benchmark_utils.test_benchmark_utils-76c10c33afe299c4.xml (deflated 79%) 2025-12-04T17:26:28.1714636Z adding: test/test-reports/python-pytest/test_scaled_matmul_cuda/test_scaled_matmul_cuda-d1f8763e6c1869e6.xml (deflated 99%) 2025-12-04T17:26:28.1716974Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_shape_base/torch_np.numpy_tests.core.test_shape_base-b9eed7c143bc9bc3.xml (deflated 95%) 2025-12-04T17:26:28.1718454Z adding: test/test-reports/python-pytest/test_vulkan/test_vulkan-b25d187bf3baa78a.xml (deflated 45%) 2025-12-04T17:26:28.1719512Z adding: test/test-reports/python-pytest/lazy.test_generator/lazy.test_generator-42072a3593c4e25d.xml (deflated 52%) 2025-12-04T17:26:28.1722745Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.linalg.test_linalg/torch_np.numpy_tests.linalg.test_linalg-2974f2048ff6a577.xml (deflated 94%) 2025-12-04T17:26:28.1725429Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_dtype/torch_np.numpy_tests.core.test_dtype-4b8c4285965a7813.xml (deflated 95%) 2025-12-04T17:26:28.1727400Z adding: test/test-reports/python-pytest/lazy.test_debug_util/lazy.test_debug_util-7c02b1e3dfee61bd.xml (deflated 35%) 2025-12-04T17:26:28.1728868Z adding: test/test-reports/python-pytest/nn.test_load_state_dict/nn.test_load_state_dict-e81d6ed8d3f8789f.xml (deflated 89%) 2025-12-04T17:26:28.1730811Z adding: test/test-reports/python-pytest/test_shape_ops/test_shape_ops-a6160583c0856270.xml (deflated 91%) 2025-12-04T17:26:28.1732375Z adding: test/test-reports/python-pytest/nn.test_module_hooks/nn.test_module_hooks-e13d4f4eb9af9666.xml (deflated 88%) 2025-12-04T17:26:28.1734391Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_twodim_base/torch_np.numpy_tests.lib.test_twodim_base-2da66c446de8da89.xml (deflated 87%) 2025-12-04T17:26:28.1736044Z adding: 
test/test-reports/python-pytest/profiler.test_memory_profiler/profiler.test_memory_profiler-20f5e2eefecacaee.xml (deflated 79%) 2025-12-04T17:26:28.1737361Z adding: test/test-reports/python-pytest/test_jit_llga_fuser/test_jit_llga_fuser-b203cab2c461ce78.xml (deflated 94%) 2025-12-04T17:26:28.1738772Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_getlimits/torch_np.numpy_tests.core.test_getlimits-5149534e2555ec6f.xml (deflated 86%) 2025-12-04T17:26:28.1740629Z adding: test/test-reports/python-pytest/torch_np.test_ndarray_methods/torch_np.test_ndarray_methods-fe7c638b86097b2d.xml (deflated 96%) 2025-12-04T17:26:28.1743845Z adding: test/test-reports/python-pytest/test_view_ops/test_view_ops-f5d6b3525797eb50.xml (deflated 92%) 2025-12-04T17:26:28.1745069Z adding: test/test-reports/python-pytest/test_type_info/test_type_info-22600993e111f6f2.xml (deflated 67%) 2025-12-04T17:26:28.1763142Z adding: test/test-reports/python-pytest/functorch.test_aotdispatch/functorch.test_aotdispatch-efb7e0b79840fa38.xml (deflated 93%) 2025-12-04T17:26:28.1764917Z adding: test/test-reports/python-pytest/test_scatter_gather_ops/test_scatter_gather_ops-5274bd99c8a0619f.xml (deflated 91%) 2025-12-04T17:26:28.1766550Z adding: test/test-reports/python-pytest/test_cuda_multigpu/test_cuda_multigpu-a9a26e79d8868522.xml (deflated 91%) 2025-12-04T17:26:28.1768480Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-43f32c31fbfc43cd.xml (deflated 90%) 2025-12-04T17:26:28.1770212Z adding: test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-9b5e22ff1077135a.xml (deflated 86%) 2025-12-04T17:26:28.1772234Z adding: test/test-reports/python-pytest/nn.test_pooling/nn.test_pooling-2151df52b065bbdf.xml (deflated 92%) 2025-12-04T17:26:28.1775325Z adding: test/test-reports/python-pytest/nn.test_embedding/nn.test_embedding-d055fd5d393643fe.xml (deflated 95%) 2025-12-04T17:26:28.1777667Z adding: 
test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-ed8e38bda9a33f4f.xml (deflated 81%) 2025-12-04T17:26:28.1779900Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-5e38c3c197506de5.xml (deflated 38%) 2025-12-04T17:26:28.1781456Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-4795d7c5159b6e03.xml (deflated 35%) 2025-12-04T17:26:28.1782598Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-6c76df2a5666e90f.xml (deflated 36%) 2025-12-04T17:26:28.1783634Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c93df3ae687a8e58.xml (deflated 35%) 2025-12-04T17:26:28.1784663Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-524a9565fc6ac576.xml (deflated 36%) 2025-12-04T17:26:28.1785807Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-b18b3c9d4ddc6b34.xml (deflated 35%) 2025-12-04T17:26:28.1786845Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-319074c2014cbf3e.xml (deflated 36%) 2025-12-04T17:26:28.1787860Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-2483ba726355768c.xml (deflated 36%) 2025-12-04T17:26:28.1788890Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-f2eed9c29ea8eac7.xml (deflated 36%) 2025-12-04T17:26:28.1789932Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c778cc218c519690.xml (deflated 36%) 2025-12-04T17:26:28.1790961Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-7897a7f1d03cdaa3.xml (deflated 37%) 2025-12-04T17:26:28.1791979Z adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-d4fb26045e698199.xml (deflated 35%) 2025-12-04T17:26:28.1799383Z adding: test/test-reports/python-pytest/torch_np.test_reductions/torch_np.test_reductions-73a2026a6cdfd4dc.xml (deflated 98%) 2025-12-04T17:26:28.1802044Z adding: 
test/test-reports/python-pytest/torch_np.numpy_tests.core.test_scalar_ctors/torch_np.numpy_tests.core.test_scalar_ctors-e23576bcb06b5d61.xml (deflated 94%) 2025-12-04T17:26:28.1803588Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_arraypad/torch_np.numpy_tests.lib.test_arraypad-f4e46a1506be78e1.xml (deflated 78%) 2025-12-04T17:26:28.1804803Z adding: test/test-reports/python-pytest/test_prims/test_prims-38188698633a9bb5.xml (deflated 81%) 2025-12-04T17:26:28.1807712Z adding: test/test-reports/python-pytest/test_spectral_ops/test_spectral_ops-ae86fbbf23286ef9.xml (deflated 93%) 2025-12-04T17:26:28.1809892Z adding: test/test-reports/python-pytest/test_cpp_extensions_aot_ninja/test_cpp_extensions_aot_ninja-5c9ab2f003415ced.xml (deflated 81%) 2025-12-04T17:26:28.1811236Z adding: test/test-reports/python-pytest/test_cpp_extensions_aot_no_ninja/test_cpp_extensions_aot_no_ninja-dc0b3ab1cc30279c.xml (deflated 82%) 2025-12-04T17:26:28.1812643Z adding: test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204172052.xml (deflated 42%) 2025-12-04T17:26:28.1841167Z ##[group]Run # Remove any previous usage logs if they exist 2025-12-04T17:26:28.1841723Z # Remove any previous usage logs if they exist 2025-12-04T17:26:28.1842159Z rm -f logs-*.zip 2025-12-04T17:26:28.1842569Z zip "logs-${FILE_SUFFIX}.zip" 'usage_log.txt' || true 2025-12-04T17:26:28.1843188Z zip -r "logs-${FILE_SUFFIX}.zip" test/test-reports -i '*.log' || true 2025-12-04T17:26:28.1850064Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T17:26:28.1850515Z env: 2025-12-04T17:26:28.1850771Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:28.1851073Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:28.1851447Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:28.1852101Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:28.1852925Z FILE_SUFFIX: 
test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248 2025-12-04T17:26:28.1853488Z ##[endgroup] 2025-12-04T17:26:28.1933402Z adding: usage_log.txt (deflated 58%) 2025-12-04T17:26:28.1996432Z adding: test/test-reports/lazy.test_ts_opinfo_1.1_4d268f7078430bdf_.log (deflated 60%) 2025-12-04T17:26:28.2009549Z adding: test/test-reports/inductor.test_aot_inductor_1.6_cf1c969272c5d084_.log (deflated 93%) 2025-12-04T17:26:28.2042103Z adding: test/test-reports/inductor.test_aot_inductor_6.6_462385258b0b1d27_.log (deflated 95%) 2025-12-04T17:26:28.2066064Z adding: test/test-reports/inductor.test_torchinductor_codegen_dynamic_shapes_2.4_37f84ce4dcc870f4_.log (deflated 93%) 2025-12-04T17:26:28.2070236Z adding: test/test-reports/nn.test_pooling_1.1_e8e935ea909a1883_.log (deflated 90%) 2025-12-04T17:26:28.2077076Z adding: test/test-reports/inductor.test_torchinductor_opinfo_2.17_595df7515ef47f8b_.log (deflated 91%) 2025-12-04T17:26:28.2081947Z adding: test/test-reports/nn.test_embedding_1.1_dc9119745a665b44_.log (deflated 93%) 2025-12-04T17:26:28.2092032Z adding: test/test-reports/inductor.test_torchinductor_opinfo_7.17_bf87dc9c512027f2_.log (deflated 92%) 2025-12-04T17:26:28.2095108Z adding: test/test-reports/torch_np.numpy_tests.core.test_dtype_1.1_7868c6a3dd1e371a_.log (deflated 91%) 2025-12-04T17:26:28.2101683Z adding: test/test-reports/inductor.test_torchinductor_opinfo_12.17_a032934f54d29036_.log (deflated 91%) 2025-12-04T17:26:28.2102595Z adding: test/test-reports/lazy.test_debug_util_1.1_1fda475a5f9a06f5_.log (deflated 51%) 2025-12-04T17:26:28.2112229Z adding: test/test-reports/inductor.test_torchinductor_opinfo_17.17_0b4f962be1a8215a_.log (deflated 92%) 2025-12-04T17:26:28.2113159Z adding: test/test-reports/test_xnnpack_integration_1.1_5b815e7820ba690d_.log (deflated 72%) 2025-12-04T17:26:28.2167139Z adding: test/test-reports/inductor.test_cuda_select_algorithm_3.5_e3565bc7025c1889_.log (deflated 96%) 2025-12-04T17:26:28.2168586Z adding: 
test/test-reports/test_cuda_trace_1.1_70d30feb0b9acc89_.log (deflated 92%) 2025-12-04T17:26:28.2213383Z adding: test/test-reports/inductor.test_compile_subprocess_3.3_92ce494afd455b37_.log (deflated 95%) 2025-12-04T17:26:28.2214314Z adding: test/test-reports/inductor.test_flex_decoding_1.1_a47e1c88f2ff3c9a_.log (deflated 50%) 2025-12-04T17:26:28.2223621Z adding: test/test-reports/inductor.test_deterministic_5.8_04041ff7a6ce6208_.log (deflated 94%) 2025-12-04T17:26:28.2580857Z adding: test/test-reports/inductor.test_fp8_1.1_5b24deb545871ee8_.log (deflated 95%) 2025-12-04T17:26:28.2582930Z adding: test/test-reports/dynamo.test_model_output_1.1_9f288500c4a144e5_.log (deflated 89%) 2025-12-04T17:26:28.2603209Z adding: test/test-reports/torch_np.test_reductions_1.1_b720aba5a84f607c_.log (deflated 96%) 2025-12-04T17:26:28.2616750Z adding: test/test-reports/inductor.test_triton_kernels_1.1_80e8269e9d3330b3_.log (deflated 92%) 2025-12-04T17:26:28.2626077Z adding: test/test-reports/inductor.test_loop_ordering_1.1_ca0aee6babe9c71a_.log (deflated 92%) 2025-12-04T17:26:28.2674298Z adding: test/test-reports/export.test_serdes_1.1_d6753111c4d56d4f_.log (deflated 91%) 2025-12-04T17:26:28.2677119Z adding: test/test-reports/dynamo.test_backends_1.1_0248c6271c37d6dd_.log (deflated 91%) 2025-12-04T17:26:28.2678148Z adding: test/test-reports/test_prims_1.1_8a7702ff07b7da5d_.log (deflated 77%) 2025-12-04T17:26:28.2764497Z adding: test/test-reports/inductor.test_aot_inductor_package_1.1_5509f9f54e762912_.log (deflated 97%) 2025-12-04T17:26:28.2766394Z adding: test/test-reports/inductor.test_padding_1.1_4d224b6d5f4af5af_.log (deflated 86%) 2025-12-04T17:26:28.2767641Z adding: test/test-reports/dynamo.test_aot_compile_1.1_232ed44e0e50b87e_.log (deflated 78%) 2025-12-04T17:26:28.2770881Z adding: test/test-reports/dynamo.test_sets_1.1_e77962cd1c25fe47_.log (deflated 87%) 2025-12-04T17:26:28.2772308Z adding: test/test-reports/nn.test_load_state_dict_1.1_54a686ad2f48d7f9_.log (deflated 85%) 
2025-12-04T17:26:28.2773703Z adding: test/test-reports/dynamo.test_wrap_inductor_compiled_regions_1.1_1c64e72dd7c0888e_.log (deflated 81%) 2025-12-04T17:26:28.2816923Z adding: test/test-reports/test_sparse_2.2_a491ad82f72502f4_.log (deflated 94%) 2025-12-04T17:26:28.2833103Z adding: test/test-reports/test_decomp_3.17_3a5dd6feb399010e_.log (deflated 89%) 2025-12-04T17:26:28.2849290Z adding: test/test-reports/test_decomp_8.17_26b4abb8a1042a34_.log (deflated 89%) 2025-12-04T17:26:28.2865999Z adding: test/test-reports/test_decomp_13.17_a52400f805dcf5ec_.log (deflated 89%) 2025-12-04T17:26:28.2912578Z adding: test/test-reports/test_ops_fwd_gradients_1.2_4abfc4ee1bccdea9_.log (deflated 94%) 2025-12-04T17:26:28.3122439Z adding: test/test-reports/test_meta_2.5_dad2a564d06ce93f_.log (deflated 93%) 2025-12-04T17:26:28.3139469Z adding: test/test-reports/test_ops_jit_2.2_814c1a8715769c60_.log (deflated 91%) 2025-12-04T17:26:28.3153844Z adding: test/test-reports/test_nestedtensor_3.4_8e55fc0245a5aec0_.log (deflated 91%) 2025-12-04T17:26:28.3243106Z adding: test/test-reports/test_ops_2.11_06c992f175cc3a27_.log (deflated 91%) 2025-12-04T17:26:28.3330553Z adding: test/test-reports/test_ops_7.11_97114ebb7b0ad963_.log (deflated 91%) 2025-12-04T17:26:28.3332433Z adding: test/test-reports/functorch.test_dims_1.1_a45bb86ae199f167_.log (deflated 83%) 2025-12-04T17:26:28.3373525Z adding: test/test-reports/functorch.test_ops_1.7_2b66798f0700c47b_.log (deflated 92%) 2025-12-04T17:26:28.3415011Z adding: test/test-reports/functorch.test_ops_6.7_b2e5f87489ea3e61_.log (deflated 92%) 2025-12-04T17:26:28.3424791Z adding: test/test-reports/test_spectral_ops_1.1_434231ff814fe9e8_.log (deflated 93%) 2025-12-04T17:26:28.3425668Z adding: test/test-reports/inductor.test_select_algorithm_1.1_7db4d246e17eb863_.log (deflated 7%) 2025-12-04T17:26:28.3435072Z adding: test/test-reports/inductor.test_cpu_repro_1.3_45e7fcc9d89e84f9_.log (deflated 93%) 2025-12-04T17:26:28.3436125Z adding: 
test/test-reports/test_cpp_extensions_aot_no_ninja_1.1_8356099a97b89d55_.log (deflated 78%) 2025-12-04T17:26:28.3437065Z adding: test/test-reports/inductor.test_custom_lowering_1.1_b51e0c13dc286ed6_.log (deflated 67%) 2025-12-04T17:26:28.3445703Z adding: test/test-reports/inductor.test_perf_1.1_8b1dd16368b2df6e_.log (deflated 91%) 2025-12-04T17:26:28.3446785Z adding: test/test-reports/test_cpp_extensions_aot_ninja_1.1_f69ae0466baae8e0_.log (deflated 78%) 2025-12-04T17:26:28.3447862Z adding: test/test-reports/inductor.test_binary_folding_1.1_181cb55db6266036_.log (deflated 66%) 2025-12-04T17:26:28.3450222Z adding: test/test-reports/test_shape_ops_1.1_4cd0c635a81aa180_.log (deflated 87%) 2025-12-04T17:26:28.3454631Z adding: test/test-reports/inductor.test_mkldnn_pattern_matcher_3.3_de8f963f0fd4260a_.log (deflated 91%) 2025-12-04T17:26:28.3455644Z adding: test/test-reports/inductor.test_cutlass_backend_1.1_15c862b0fcbdbc05_.log (deflated 33%) 2025-12-04T17:26:28.3456602Z adding: test/test-reports/inductor.test_ck_backend_1.1_578c7dfc11700a2c_.log (deflated 33%) 2025-12-04T17:26:28.3465530Z adding: test/test-reports/inductor.test_gpu_cpp_wrapper_1.1_e2281895ade7355a_.log (deflated 93%) 2025-12-04T17:26:28.3466575Z adding: test/test-reports/inductor.test_cutedsl_template_1.1_431b05ccc7f3aa92_.log (deflated 77%) 2025-12-04T17:26:28.3467631Z adding: test/test-reports/inductor.test_benchmark_fusion_1.1_06ce66c290620934_.log (deflated 75%) 2025-12-04T17:26:28.3473445Z adding: test/test-reports/dynamo.test_modules_1.1_8a3e7afe44c0508c_.log (deflated 87%) 2025-12-04T17:26:28.3474680Z adding: test/test-reports/dynamo.test_recompiles_1.1_781d5b3da7b99916_.log (deflated 79%) 2025-12-04T17:26:28.3475649Z adding: test/test-reports/export.test_tree_utils_1.1_01fdd9412c3dc291_.log (deflated 55%) 2025-12-04T17:26:28.3476553Z adding: test/test-reports/inductor.test_triton_wrapper_1.1_aad0f3987661a0f9_.log (deflated 53%) 2025-12-04T17:26:28.3477596Z adding: 
test/test-reports/inductor.test_static_cuda_launcher_1.1_aa705837cbb50573_.log (deflated 79%) 2025-12-04T17:26:28.3478655Z adding: test/test-reports/export.test_dynamic_shapes_1.1_fa1beed2f0eed81a_.log (deflated 55%) 2025-12-04T17:26:28.3479595Z adding: test/test-reports/dynamo.test_sdpa_1.1_5570cc8ef25d14ab_.log (deflated 63%) 2025-12-04T17:26:28.3480431Z adding: test/test-reports/dynamo.test_utils_1.1_31a21332cf86ab83_.log (deflated 76%) 2025-12-04T17:26:28.3481416Z adding: test/test-reports/inductor.test_codegen_triton_1.1_8e8a3c1b0bc12db7_.log (deflated 53%) 2025-12-04T17:26:28.3482523Z adding: test/test-reports/dynamo.test_frame_init_1.1_2f60459938295159_.log (deflated 51%) 2025-12-04T17:26:28.3483421Z adding: test/test-reports/inductor.test_device_assert_1.1_d916ba60ad9d20e5_.log (deflated 74%) 2025-12-04T17:26:28.3484387Z adding: test/test-reports/dynamo.test_skip_non_tensor_1.1_5109354b2e4bf091_.log (deflated 70%) 2025-12-04T17:26:28.3485380Z adding: test/test-reports/dynamo.test_skip_guard_eval_unsafe_1.1_b141b115e14ff53c_.log (deflated 66%) 2025-12-04T17:26:28.3486554Z adding: test/test-reports/inductor.test_control_deps_1.1_e3804afa5ea10bb1_.log (deflated 51%) 2025-12-04T17:26:28.3487451Z adding: test/test-reports/inductor.test_benchmarking_1.1_f947c0362e7ea45b_.log (deflated 78%) 2025-12-04T17:26:28.3488347Z adding: test/test-reports/inductor.test_helion_kernels_1.1_7576dd76567d0db5_.log (deflated 56%) 2025-12-04T17:26:28.3489240Z adding: test/test-reports/inductor.test_quantization_1.1_84a522d95ca6c1ae_.log (deflated 56%) 2025-12-04T17:26:28.3490081Z adding: test/test-reports/export.test_tools_1.1_7b301d5abd4a995c_.log (deflated 63%) 2025-12-04T17:26:28.3495921Z adding: test/test-reports/inductor.test_compiled_optimizers_1.3_8b95325a31b7233d_.log (deflated 92%) 2025-12-04T17:26:28.3496979Z adding: test/test-reports/inductor.test_aot_inductor_utils_1.1_6e3c972b94953db6_.log (deflated 51%) 2025-12-04T17:26:28.4137987Z adding: 
test/test-reports/inductor.test_control_flow_3.4_41808f1ad591b77f_.log (deflated 96%) 2025-12-04T17:26:28.4139200Z adding: test/test-reports/inductor.test_minifier_isolate_1.1_057329f0cdaf132f_.log (deflated 55%) 2025-12-04T17:26:28.4140852Z adding: test/test-reports/dynamo.test_error_messages_1.1_69ccbdbb7b8c4f0d_.log (deflated 85%) 2025-12-04T17:26:28.4141851Z adding: test/test-reports/dynamo.test_fake_distributed_1.1_14aa9693a6d04f2f_.log (deflated 60%) 2025-12-04T17:26:28.4143128Z adding: test/test-reports/dynamo.test_tree_map_1.1_63649f1aa127b381_.log (deflated 87%) 2025-12-04T17:26:28.4147261Z adding: test/test-reports/dynamo.test_minifier_1.1_70592d9088ca13b1_.log (deflated 93%) 2025-12-04T17:26:28.4148689Z adding: test/test-reports/dynamo.test_guard_manager_1.1_bfbfec93ec272b46_.log (deflated 83%) 2025-12-04T17:26:28.4149756Z adding: test/test-reports/export.test_schema_1.1_81eb22b4e3e11516_.log (deflated 62%) 2025-12-04T17:26:28.4150572Z adding: test/test-reports/export.test_pass_infra_1.1_d5838225a9a8bb31_.log (deflated 62%) 2025-12-04T17:26:28.4151557Z adding: test/test-reports/dynamo.test_recompile_ux_1.1_ac1d0051161f3db2_.log (deflated 78%) 2025-12-04T17:26:28.4152736Z adding: test/test-reports/export.test_experimental_1.1_01776c650d6c59b4_.log (deflated 79%) 2025-12-04T17:26:28.4154821Z adding: test/test-reports/export.test_converter_1.1_96408107873dd104_.log (deflated 87%) 2025-12-04T17:26:28.4155860Z adding: test/test-reports/dynamo.test_reorder_logs_1.1_c9bc43c050335e8d_.log (deflated 78%) 2025-12-04T17:26:28.4162286Z adding: test/test-reports/dynamo.test_subclasses_1.1_2bde93c2c59c5c84_.log (deflated 89%) 2025-12-04T17:26:28.4163267Z adding: test/test-reports/dynamo.test_python_autograd_1.1_3d66bfb1c1737055_.log (deflated 65%) 2025-12-04T17:26:28.4167864Z adding: test/test-reports/export.test_draft_export_1.1_dc9e6c5dfafe9a68_.log (deflated 92%) 2025-12-04T17:26:28.4173171Z adding: test/test-reports/test_package_1.1_34eeddca63aecf34_.log (deflated 
87%) 2025-12-04T17:26:28.4174008Z adding: test/test-reports/test_mkl_verbose_1.1_8df5a0c4f0a0ed8d_.log (deflated 54%) 2025-12-04T17:26:28.4174993Z adding: test/test-reports/test_comparison_utils_1.1_bef8586b0834f006_.log (deflated 68%) 2025-12-04T17:26:28.4175875Z adding: test/test-reports/functorch.test_ac_logging_1.1_7064fc1f81d9dc21_.log (deflated 63%) 2025-12-04T17:26:28.4176783Z adding: test/test-reports/test_mkldnn_verbose_1.1_7178d5eae573783e_.log (deflated 55%) 2025-12-04T17:26:28.4190422Z adding: test/test-reports/test_cpp_api_parity_1.1_286b24be771dc4b7_.log (deflated 94%) 2025-12-04T17:26:28.4191271Z adding: test/test-reports/test_autoload_1.1_4b58ab9cd8e50318_.log (deflated 50%) 2025-12-04T17:26:28.4192361Z adding: test/test-reports/nn.attention.test_open_registry_1.1_52b8c107579dfb04_.log (deflated 58%) 2025-12-04T17:26:28.4193209Z adding: test/test-reports/test_as_strided_1.1_915ecc12abd3e105_.log (deflated 53%) 2025-12-04T17:26:28.4283578Z adding: test/test-reports/test_foreach_1.1_754d93a1205d9df5_.log (deflated 95%) 2025-12-04T17:26:28.4284445Z adding: test/test-reports/xpu.test_gemm_1.1_f9c98ad78a8f930f_.log (deflated 48%) 2025-12-04T17:26:28.4285992Z adding: test/test-reports/test_numpy_interop_1.1_0cfaaa8b9ef10506_.log (deflated 85%) 2025-12-04T17:26:28.4287348Z adding: test/test-reports/profiler.test_cpp_thread_1.1_6bc17e34ef07b5a0_.log (deflated 82%) 2025-12-04T17:26:28.4288299Z adding: test/test-reports/test_hub_1.1_af317e8677316cdb_.log (deflated 70%) 2025-12-04T17:26:28.4290813Z adding: test/test-reports/test_segment_reductions_1.1_c6d7e787931576c3_.log (deflated 91%) 2025-12-04T17:26:28.4291974Z adding: test/test-reports/test_autograd_fallback_1.1_60e7b253f9787096_.log (deflated 85%) 2025-12-04T17:26:28.4292896Z adding: test/test-reports/test_type_hints_1.1_d9336b501fe8992b_.log (deflated 49%) 2025-12-04T17:26:28.4294577Z adding: test/test-reports/nn.test_module_hooks_1.1_b8e5016c3845034d_.log (deflated 86%) 2025-12-04T17:26:28.4295597Z 
adding: test/test-reports/functorch.test_aot_joint_with_descriptors_1.1_948ec5a85f7c1f8f_.log (deflated 80%) 2025-12-04T17:26:28.4296752Z adding: test/test-reports/test_fx_reinplace_pass_1.1_8f7033a49b0aaa2e_.log (deflated 75%) 2025-12-04T17:26:28.4707971Z adding: test/test-reports/functorch.test_control_flow_2.2_2e5432104edc7835_.log (deflated 96%) 2025-12-04T17:26:28.4710061Z adding: test/test-reports/test_subclass_1.1_b65d4f741f14f053_.log (deflated 90%) 2025-12-04T17:26:28.4753100Z adding: test/test-reports/functorch.test_vmap_registrations_1.1_8a0424ce5b3ca65e_.log (deflated 96%) 2025-12-04T17:26:28.4755076Z adding: test/test-reports/nn.test_parametrization_1.1_0b836fe205c49662_.log (deflated 89%) 2025-12-04T17:26:28.4768103Z adding: test/test-reports/test_dynamic_shapes_1.1_f2bbcf4caeac0628_.log (deflated 91%) 2025-12-04T17:26:28.4769769Z adding: test/test-reports/test_dispatch_1.1_a7d630610c114c46_.log (deflated 77%) 2025-12-04T17:26:28.4770724Z adding: test/test-reports/test_numba_integration_1.1_4248037d4c172e88_.log (deflated 71%) 2025-12-04T17:26:28.4771950Z adding: test/test-reports/test_functional_optim_1.1_82fdba90420e8f47_.log (deflated 65%) 2025-12-04T17:26:28.4794703Z adding: test/test-reports/test_maskedtensor_1.1_4e0623e742dfe084_.log (deflated 94%) 2025-12-04T17:26:28.4796134Z adding: test/test-reports/torch_np.numpy_tests.lib.test_twodim_base_1.1_facf24e95ed5355d_.log (deflated 82%) 2025-12-04T17:26:28.4797287Z adding: test/test-reports/benchmark_utils.test_benchmark_utils_1.1_63175fb80c7f9ea7_.log (deflated 72%) 2025-12-04T17:26:28.4823539Z adding: test/test-reports/test_scaled_matmul_cuda_1.1_751f5e87909cbd5d_.log (deflated 97%) 2025-12-04T17:26:28.4825024Z adding: test/test-reports/profiler.test_memory_profiler_1.1_70baf0213dbc5855_.log (deflated 82%) 2025-12-04T17:26:28.4828740Z adding: test/test-reports/torch_np.numpy_tests.core.test_shape_base_1.1_0a0e6d68a930787e_.log (deflated 92%) 2025-12-04T17:26:28.4829701Z adding: 
test/test-reports/test_vulkan_1.1_2892328dc9a2ec74_.log (deflated 48%) 2025-12-04T17:26:28.4830462Z adding: test/test-reports/lazy.test_generator_1.1_9fb7d5917fd83b83_.log (deflated 55%) 2025-12-04T17:26:28.4831540Z adding: test/test-reports/torch_np.numpy_tests.lib.test_index_tricks_1.1_e2c692e766f99011_.log (deflated 85%) 2025-12-04T17:26:28.4838585Z adding: test/test-reports/torch_np.numpy_tests.linalg.test_linalg_1.1_f8a6a4a0c07965ac_.log (deflated 92%) 2025-12-04T17:26:28.4841464Z adding: test/test-reports/test_jit_llga_fuser_1.1_a67e637a7f701026_.log (deflated 88%) 2025-12-04T17:26:28.4842269Z adding: test/test-reports/optim.test_optim_1.1_e409dee8e8c07436_.log (deflated 7%) 2025-12-04T17:26:28.4843670Z adding: test/test-reports/test_jit_autocast_1.1_9af7b4b8017e3406_.log (deflated 81%) 2025-12-04T17:26:28.4844895Z adding: test/test-reports/torch_np.numpy_tests.core.test_getlimits_1.1_827a2f053af78584_.log (deflated 77%) 2025-12-04T17:26:28.4853117Z adding: test/test-reports/torch_np.test_ndarray_methods_1.1_793e3aaaf30f7d3c_.log (deflated 94%) 2025-12-04T17:26:28.4859897Z adding: test/test-reports/test_view_ops_1.1_405f53c81662ed35_.log (deflated 91%) 2025-12-04T17:26:28.4860893Z adding: test/test-reports/test_type_info_1.1_9ab09808df8277a9_.log (deflated 61%) 2025-12-04T17:26:28.4877376Z adding: test/test-reports/functorch.test_aotdispatch_1.1_9b74bc936a6dcdae_.log (deflated 92%) 2025-12-04T17:26:28.4879674Z adding: test/test-reports/test_scatter_gather_ops_1.1_f3de59c3735d2471_.log (deflated 89%) 2025-12-04T17:26:28.4881722Z adding: test/test-reports/test_cuda_multigpu_1.1_5809c25d23c9a947_.log (deflated 85%) 2025-12-04T17:26:28.4883799Z adding: test/test-reports/torch_np.numpy_tests.core.test_scalar_ctors_1.1_4168b5c3b3d7f9be_.log (deflated 90%) 2025-12-04T17:26:28.4884978Z adding: test/test-reports/torch_np.numpy_tests.lib.test_arraypad_1.1_867803734c4a045d_.log (deflated 72%) 2025-12-04T17:26:28.4915180Z ##[group]Run # Remove any previous debugging 
artifacts if they exist 2025-12-04T17:26:28.4915801Z # Remove any previous debugging artifacts if they exist 2025-12-04T17:26:28.4916285Z rm -f debug-*.zip 2025-12-04T17:26:28.4916619Z if [ -d 'test/debug' ]; then 2025-12-04T17:26:28.4917060Z  zip -r "debug-${FILE_SUFFIX}.zip" test/debug 2025-12-04T17:26:28.4917451Z fi 2025-12-04T17:26:28.4924246Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T17:26:28.4924691Z env: 2025-12-04T17:26:28.4924934Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:28.4925253Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:28.4925615Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:28.4926251Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:28.4927053Z FILE_SUFFIX: test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248 2025-12-04T17:26:28.4927625Z ##[endgroup] 2025-12-04T17:26:28.5019853Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T17:26:28.5020237Z with: 2025-12-04T17:26:28.5020492Z s3-bucket: gha-artifacts 2025-12-04T17:26:28.5020865Z s3-prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T17:26:28.5021278Z retention-days: 14 2025-12-04T17:26:28.5021801Z if-no-files-found: warn 2025-12-04T17:26:28.5022385Z path: test-jsons-*.zip 2025-12-04T17:26:28.5022816Z name: artifact 2025-12-04T17:26:28.5023197Z region: us-east-1 2025-12-04T17:26:28.5023639Z env: 2025-12-04T17:26:28.5023999Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:28.5024579Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:28.5025129Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:28.5025905Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:28.5026619Z ##[endgroup] 2025-12-04T17:26:28.9133280Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T17:26:28.9133918Z With the provided path, there will be 1 file uploaded 2025-12-04T17:26:28.9134648Z Uploading to s3 prefix: 
pytorch/pytorch/19922826259/1/artifact 2025-12-04T17:26:28.9190866Z Starting upload of test-jsons-test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248.zip 2025-12-04T17:26:29.1147237Z Finished upload of test-jsons-test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248.zip 2025-12-04T17:26:29.1379339Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T17:26:29.1379741Z with: 2025-12-04T17:26:29.1379988Z s3-bucket: gha-artifacts 2025-12-04T17:26:29.1380365Z s3-prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T17:26:29.1380781Z retention-days: 14 2025-12-04T17:26:29.1381063Z if-no-files-found: error 2025-12-04T17:26:29.1381387Z path: test-reports-*.zip 2025-12-04T17:26:29.1381804Z name: artifact 2025-12-04T17:26:29.1382053Z region: us-east-1 2025-12-04T17:26:29.1382316Z env: 2025-12-04T17:26:29.1382564Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:29.1382875Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:29.1383232Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:29.1383885Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:29.1384461Z ##[endgroup] 2025-12-04T17:26:29.5556382Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T17:26:29.5556950Z With the provided path, there will be 1 file uploaded 2025-12-04T17:26:29.5557477Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T17:26:29.5612200Z Starting upload of test-reports-test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248.zip 2025-12-04T17:26:29.8075705Z Finished upload of test-reports-test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248.zip 2025-12-04T17:26:29.8289130Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T17:26:29.8289543Z with: 2025-12-04T17:26:29.8289807Z s3-bucket: gha-artifacts 2025-12-04T17:26:29.8290195Z s3-prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T17:26:29.8290608Z retention-days: 14 
2025-12-04T17:26:29.8290896Z if-no-files-found: ignore 2025-12-04T17:26:29.8291222Z path: logs-*.zip 2025-12-04T17:26:29.8291497Z name: artifact 2025-12-04T17:26:29.8291753Z region: us-east-1 2025-12-04T17:26:29.8292020Z env: 2025-12-04T17:26:29.8292291Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:29.8292588Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:29.8292962Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:29.8293783Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:29.8294355Z ##[endgroup] 2025-12-04T17:26:30.2082494Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T17:26:30.2083046Z With the provided path, there will be 1 file uploaded 2025-12-04T17:26:30.2083590Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T17:26:30.2138543Z Starting upload of logs-test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248.zip 2025-12-04T17:26:30.4403939Z Finished upload of logs-test-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu_57119749248.zip 2025-12-04T17:26:30.4614787Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T17:26:30.4615170Z with: 2025-12-04T17:26:30.4615430Z s3-bucket: gha-artifacts 2025-12-04T17:26:30.4615953Z s3-prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T17:26:30.4616481Z retention-days: 14 2025-12-04T17:26:30.4616775Z if-no-files-found: ignore 2025-12-04T17:26:30.4617098Z path: debug-*.zip 2025-12-04T17:26:30.4617373Z name: artifact 2025-12-04T17:26:30.4617630Z region: us-east-1 2025-12-04T17:26:30.4617894Z env: 2025-12-04T17:26:30.4618137Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:30.4618431Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:30.4618810Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:30.4619457Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:30.4620017Z ##[endgroup] 2025-12-04T17:26:30.8339120Z No files were found 
with the provided path: debug-*.zip. No artifacts will be uploaded. 2025-12-04T17:26:30.8556195Z ##[group]Run # shellcheck disable=SC2156 2025-12-04T17:26:30.8556657Z # shellcheck disable=SC2156 2025-12-04T17:26:30.8557345Z find . -iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2025-12-04T17:26:30.8564494Z shell: /usr/bin/bash -e {0} 2025-12-04T17:26:30.8564820Z env: 2025-12-04T17:26:30.8565070Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:30.8565383Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:30.8565740Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:30.8566392Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:30.8567075Z ##[endgroup] 2025-12-04T17:26:31.2346710Z ##[group]Run seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a 2025-12-04T17:26:31.2347300Z with: 2025-12-04T17:26:31.2347719Z name: coredumps-legacy_nvidia_driver-1-5-linux.g4dn.4xlarge.nvidia.gpu 2025-12-04T17:26:31.2348242Z retention-days: 14 2025-12-04T17:26:31.2348545Z if-no-files-found: ignore 2025-12-04T17:26:31.2348850Z path: ./**/core.[1-9]* 2025-12-04T17:26:31.2349155Z s3-bucket: gha-artifacts 2025-12-04T17:26:31.2349472Z region: us-east-1 2025-12-04T17:26:31.2349724Z env: 2025-12-04T17:26:31.2349965Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:31.2350271Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:31.2350627Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:31.2351276Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:31.2351848Z ##[endgroup] 2025-12-04T17:26:40.9348455Z No files were found with the provided path: ./**/core.[1-9]*. No artifacts will be uploaded. 
2025-12-04T17:26:40.9676380Z Prepare all required actions 2025-12-04T17:26:40.9676842Z Getting action download info 2025-12-04T17:26:41.1182041Z Download action repository 'actions/setup-python@v6' (SHA:83679a892e2d95755f2dac6acb0bfd1e9ac5d548) 2025-12-04T17:26:41.5839914Z ##[group]Run ./.github/actions/upload-utilization-stats 2025-12-04T17:26:41.5840333Z with: 2025-12-04T17:26:41.5840582Z job_id: 57119749248 2025-12-04T17:26:41.5841306Z job_name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 1, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T17:26:41.5842122Z workflow_name: periodic 2025-12-04T17:26:41.5842424Z workflow_run_id: 19922826259 2025-12-04T17:26:41.5842744Z workflow_attempt: 1 2025-12-04T17:26:41.5843013Z env: 2025-12-04T17:26:41.5843245Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:41.5843551Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:41.5843924Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:41.5844708Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:41.5845269Z ##[endgroup] 2025-12-04T17:26:41.5904366Z ##[group]Run actions/setup-python@v6 2025-12-04T17:26:41.5904723Z with: 2025-12-04T17:26:41.5904979Z python-version: 3.10 2025-12-04T17:26:41.5905264Z check-latest: false 2025-12-04T17:26:41.5905686Z token: *** 2025-12-04T17:26:41.5905956Z update-environment: true 2025-12-04T17:26:41.5906281Z allow-prereleases: false 2025-12-04T17:26:41.5906704Z freethreaded: false 2025-12-04T17:26:41.5906979Z env: 2025-12-04T17:26:41.5907221Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:41.5907513Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:41.5907884Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:41.5908539Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:41.5909104Z ##[endgroup] 2025-12-04T17:26:41.7579925Z ##[group]Installed versions 2025-12-04T17:26:41.7590806Z Version 3.10 was 
not found in the local cache 2025-12-04T17:26:41.7796261Z (node:368369) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead. 2025-12-04T17:26:41.7797227Z (Use `node --trace-deprecation ...` to show where the warning was created) 2025-12-04T17:26:42.1058796Z ##[error]The version '3.10' with architecture 'x64' was not found for this operating system. The list of all available versions can be found here: https://raw.githubusercontent.com/actions/python-versions/main/versions-manifest.json 2025-12-04T17:26:42.1235178Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main 2025-12-04T17:26:42.1235698Z with: 2025-12-04T17:26:42.1235922Z env: 2025-12-04T17:26:42.1236167Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:42.1236479Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:42.1236850Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:42.1237490Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:42.1238174Z ##[endgroup] 2025-12-04T17:26:42.1256690Z ##[group]Run set -eou pipefail 2025-12-04T17:26:42.1257063Z set -eou pipefail 2025-12-04T17:26:42.1257372Z  2025-12-04T17:26:42.1257800Z echo "Holding runner for 2 hours until all ssh sessions have logged out" 2025-12-04T17:26:42.1258346Z for _ in $(seq 1440); do 2025-12-04T17:26:42.1258723Z  # Break if no ssh session exists anymore 2025-12-04T17:26:42.1259145Z  if [ "$(who)" = "" ]; then 2025-12-04T17:26:42.1259536Z  break 2025-12-04T17:26:42.1259812Z  fi 2025-12-04T17:26:42.1260075Z  echo "." 
2025-12-04T17:26:42.1260355Z  sleep 5 2025-12-04T17:26:42.1260617Z done 2025-12-04T17:26:42.1267528Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T17:26:42.1267976Z env: 2025-12-04T17:26:42.1268217Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:42.1268542Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:42.1268910Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:42.1269541Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:42.1270123Z ##[endgroup] 2025-12-04T17:26:42.1299439Z Holding runner for 2 hours until all ssh sessions have logged out 2025-12-04T17:26:42.1381137Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T17:26:42.1381815Z # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T17:26:42.1382356Z # shellcheck disable=SC2046 2025-12-04T17:26:42.1382749Z docker stop $(docker ps -q) || true 2025-12-04T17:26:42.1383141Z # Prune all of the docker images 2025-12-04T17:26:42.1383527Z docker system prune -af 2025-12-04T17:26:42.1391767Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T17:26:42.1392217Z env: 2025-12-04T17:26:42.1392481Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:26:42.1392782Z HAS_NVIDIA_GPU: true 2025-12-04T17:26:42.1393155Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:26:42.1393812Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:42.1394396Z ##[endgroup] 2025-12-04T17:26:53.2018792Z 764ff984146f 2025-12-04T17:26:58.5413707Z Deleted Containers: 2025-12-04T17:26:58.5414211Z 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:26:58.5414866Z 2025-12-04T17:27:07.5200864Z Deleted Images: 2025-12-04T17:27:07.5201899Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 
2025-12-04T17:27:07.5203434Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image@sha256:ae30f11a5b50741bd652aa0c94ad89ef791c4e50157eff642748620825cf7940 2025-12-04T17:27:07.5204497Z deleted: sha256:5465aa79632b68f6240c23f0d0b021df4d0fd595333b61a40d36a0cf73656024 2025-12-04T17:27:07.5205254Z deleted: sha256:f57a578c46f36a858c2be92210a89558688ee36b619af78c698952c0e3ef05ad 2025-12-04T17:27:07.5206015Z deleted: sha256:ce0698bd1efc811ccead0ecdad944b4839bf17bff387495b58e64cf8db0e210c 2025-12-04T17:27:07.5206774Z deleted: sha256:f0ee66f328fa98c40f336c64fee9a4b42e51a793cceea7f81932068bdc7bd315 2025-12-04T17:27:07.5207545Z deleted: sha256:ea24b30a25c161bd4bd564bfd90c36d88674a1aa59ef3e65647e926c76685be0 2025-12-04T17:27:07.5208312Z deleted: sha256:15bc0847ce5e60cc1a9b36d25283dc5648fb45e04aa9a8dec984af3c193e2f0b 2025-12-04T17:27:07.5209467Z deleted: sha256:3639aa26691090ef45641c75bffcb2e3f427f5e282abc93d607de4433bf90488 2025-12-04T17:27:07.5210206Z deleted: sha256:86258272ba477934c917d08b21e0da6000c268b60f5a9ae907038e7bf3236532 2025-12-04T17:27:07.5210960Z deleted: sha256:ba8e0040c98ddbf87acbc3ae6575b2933c09421ac7094a96e027d1fc9356fbb6 2025-12-04T17:27:07.5211728Z deleted: sha256:ca0176fc0de6cc059c4dbfc313434b5dea2c90dc24f2dc3a1061b941c7b3e6ca 2025-12-04T17:27:07.5212580Z deleted: sha256:cc6a480ab9e6091c6c206bc9b340611b3863258975e835769bd8f2a38b5d8c13 2025-12-04T17:27:07.5213333Z deleted: sha256:8465c24f0b284d8589ea191edeb80d1da07e4a59dfcfdcfa153bdf3d5d678d3e 2025-12-04T17:27:07.5214097Z deleted: sha256:b93bfbd3b55899c606fb98c5edbd21fd63114862a4f5a5b67c7aa63fc9ada9a3 2025-12-04T17:27:07.5214861Z deleted: sha256:6b7582e3ce445d82e9d2ae7769502119c39c1edbf5fe11c195615db8da846931 2025-12-04T17:27:07.5215597Z deleted: sha256:9d79615a9d9ae67110cc9da697933492b385b1e4708d30c2211625bea5d42f27 2025-12-04T17:27:07.5216429Z deleted: sha256:7132c6db5e7d5692786167dfb22dea62d8203dc7837b2d1de435c6e5c85e906e 2025-12-04T17:27:07.5217186Z deleted: 
sha256:d61bc13a0957d633ff633186c6cbdf48da1c551991d814281262e58709e225a8 2025-12-04T17:27:07.5217943Z deleted: sha256:0c348bbc3988acd329b3e42de4d2c73d5dc4942618716ca312d389d4f704f4bb 2025-12-04T17:27:07.5218684Z deleted: sha256:28d30dd15686ab6819c2f03388c9999bbdaef35e8756817297d795e00dd623fc 2025-12-04T17:27:07.5219449Z deleted: sha256:0a57608df6cffb31a0b24f2537b4dfe7a55bbe6ea02216703cc3172062ab9d75 2025-12-04T17:27:07.5220211Z deleted: sha256:43d23f49f4d70a54b4aff6f4f10d5c5a3d75b100abbbf281ad510177cc80cd99 2025-12-04T17:27:07.5220964Z deleted: sha256:f9e33c2e4c7b8e7179fba052da4d7c4acdc8287f253c95328ae04055755f88a4 2025-12-04T17:27:07.5221728Z deleted: sha256:cfce0930cf33c7136fc92511b9bcad570958363b55e9e0c82e9b8ebc29301356 2025-12-04T17:27:07.5222483Z deleted: sha256:9a709ae20528f500f51271ad2ce6a3d7196fe814a28ae73881901ecef9748c2a 2025-12-04T17:27:07.5223238Z deleted: sha256:68a1d16e9392be6fe939a58c5f941a0919408b5852e52cb04027b0b8777e2b0e 2025-12-04T17:27:07.5223975Z deleted: sha256:042a0022b3eea78f54015f4cf2888bcfa3b91deb0b08830a33c2814b93285dd9 2025-12-04T17:27:07.5224735Z deleted: sha256:a7ba703ff0aa305a608f3b4afd89c2ecd0d1244b127629145a2e691490abb271 2025-12-04T17:27:07.5225510Z deleted: sha256:be44f5fbae55066faba60eebf7065a082abf517ab8f2ebf8ece69e74d45def07 2025-12-04T17:27:07.5226290Z deleted: sha256:a01f1b0d88a8936d648f78787f56579bdb6617edf4620d0410ab6b118351bbb2 2025-12-04T17:27:07.5227031Z deleted: sha256:dc93f45553adafb5c6e7473711c833996f6884dab2da708ffc76b5cf65b8db9d 2025-12-04T17:27:07.5227802Z deleted: sha256:ffdba9ecb5890a9cb23368d781ff5484270b7f13c6d5629feca3512b58b9a0ac 2025-12-04T17:27:07.5228551Z deleted: sha256:268a91c420865628895871795b524436f5cc4403aa53d71f457db21bf42dd530 2025-12-04T17:27:07.5229297Z deleted: sha256:72450bfd97986ccc53d8fa76252130b464fdb3c5fd8e688546e8c3ce0b9d4394 2025-12-04T17:27:07.5230060Z deleted: sha256:63954235d3be0420af6ad2dae2b24849e3eee1edb10cf86d29137c3e19621f47 2025-12-04T17:27:07.5230914Z deleted: 
sha256:1c4e2d3e68e8a166d1965962077fe194ea00cad2ee636399c0c17ba5a94bdb9c 2025-12-04T17:27:07.5231687Z deleted: sha256:361cacbab7154a0cb62486f57d75b112feedbcc751a7d8f7bb02ec7a61b1fe0d 2025-12-04T17:27:07.5232454Z deleted: sha256:e653f6af92265f4300717bd617aab954cfbf049d4be32e890e57c2e8135be7f9 2025-12-04T17:27:07.5233212Z deleted: sha256:bfffeb2974ffc58c0669724812f701df860257ac3d047a7315a100beb0ea0507 2025-12-04T17:27:07.5233966Z deleted: sha256:6ae48d8efc75420f721058928fe8b1ccf48aa1bdc92de539b1f0db9248a41fcf 2025-12-04T17:27:07.5234725Z deleted: sha256:535c7026785a690366fc69ecbc9a81f1b58a46f63c782620591c1297406a2731 2025-12-04T17:27:07.5235482Z deleted: sha256:8462076c3cc8db6030f38e1137bfbef1aad85404ed4231285c1e06cd414d3e57 2025-12-04T17:27:07.5236234Z deleted: sha256:fe340d63ccb66e5b395b7900c1002a513e4afd7f610e9df5e7262c4f71e93bef 2025-12-04T17:27:07.5236994Z deleted: sha256:b61085386114396fe42144a4aa739b2a0b45f0c30a083462a2ea7b9b675c02aa 2025-12-04T17:27:07.5237856Z deleted: sha256:7772f25c05bcd5ede631d287b826aa108db67c773e377db98ffa73b0917f3629 2025-12-04T17:27:07.5238631Z deleted: sha256:3ea8a43d8193d05ecd6aa473b523a3569e11ae691eed9e6ffd693f23b0106035 2025-12-04T17:27:07.5239375Z deleted: sha256:34647b4087d29cf48a18668bb935a95fc8b2dac3522c2581397f0f27227047fd 2025-12-04T17:27:07.5240192Z deleted: sha256:b6a169f1ab01281c16562ad43b462a1a47a33be8d3cfae0a117ffa5c47d0b532 2025-12-04T17:27:07.5240992Z deleted: sha256:664173a33cd21248a2d73d2eba7887602e36fbc96002d991eb0bd0a2d574ac88 2025-12-04T17:27:07.5241751Z deleted: sha256:d67fdfe94c9a0228f17991cd3e958e36da96d4d597b46773cb7eed98c489f947 2025-12-04T17:27:07.5242560Z deleted: sha256:f2be0722250908742f067756b56ed3fa169daa2f1c8201a7ed4335b2fed2cae5 2025-12-04T17:27:07.5243302Z deleted: sha256:8614db257d8dc9e0f0ee8398a4a4d3c061b2797d6017daaf0696dd7f87633b3e 2025-12-04T17:27:07.5244063Z deleted: sha256:23ee0908a1bf254f1d4dd0591cc0c6801571b4d93950b6fd4fee57ca7e361da0 2025-12-04T17:27:07.5244836Z deleted: 
sha256:f627a99df4c0f370bd7fc8ea6be7695d8027f988aed52b65233cbcf78b01989b 2025-12-04T17:27:07.5245576Z deleted: sha256:d5e92389b59d4134cdb96113af964186602e98c392e76a8f26d4ea6e54056ccc 2025-12-04T17:27:07.5246337Z deleted: sha256:cbfccf44b9dc670c109634fbf19c2bfff2a3d5243bfa351c851d9fad3f1acfc2 2025-12-04T17:27:07.5247099Z deleted: sha256:1242535e81ad4bd713910a6c5e1b38375b12ed1bcd1b48419813a5ef28a5c84c 2025-12-04T17:27:07.5247848Z deleted: sha256:10b1394079cfe756a1ad9aa9aa3a2995bd5e46ef1e18029eb9eae0398f6d4e88 2025-12-04T17:27:07.5248589Z deleted: sha256:1d32da9a5f10e10c4a97a839151a1943d4db18494e8080bea91a6c9784fde067 2025-12-04T17:27:07.5249340Z deleted: sha256:af2fd59653ebd685a032ef800f8227c0d7b9b0e5ef397b30d4301e001c943e8b 2025-12-04T17:27:07.5250101Z deleted: sha256:c48d351980e3bd24d533ae55d1acc6a27911dffcbb03b2ae552d7ccc3e4cd74f 2025-12-04T17:27:07.5250849Z deleted: sha256:e663afac609b1b6c812ab45265c27d870b92c9fc6849939f0b8635da83cbfb53 2025-12-04T17:27:07.5251595Z deleted: sha256:f79dc17668331d4214ef24000d5c54a0bb2ba70f152d8523f571e2b76a303f4f 2025-12-04T17:27:07.5252351Z deleted: sha256:00de9606a6cd2a2dfb4ceffcb076474d027a1f6273894677090aee7478035865 2025-12-04T17:27:07.5253108Z deleted: sha256:cf35fe1d0317253b75ee17c12783c2561faebf9bf2c59c07ad4712c053246586 2025-12-04T17:27:07.5253841Z deleted: sha256:06622801490739d9db884c23c05a31a1ee86c41e888b34c3ccef23d37f2bdbb5 2025-12-04T17:27:07.5254594Z deleted: sha256:df5dafcaee865ddfb66e22075c63769836e01a627d6fe46658b6f4b4a25318d3 2025-12-04T17:27:07.5255365Z deleted: sha256:7949ae5c4df921feb0e2cd7bac1e402e1ab9135e758fa41cd567880b354b40bc 2025-12-04T17:27:07.5256119Z deleted: sha256:9f19148d820adb1d6e86d0ce68e21fbcedafa7c7ec6c45c9004fa3a607096923 2025-12-04T17:27:07.5256970Z deleted: sha256:1d37d963e85ce22ffaab56a1cf35b3411f34f9432dc5e49ebbdf6f30816cdfa8 2025-12-04T17:27:07.5257740Z deleted: sha256:bac6d91e3830e51e96879deaa3e6d0d39da076fa802ebda68f81bdf7ef8342d5 2025-12-04T17:27:07.5258495Z deleted: 
sha256:ffd496b07151c90e7ddd68a81a36471f51a544187982db5e34621358e1b29681 2025-12-04T17:27:07.5259418Z deleted: sha256:890b2042bdb9e22a614cea1be88366cd3ae15159bf78ac510b9daa6f802493a6 2025-12-04T17:27:07.5260188Z deleted: sha256:ddd9a57b20a8b45ae0e8e350ec266d50a1b9e9a7ff4921470eb38f004d50eb20 2025-12-04T17:27:07.5260950Z deleted: sha256:2f4f91684b8221bc5cbc3f14c7e00bb693854027a1a6de5ad6bdcd000bb579f2 2025-12-04T17:27:07.5261717Z deleted: sha256:9c01ec5e73233284a0f9bb42de59696a1fa61caacacdf63d04df5ebd73895d77 2025-12-04T17:27:07.5262465Z deleted: sha256:f6153a90f0f5316b03f1464826325a1578231b89b3c1f1c83cc7cebdd41cee2a 2025-12-04T17:27:07.5263209Z deleted: sha256:4e89cd2181813af7fd2219923bae493e33111d8b4ebd76f257b7fb26744fda28 2025-12-04T17:27:07.5263968Z deleted: sha256:a0b77eb4054db8f2ea2ec957b3941b4aeee14b59e94a99a1521f90d6e41faf0e 2025-12-04T17:27:07.5264704Z deleted: sha256:1a1b2848f15aa5114f5a67e3705439512880bf1a7a6436cc67760c59b5f10c46 2025-12-04T17:27:07.5265437Z deleted: sha256:004fc01362840c164664c18580e479546fa0b7f9599487558f80190aec30e2b5 2025-12-04T17:27:07.5266269Z deleted: sha256:35f36e20799f0a0dead81bc3701732e43489264e6bee9fcb789b376a99e17e78 2025-12-04T17:27:07.5267020Z deleted: sha256:1207fd2ede86015c3f105620cb491e8199d2060a4a87490de358286d0ae52e4e 2025-12-04T17:27:07.5267768Z deleted: sha256:02dccb85ee744d1fbb819c6da618b2c52a3e4affc89e407f79b875e7b3bbb7df 2025-12-04T17:27:07.5268551Z deleted: sha256:d22e6ff9c3ac9dabbcc6052e1459f8dc4ebd19bd057bd0688615d6cc3ebb5cf0 2025-12-04T17:27:07.5269320Z deleted: sha256:73974f74b436f39a2fdb6461b1e3f7c3e41c73325776fa71d16b942a5b4a365b 2025-12-04T17:27:07.5269977Z untagged: public.ecr.aws/docker/library/python:3.13 2025-12-04T17:27:07.5270810Z untagged: public.ecr.aws/docker/library/python@sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0 2025-12-04T17:27:07.5271995Z deleted: sha256:44438aecfedf7b6086fce506dae0db5ba7fc0027f9b743f1a75a6b5cbc7de70a 2025-12-04T17:27:07.5272769Z deleted: 
sha256:6f09a1f5d8a107c2532fbd116e75116cb75fa77b1a7d72d3bdf1ac12de152acd 2025-12-04T17:27:07.5273524Z deleted: sha256:fe5f3ac0be086125eb1e3cd10cc33e8e426f4e079381f7ce5a987b626e99fa67 2025-12-04T17:27:07.5274292Z deleted: sha256:79dd2061a22cf919cfc4f1f02704bfda09afadb017265e670ee54441d296c06c 2025-12-04T17:27:07.5275063Z deleted: sha256:9447ad402aafdbee17e999b0ec84ad89c2646dbebf054d469d4f8bee77f66212 2025-12-04T17:27:07.5275815Z deleted: sha256:7a4909f3c1975be52292f53107495ee1b41c17494918767ccedf1cf1688ae318 2025-12-04T17:27:07.5276541Z deleted: sha256:3474923d97f1f498237650a7d51bd4aea37d5e6b9d8a778777920584af5dd560 2025-12-04T17:27:07.5277291Z deleted: sha256:683afd1773444401a9cbd24842ee5d9154a11abb4fab63ddea5c03df788597ee 2025-12-04T17:27:07.5277741Z 2025-12-04T17:27:07.5277882Z Total reclaimed space: 36GB 2025-12-04T17:27:07.5316525Z ##[group]Run set +e 2025-12-04T17:27:07.5316942Z set +e 2025-12-04T17:27:07.5317204Z set -x 2025-12-04T17:27:07.5317465Z  2025-12-04T17:27:07.5317717Z nvidia-smi 2025-12-04T17:27:07.5318254Z # NB: Surprisingly, nvidia-smi command returns successfully with return code 0 even in 2025-12-04T17:27:07.5319081Z # the case where the driver has already crashed as it still can get the driver version 2025-12-04T17:27:07.5319883Z # and some basic information like the bus ID. However, the rest of the information 2025-12-04T17:27:07.5320491Z # would be missing (ERR!), for example: 2025-12-04T17:27:07.5320856Z # 2025-12-04T17:27:07.5321217Z # +-----------------------------------------------------------------------------+ 2025-12-04T17:27:07.5321850Z # | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | 2025-12-04T17:27:07.5322507Z # |-------------------------------+----------------------+----------------------+ 2025-12-04T17:27:07.5323139Z # | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T17:27:07.5323834Z # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. 
| 2025-12-04T17:27:07.5324504Z # | | | MIG M. | 2025-12-04T17:27:07.5324931Z # |===============================+======================+======================| 2025-12-04T17:27:07.5325427Z # | 0 ERR! Off | 00000000:00:1E.0 Off | ERR! | 2025-12-04T17:27:07.5326000Z # |ERR! ERR! ERR! ERR! / ERR! | 4184MiB / 23028MiB | ERR! Default | 2025-12-04T17:27:07.5326525Z # | | | ERR! | 2025-12-04T17:27:07.5327018Z # +-------------------------------+----------------------+----------------------+ 2025-12-04T17:27:07.5327469Z # 2025-12-04T17:27:07.5327824Z # +-----------------------------------------------------------------------------+ 2025-12-04T17:27:07.5328370Z # | Processes: | 2025-12-04T17:27:07.5328925Z # | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T17:27:07.5329454Z # | ID ID Usage | 2025-12-04T17:27:07.5329899Z # |=============================================================================| 2025-12-04T17:27:07.5330396Z # +-----------------------------------------------------------------------------+ 2025-12-04T17:27:07.5330903Z # 2025-12-04T17:27:07.5331354Z # This should be reported as a failure instead as it will guarantee to fail when 2025-12-04T17:27:07.5331955Z # Docker tries to run with --gpus all 2025-12-04T17:27:07.5332331Z # 2025-12-04T17:27:07.5332750Z # So, the correct check here is to query one of the missing piece of info like 2025-12-04T17:27:07.5333371Z # GPU name, so that the command can fail accordingly 2025-12-04T17:27:07.5333953Z nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T17:27:07.5334448Z NVIDIA_SMI_STATUS=$? 2025-12-04T17:27:07.5334768Z  2025-12-04T17:27:07.5335287Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action 2025-12-04T17:27:07.5336064Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then 2025-12-04T17:27:07.5336861Z  echo "NVIDIA driver installation has failed, shutting down the runner..." 
2025-12-04T17:27:07.5337464Z  .github/scripts/stop_runner_service.sh 2025-12-04T17:27:07.5337850Z fi 2025-12-04T17:27:07.5338085Z  2025-12-04T17:27:07.5338674Z # For runner with multiple GPUs, we also want to confirm that the number of GPUs are the 2025-12-04T17:27:07.5339421Z # power of 2, i.e. 1, 2, 4, or 8. This is to avoid flaky test issue when one GPU fails 2025-12-04T17:27:07.5340045Z # https://github.com/pytorch/test-infra/issues/4000 2025-12-04T17:27:07.5340549Z GPU_COUNT=$(nvidia-smi --list-gpus | wc -l) 2025-12-04T17:27:07.5340970Z NVIDIA_SMI_STATUS=$? 2025-12-04T17:27:07.5341286Z  2025-12-04T17:27:07.5341789Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action 2025-12-04T17:27:07.5342558Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then 2025-12-04T17:27:07.5343258Z  echo "NVIDIA driver installation has failed, shutting down the runner..." 2025-12-04T17:27:07.5343859Z  .github/scripts/stop_runner_service.sh 2025-12-04T17:27:07.5344232Z fi 2025-12-04T17:27:07.5344479Z  2025-12-04T17:27:07.5344769Z # Check the GPU count to be a power of 2 2025-12-04T17:27:07.5345429Z if [ "$GPU_COUNT" -le 8 ] && [ "$GPU_COUNT" -ne 1 ] && [ "$GPU_COUNT" -ne 2 ] && [ "$GPU_COUNT" -ne 4 ] && [ "$GPU_COUNT" -ne 8 ]; then 2025-12-04T17:27:07.5346315Z  echo "NVIDIA driver detects $GPU_COUNT GPUs. The runner has a broken GPU, shutting it down..." 
2025-12-04T17:27:07.5347039Z  .github/scripts/stop_runner_service.sh 2025-12-04T17:27:07.5347423Z fi 2025-12-04T17:27:07.5356303Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T17:27:07.5356753Z env: 2025-12-04T17:27:07.5357008Z GIT_DEFAULT_BRANCH: main 2025-12-04T17:27:07.5357310Z HAS_NVIDIA_GPU: true 2025-12-04T17:27:07.5357688Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T17:27:07.5358333Z DOCKER_CONTAINER_ID: 764ff984146fd3268e049644ccb47d7d8238fae8138055ae0a6928cb5da435ad 2025-12-04T17:27:07.5358911Z ##[endgroup] 2025-12-04T17:27:07.5391016Z + nvidia-smi 2025-12-04T17:27:07.5597303Z Thu Dec 4 17:27:07 2025 2025-12-04T17:27:07.5597747Z +-----------------------------------------------------------------------------+ 2025-12-04T17:27:07.5598355Z | NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 | 2025-12-04T17:27:07.5598949Z |-------------------------------+----------------------+----------------------+ 2025-12-04T17:27:07.5599542Z | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T17:27:07.5600194Z | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2025-12-04T17:27:07.5600723Z | | | MIG M. 
| 2025-12-04T17:27:07.5601117Z |===============================+======================+======================| 2025-12-04T17:27:07.5760129Z | 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 | 2025-12-04T17:27:07.5760673Z | N/A 27C P8 13W / 70W | 2MiB / 15360MiB | 0% Default | 2025-12-04T17:27:07.5761125Z | | | N/A | 2025-12-04T17:27:07.5761592Z +-------------------------------+----------------------+----------------------+ 2025-12-04T17:27:07.5762065Z 2025-12-04T17:27:07.5762531Z +-----------------------------------------------------------------------------+ 2025-12-04T17:27:07.5763025Z | Processes: | 2025-12-04T17:27:07.5763547Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T17:27:07.5764036Z | ID ID Usage | 2025-12-04T17:27:07.5764458Z |=============================================================================| 2025-12-04T17:27:07.5765461Z | No running processes found | 2025-12-04T17:27:07.5766035Z +-----------------------------------------------------------------------------+ 2025-12-04T17:27:07.6584397Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T17:27:07.6762989Z Tesla T4 2025-12-04T17:27:07.6805480Z + NVIDIA_SMI_STATUS=0 2025-12-04T17:27:07.6805858Z + '[' 0 -ne 0 ']' 2025-12-04T17:27:07.6811582Z ++ nvidia-smi --list-gpus 2025-12-04T17:27:07.6812928Z ++ wc -l 2025-12-04T17:27:07.7033066Z + GPU_COUNT=1 2025-12-04T17:27:07.7033407Z + NVIDIA_SMI_STATUS=0 2025-12-04T17:27:07.7033704Z + '[' 0 -ne 0 ']' 2025-12-04T17:27:07.7033974Z + '[' 1 -le 8 ']' 2025-12-04T17:27:07.7034234Z + '[' 1 -ne 1 ']' 2025-12-04T17:27:07.7133497Z Post job cleanup. 2025-12-04T17:27:07.7243322Z Post job cleanup. 2025-12-04T17:27:07.7296856Z Post job cleanup. 
2025-12-04T17:27:07.8492438Z [command]/usr/bin/git version 2025-12-04T17:27:07.8539429Z git version 2.50.1 2025-12-04T17:27:07.8582273Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/84ee311e-a21c-45e6-bbc0-dd53cf2d8378/.gitconfig' 2025-12-04T17:27:07.8593246Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/84ee311e-a21c-45e6-bbc0-dd53cf2d8378' before making global git config changes 2025-12-04T17:27:07.8594402Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T17:27:07.8598942Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T17:27:07.8642425Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T17:27:07.8689087Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T17:27:07.9045544Z Entering 'android/libs/fbjni' 2025-12-04T17:27:07.9116247Z Entering 'third_party/FP16' 2025-12-04T17:27:07.9181957Z Entering 'third_party/FXdiv' 2025-12-04T17:27:07.9247183Z Entering 'third_party/NNPACK' 2025-12-04T17:27:07.9315220Z Entering 'third_party/NVTX' 2025-12-04T17:27:07.9383350Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T17:27:07.9450762Z Entering 'third_party/XNNPACK' 2025-12-04T17:27:07.9534515Z Entering 'third_party/aiter' 2025-12-04T17:27:07.9601844Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T17:27:07.9676107Z Entering 'third_party/benchmark' 2025-12-04T17:27:07.9743146Z Entering 'third_party/composable_kernel' 2025-12-04T17:27:07.9821389Z Entering 'third_party/cpp-httplib' 2025-12-04T17:27:07.9886561Z Entering 'third_party/cpuinfo' 2025-12-04T17:27:07.9953867Z Entering 'third_party/cudnn_frontend' 2025-12-04T17:27:08.0020449Z Entering 'third_party/cutlass' 2025-12-04T17:27:08.0101588Z 
Entering 'third_party/fbgemm' 2025-12-04T17:27:08.0171283Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T17:27:08.0239915Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T17:27:08.0316591Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T17:27:08.0383140Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T17:27:08.0458476Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T17:27:08.0525095Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T17:27:08.0590350Z Entering 'third_party/fbgemm/external/json' 2025-12-04T17:27:08.0660362Z Entering 'third_party/flash-attention' 2025-12-04T17:27:08.0729126Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T17:27:08.0802721Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T17:27:08.0879733Z Entering 'third_party/flatbuffers' 2025-12-04T17:27:08.0952382Z Entering 'third_party/fmt' 2025-12-04T17:27:08.1018158Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T17:27:08.1085811Z Entering 'third_party/gloo' 2025-12-04T17:27:08.1151573Z Entering 'third_party/googletest' 2025-12-04T17:27:08.1221150Z Entering 'third_party/ideep' 2025-12-04T17:27:08.1286820Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T17:27:08.1360685Z Entering 'third_party/ittapi' 2025-12-04T17:27:08.1429425Z Entering 'third_party/kineto' 2025-12-04T17:27:08.1497852Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T17:27:08.1562748Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T17:27:08.1629025Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T17:27:08.1696086Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T17:27:08.1763169Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T17:27:08.1827433Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 
2025-12-04T17:27:08.1896972Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T17:27:08.1964840Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T17:27:08.2030428Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T17:27:08.2097004Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T17:27:08.2163453Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T17:27:08.2230592Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T17:27:08.2299756Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T17:27:08.2372714Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T17:27:08.2438156Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T17:27:08.2508137Z Entering 'third_party/kleidiai' 2025-12-04T17:27:08.2577752Z Entering 'third_party/mimalloc' 2025-12-04T17:27:08.2643159Z Entering 'third_party/nlohmann' 2025-12-04T17:27:08.2713072Z Entering 'third_party/onnx' 2025-12-04T17:27:08.2800446Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T17:27:08.2868617Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T17:27:08.2937009Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T17:27:08.3001983Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T17:27:08.3066160Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T17:27:08.3129820Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T17:27:08.3196122Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T17:27:08.3263405Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T17:27:08.3330029Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T17:27:08.3395749Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T17:27:08.3462044Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T17:27:08.3530369Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T17:27:08.3619123Z Entering 'third_party/pocketfft' 2025-12-04T17:27:08.3686244Z Entering 'third_party/protobuf' 2025-12-04T17:27:08.3755350Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T17:27:08.3822391Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T17:27:08.3889470Z Entering 'third_party/psimd' 2025-12-04T17:27:08.3955521Z Entering 'third_party/pthreadpool' 2025-12-04T17:27:08.4022022Z Entering 'third_party/pybind11' 2025-12-04T17:27:08.4088374Z Entering 'third_party/python-peachpy' 2025-12-04T17:27:08.4152970Z Entering 'third_party/sleef' 2025-12-04T17:27:08.4220251Z Entering 'third_party/tensorpipe' 2025-12-04T17:27:08.4286252Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T17:27:08.4350413Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T17:27:08.4414989Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T17:27:08.4482427Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T17:27:08.4545072Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T17:27:08.4637058Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T17:27:08.4662158Z http.https://github.com/.extraheader 2025-12-04T17:27:08.4673429Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-12-04T17:27:08.4708365Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 
'http.https://github.com/.extraheader' || :" 2025-12-04T17:27:08.5063561Z Entering 'android/libs/fbjni' 2025-12-04T17:27:08.5108870Z http.https://github.com/.extraheader 2025-12-04T17:27:08.5149263Z Entering 'third_party/FP16' 2025-12-04T17:27:08.5198679Z http.https://github.com/.extraheader 2025-12-04T17:27:08.5238714Z Entering 'third_party/FXdiv' 2025-12-04T17:27:08.5283864Z http.https://github.com/.extraheader 2025-12-04T17:27:08.5323747Z Entering 'third_party/NNPACK' 2025-12-04T17:27:08.5369098Z http.https://github.com/.extraheader 2025-12-04T17:27:08.5411732Z Entering 'third_party/NVTX' 2025-12-04T17:27:08.5456201Z http.https://github.com/.extraheader 2025-12-04T17:27:08.5499267Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T17:27:08.5543891Z http.https://github.com/.extraheader 2025-12-04T17:27:08.5586812Z Entering 'third_party/XNNPACK' 2025-12-04T17:27:08.5631580Z http.https://github.com/.extraheader 2025-12-04T17:27:08.5691199Z Entering 'third_party/aiter' 2025-12-04T17:27:08.5735608Z http.https://github.com/.extraheader 2025-12-04T17:27:08.5776963Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T17:27:08.5820453Z http.https://github.com/.extraheader 2025-12-04T17:27:08.5872058Z Entering 'third_party/benchmark' 2025-12-04T17:27:08.5916315Z http.https://github.com/.extraheader 2025-12-04T17:27:08.5955741Z Entering 'third_party/composable_kernel' 2025-12-04T17:27:08.6000313Z http.https://github.com/.extraheader 2025-12-04T17:27:08.6050111Z Entering 'third_party/cpp-httplib' 2025-12-04T17:27:08.6094943Z http.https://github.com/.extraheader 2025-12-04T17:27:08.6136249Z Entering 'third_party/cpuinfo' 2025-12-04T17:27:08.6181126Z http.https://github.com/.extraheader 2025-12-04T17:27:08.6222663Z Entering 'third_party/cudnn_frontend' 2025-12-04T17:27:08.6266772Z http.https://github.com/.extraheader 2025-12-04T17:27:08.6307956Z Entering 'third_party/cutlass' 2025-12-04T17:27:08.6352389Z http.https://github.com/.extraheader 
2025-12-04T17:27:08.6405096Z Entering 'third_party/fbgemm' 2025-12-04T17:27:08.6450258Z http.https://github.com/.extraheader 2025-12-04T17:27:08.6495100Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T17:27:08.6538609Z http.https://github.com/.extraheader 2025-12-04T17:27:08.6579113Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T17:27:08.6622555Z http.https://github.com/.extraheader 2025-12-04T17:27:08.6672655Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T17:27:08.6717321Z http.https://github.com/.extraheader 2025-12-04T17:27:08.6757215Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T17:27:08.6802603Z http.https://github.com/.extraheader 2025-12-04T17:27:08.6852906Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T17:27:08.6897494Z http.https://github.com/.extraheader 2025-12-04T17:27:08.6936663Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T17:27:08.6982054Z http.https://github.com/.extraheader 2025-12-04T17:27:08.7021021Z Entering 'third_party/fbgemm/external/json' 2025-12-04T17:27:08.7066336Z http.https://github.com/.extraheader 2025-12-04T17:27:08.7110710Z Entering 'third_party/flash-attention' 2025-12-04T17:27:08.7157168Z http.https://github.com/.extraheader 2025-12-04T17:27:08.7199393Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T17:27:08.7242842Z http.https://github.com/.extraheader 2025-12-04T17:27:08.7292000Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T17:27:08.7336737Z http.https://github.com/.extraheader 2025-12-04T17:27:08.7390662Z Entering 'third_party/flatbuffers' 2025-12-04T17:27:08.7437235Z http.https://github.com/.extraheader 2025-12-04T17:27:08.7483443Z Entering 'third_party/fmt' 2025-12-04T17:27:08.7529722Z http.https://github.com/.extraheader 2025-12-04T17:27:08.7574088Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T17:27:08.7620355Z http.https://github.com/.extraheader 2025-12-04T17:27:08.7661504Z Entering 
'third_party/gloo' 2025-12-04T17:27:08.7707356Z http.https://github.com/.extraheader 2025-12-04T17:27:08.7748970Z Entering 'third_party/googletest' 2025-12-04T17:27:08.7794550Z http.https://github.com/.extraheader 2025-12-04T17:27:08.7836306Z Entering 'third_party/ideep' 2025-12-04T17:27:08.7880835Z http.https://github.com/.extraheader 2025-12-04T17:27:08.7919504Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T17:27:08.7964309Z http.https://github.com/.extraheader 2025-12-04T17:27:08.8014222Z Entering 'third_party/ittapi' 2025-12-04T17:27:08.8059687Z http.https://github.com/.extraheader 2025-12-04T17:27:08.8099695Z Entering 'third_party/kineto' 2025-12-04T17:27:08.8145382Z http.https://github.com/.extraheader 2025-12-04T17:27:08.8185650Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T17:27:08.8230005Z http.https://github.com/.extraheader 2025-12-04T17:27:08.8270697Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T17:27:08.8315032Z http.https://github.com/.extraheader 2025-12-04T17:27:08.8357349Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T17:27:08.8402474Z http.https://github.com/.extraheader 2025-12-04T17:27:08.8444525Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T17:27:08.8489925Z http.https://github.com/.extraheader 2025-12-04T17:27:08.8531024Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T17:27:08.8576392Z http.https://github.com/.extraheader 2025-12-04T17:27:08.8615317Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T17:27:08.8659548Z http.https://github.com/.extraheader 2025-12-04T17:27:08.8703151Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T17:27:08.8745510Z http.https://github.com/.extraheader 2025-12-04T17:27:08.8785240Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T17:27:08.8827268Z http.https://github.com/.extraheader 2025-12-04T17:27:08.8866565Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T17:27:08.8909681Z http.https://github.com/.extraheader 2025-12-04T17:27:08.8950050Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T17:27:08.8993428Z http.https://github.com/.extraheader 2025-12-04T17:27:08.9032797Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T17:27:08.9076596Z http.https://github.com/.extraheader 2025-12-04T17:27:08.9114679Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T17:27:08.9157090Z http.https://github.com/.extraheader 2025-12-04T17:27:08.9202339Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T17:27:08.9246019Z http.https://github.com/.extraheader 2025-12-04T17:27:08.9291524Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T17:27:08.9334933Z http.https://github.com/.extraheader 2025-12-04T17:27:08.9374881Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T17:27:08.9418636Z http.https://github.com/.extraheader 2025-12-04T17:27:08.9460175Z Entering 'third_party/kleidiai' 2025-12-04T17:27:08.9505284Z http.https://github.com/.extraheader 2025-12-04T17:27:08.9545724Z Entering 'third_party/mimalloc' 2025-12-04T17:27:08.9592424Z http.https://github.com/.extraheader 2025-12-04T17:27:08.9631172Z Entering 'third_party/nlohmann' 2025-12-04T17:27:08.9676144Z http.https://github.com/.extraheader 2025-12-04T17:27:08.9716117Z Entering 'third_party/onnx' 2025-12-04T17:27:08.9760169Z http.https://github.com/.extraheader 2025-12-04T17:27:08.9820977Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T17:27:08.9864461Z 
http.https://github.com/.extraheader 2025-12-04T17:27:08.9906546Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T17:27:08.9952435Z http.https://github.com/.extraheader 2025-12-04T17:27:08.9993468Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T17:27:09.0035936Z http.https://github.com/.extraheader 2025-12-04T17:27:09.0074782Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T17:27:09.0116187Z http.https://github.com/.extraheader 2025-12-04T17:27:09.0153951Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T17:27:09.0197338Z http.https://github.com/.extraheader 2025-12-04T17:27:09.0235713Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T17:27:09.0278754Z http.https://github.com/.extraheader 2025-12-04T17:27:09.0319855Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T17:27:09.0362108Z http.https://github.com/.extraheader 2025-12-04T17:27:09.0401872Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T17:27:09.0446343Z http.https://github.com/.extraheader 2025-12-04T17:27:09.0485836Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T17:27:09.0528638Z http.https://github.com/.extraheader 2025-12-04T17:27:09.0566134Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T17:27:09.0608802Z http.https://github.com/.extraheader 2025-12-04T17:27:09.0649511Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T17:27:09.0693055Z http.https://github.com/.extraheader 2025-12-04T17:27:09.0735348Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T17:27:09.0778738Z http.https://github.com/.extraheader 2025-12-04T17:27:09.0840687Z Entering 'third_party/pocketfft' 2025-12-04T17:27:09.0886633Z http.https://github.com/.extraheader 2025-12-04T17:27:09.0925184Z Entering 
'third_party/protobuf' 2025-12-04T17:27:09.0968649Z http.https://github.com/.extraheader 2025-12-04T17:27:09.1010456Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T17:27:09.1054093Z http.https://github.com/.extraheader 2025-12-04T17:27:09.1094295Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T17:27:09.1136027Z http.https://github.com/.extraheader 2025-12-04T17:27:09.1179089Z Entering 'third_party/psimd' 2025-12-04T17:27:09.1224428Z http.https://github.com/.extraheader 2025-12-04T17:27:09.1265023Z Entering 'third_party/pthreadpool' 2025-12-04T17:27:09.1310452Z http.https://github.com/.extraheader 2025-12-04T17:27:09.1350646Z Entering 'third_party/pybind11' 2025-12-04T17:27:09.1394150Z http.https://github.com/.extraheader 2025-12-04T17:27:09.1432896Z Entering 'third_party/python-peachpy' 2025-12-04T17:27:09.1476318Z http.https://github.com/.extraheader 2025-12-04T17:27:09.1515413Z Entering 'third_party/sleef' 2025-12-04T17:27:09.1558989Z http.https://github.com/.extraheader 2025-12-04T17:27:09.1597779Z Entering 'third_party/tensorpipe' 2025-12-04T17:27:09.1641623Z http.https://github.com/.extraheader 2025-12-04T17:27:09.1680867Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T17:27:09.1723394Z http.https://github.com/.extraheader 2025-12-04T17:27:09.1766513Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T17:27:09.1810883Z http.https://github.com/.extraheader 2025-12-04T17:27:09.1848260Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T17:27:09.1892502Z http.https://github.com/.extraheader 2025-12-04T17:27:09.1929951Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T17:27:09.1972520Z http.https://github.com/.extraheader 2025-12-04T17:27:09.2009374Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T17:27:09.2051924Z http.https://github.com/.extraheader 2025-12-04T17:27:09.2118213Z [command]/usr/bin/git config --local --name-only 
--get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.2155036Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T17:27:09.2514570Z Entering 'android/libs/fbjni' 2025-12-04T17:27:09.2543868Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T17:27:09.2563477Z Entering 'third_party/FP16' 2025-12-04T17:27:09.2594196Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T17:27:09.2612657Z Entering 'third_party/FXdiv' 2025-12-04T17:27:09.2641652Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T17:27:09.2660359Z Entering 'third_party/NNPACK' 2025-12-04T17:27:09.2689594Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T17:27:09.2708516Z Entering 'third_party/NVTX' 2025-12-04T17:27:09.2737056Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T17:27:09.2756610Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T17:27:09.2785573Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T17:27:09.2804252Z Entering 'third_party/XNNPACK' 2025-12-04T17:27:09.2832749Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T17:27:09.2868946Z Entering 'third_party/aiter' 2025-12-04T17:27:09.2897949Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T17:27:09.2917154Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T17:27:09.2945418Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T17:27:09.2974468Z Entering 'third_party/benchmark' 2025-12-04T17:27:09.3003307Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T17:27:09.3022825Z Entering 'third_party/composable_kernel' 2025-12-04T17:27:09.3052286Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T17:27:09.3081836Z Entering 'third_party/cpp-httplib' 2025-12-04T17:27:09.3111023Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T17:27:09.3129922Z Entering 'third_party/cpuinfo' 2025-12-04T17:27:09.3159097Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T17:27:09.3179880Z Entering 'third_party/cudnn_frontend' 2025-12-04T17:27:09.3212542Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T17:27:09.3231893Z Entering 'third_party/cutlass' 2025-12-04T17:27:09.3261070Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T17:27:09.3291693Z Entering 'third_party/fbgemm' 2025-12-04T17:27:09.3320672Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T17:27:09.3342513Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T17:27:09.3371836Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T17:27:09.3389804Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T17:27:09.3419270Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T17:27:09.3450156Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T17:27:09.3479759Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T17:27:09.3499228Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T17:27:09.3527511Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T17:27:09.3557193Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T17:27:09.3586295Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T17:27:09.3604618Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T17:27:09.3632981Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T17:27:09.3650516Z Entering 'third_party/fbgemm/external/json' 2025-12-04T17:27:09.3680078Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T17:27:09.3703514Z Entering 'third_party/flash-attention' 2025-12-04T17:27:09.3733500Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T17:27:09.3753876Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T17:27:09.3785195Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T17:27:09.3810590Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T17:27:09.3839333Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T17:27:09.3869279Z Entering 'third_party/flatbuffers' 2025-12-04T17:27:09.3898942Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T17:27:09.3921163Z Entering 'third_party/fmt' 2025-12-04T17:27:09.3950600Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T17:27:09.3970012Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T17:27:09.3999352Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T17:27:09.4018650Z Entering 'third_party/gloo' 2025-12-04T17:27:09.4047835Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T17:27:09.4067423Z Entering 'third_party/googletest' 2025-12-04T17:27:09.4096614Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T17:27:09.4115380Z Entering 'third_party/ideep' 2025-12-04T17:27:09.4144782Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T17:27:09.4162560Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T17:27:09.4192297Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T17:27:09.4220455Z Entering 'third_party/ittapi' 2025-12-04T17:27:09.4251457Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T17:27:09.4269415Z Entering 'third_party/kineto' 2025-12-04T17:27:09.4299895Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config 
remote.origin.url 2025-12-04T17:27:09.4318797Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T17:27:09.4347559Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T17:27:09.4364971Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T17:27:09.4395003Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T17:27:09.4414238Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T17:27:09.4444771Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T17:27:09.4462887Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T17:27:09.4492041Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T17:27:09.4510461Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T17:27:09.4539730Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T17:27:09.4556554Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T17:27:09.4585918Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T17:27:09.4606051Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 
2025-12-04T17:27:09.4634650Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T17:27:09.4653919Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T17:27:09.4683189Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T17:27:09.4701940Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T17:27:09.4731521Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T17:27:09.4751074Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T17:27:09.4781865Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T17:27:09.4800227Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T17:27:09.4828500Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T17:27:09.4845897Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T17:27:09.4875645Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T17:27:09.4895892Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 
2025-12-04T17:27:09.4925242Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T17:27:09.4948016Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T17:27:09.4976731Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T17:27:09.4994593Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T17:27:09.5023331Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T17:27:09.5043199Z Entering 'third_party/kleidiai' 2025-12-04T17:27:09.5073189Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T17:27:09.5092451Z Entering 'third_party/mimalloc' 2025-12-04T17:27:09.5122001Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T17:27:09.5141343Z Entering 'third_party/nlohmann' 2025-12-04T17:27:09.5169871Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T17:27:09.5196973Z Entering 'third_party/onnx' 2025-12-04T17:27:09.5226270Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T17:27:09.5266354Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T17:27:09.5296788Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T17:27:09.5318276Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T17:27:09.5349374Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T17:27:09.5370009Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T17:27:09.5400854Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T17:27:09.5419437Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T17:27:09.5447480Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T17:27:09.5466057Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T17:27:09.5495744Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T17:27:09.5513362Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T17:27:09.5541138Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T17:27:09.5560430Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T17:27:09.5596291Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T17:27:09.5613377Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T17:27:09.5643725Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T17:27:09.5660692Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T17:27:09.5690608Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T17:27:09.5707495Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T17:27:09.5736381Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T17:27:09.5756634Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T17:27:09.5785069Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T17:27:09.5804720Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T17:27:09.5833581Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T17:27:09.5877058Z Entering 'third_party/pocketfft' 2025-12-04T17:27:09.5907856Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T17:27:09.5926153Z Entering 'third_party/protobuf' 2025-12-04T17:27:09.5956344Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T17:27:09.5980515Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T17:27:09.6008425Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T17:27:09.6026535Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T17:27:09.6055705Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 
2025-12-04T17:27:09.6077041Z Entering 'third_party/psimd' 2025-12-04T17:27:09.6183653Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T17:27:09.6202348Z Entering 'third_party/pthreadpool' 2025-12-04T17:27:09.6231693Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T17:27:09.6250167Z Entering 'third_party/pybind11' 2025-12-04T17:27:09.6279170Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T17:27:09.6298166Z Entering 'third_party/python-peachpy' 2025-12-04T17:27:09.6326368Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T17:27:09.6346476Z Entering 'third_party/sleef' 2025-12-04T17:27:09.6376395Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T17:27:09.6394987Z Entering 'third_party/tensorpipe' 2025-12-04T17:27:09.6425386Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T17:27:09.6443017Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T17:27:09.6472457Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T17:27:09.6490358Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T17:27:09.6518804Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T17:27:09.6537178Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T17:27:09.6565186Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T17:27:09.6584672Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T17:27:09.6613853Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T17:27:09.6630678Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T17:27:09.6660355Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T17:27:09.6702037Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.6731835Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.6758630Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.6788167Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.6813638Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.6843583Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.6871771Z [command]/usr/bin/git config 
--file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.6897189Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.6923552Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.6950839Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.6978053Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7005181Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7033671Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7060657Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7087360Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7113822Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp 
^includeIf\.gitdir: 2025-12-04T17:27:09.7140209Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7166759Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7194739Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7220309Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7247138Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7274472Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7301723Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7327620Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7355618Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7382692Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7408141Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7434235Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7463940Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7492061Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7520143Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7549654Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7580538Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7610893Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config 
--name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7639368Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7671497Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7701562Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7731770Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7760360Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7800983Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7819291Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7847622Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7883329Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7912674Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7942318Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.7972418Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8003662Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8034333Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8062682Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8091799Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8117017Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8143544Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8169317Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8195469Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8221607Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8248230Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8275628Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8302161Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8327063Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8353609Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8382775Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8409155Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8436838Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8464495Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8500656Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8527137Z [command]/usr/bin/git config 
--file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8552884Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8580460Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8610735Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8638401Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8663797Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8692519Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8719596Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8746294Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8773560Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8800910Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8827504Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8854719Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8881213Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8911204Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.8942046Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T17:27:09.9053598Z A job completed hook has been configured by the self-hosted runner administrator 2025-12-04T17:27:09.9069245Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh' 2025-12-04T17:27:09.9075593Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T17:27:09.9076071Z ##[endgroup] 2025-12-04T17:27:09.9169373Z [!ALERT!] Swap in detected! [!ALERT!] 2025-12-04T17:27:22.9963070Z [!ALERT!] Swap out detected [!ALERT!] 
2025-12-04T17:27:44.8692407Z Cleaning up orphan processes